OpenAI's $25k GPT-5.5 Jailbreak Bounty: AI Safety Test

OpenAI is putting a $25,000 bounty on GPT-5.5’s head. Specifically, they are looking for a crack in its biological armor.

The GPT-5.5 Bio Bug Bounty is a targeted red-teaming challenge. The goal is simple: find a single, universal jailbreak that bypasses the model’s biosafety filters across five specific, high-risk questions. No moderation flags allowed. No multi-turn coaxing. Just one prompt to unlock the things the model is strictly forbidden from discussing.

For the record: this is the first major performance stress test for GPT-5.5 since its "agentic" capabilities became the center of the conversation.

The Context

This isn’t OpenAI’s first attempt to gamify its own security. We saw similar programs for GPT-5 and the standalone ChatGPT agent. But the stakes have shifted. Those earlier rounds usually involved ten questions covering a broad range of chemical and biological risks. This time, the list is narrowed to five.

The move from GPT-5 to 5.5 represents a shift toward autonomy. When a model can execute code and navigate a desktop—specifically the Codex Desktop environment targeted here—the "safety" of its biological knowledge isn't just a matter of what it says. It’s a matter of what it can help a user do.

The Number That Matters

$25,000. That is the price OpenAI has set for a "universal" failure.

In the world of LLM security, a one-off jailbreak is a nuisance. You patch the specific keyword or the semantic path, and you move on. A "universal" jailbreak is different. It’s an engineering flaw. It suggests that the safety layer is not an integral part of the model’s reasoning, but a thin, permeable membrane that can be bypassed by the right linguistic frequency.

By offering a high reward for a single prompt that clears all five hurdles, OpenAI is effectively asking: is our safety architecture a structural wall or just a series of "Do Not Enter" signs?

The Question Behind the Score

Benchmarks are human-defined limits. OpenAI chose these five questions. They chose the Codex environment. They chose the $25,000 valuation.

The interesting part isn't whether the model is "safe." The interesting part is what OpenAI considers a "win." By focusing on a universal jailbreak, they are ignoring the "death by a thousand cuts" approach—the slow, iterative prompting that most real-world actors would actually use. They are testing for a catastrophic, singular breach.

This is safety as a spectacle. It’s a high-stakes way to signal to regulators that the model has been "battle-tested." But a model that passes a five-question gauntlet isn't necessarily secure; it’s just proven to be resilient against that specific gauntlet. The numbers say the model is harder to crack. Note what they don't say: how it performs when the researcher isn't looking for a $25,000 payday, but for a way to actually use the information.

Adding this to the leaderboard. We’ll see if anyone cashes the check by the July 27 deadline.

Filed.

OpenAI's Paying $25k to See If GPT-5.5 Can Be Tricked Into Spilling the AI Tea

Key Takeaways

The Context

The Number That Matters

The Question Behind the Score

Related Transmissions

OpenAI's GPT-5.5: It's Not a Chatbot, It's Your New Overlord

OpenAI's New Image Generator Thinks Before It Inks

ClosedAI Opens Its Kimono: PII Scrubbing Model Leaks to the Public