Pliny jailbreaks OpenAI's GPT-OSS-120b models

OpenAI’s latest open-weight models, GPT-OSS-120b and GPT-OSS-20b, released on August 5, 2025, were reportedly jailbroken within hours of launch by the pseudonymous AI jailbreaker Pliny the Liberator, despite OpenAI’s claims of robust safety measures and extensive adversarial training.

The models, the first open-weight releases from OpenAI since 2019, were touted as fast, efficient, and highly resistant to jailbreaks. OpenAI stated that GPT-OSS-120b underwent “worst-case fine-tuning” in biological and cyber domains, and that its Safety Advisory Group reviewed the testing and concluded that the models did not reach high-risk thresholds. The company also claimed that, based on “standard refusal and jailbreak resistance tests,” the models performed at parity with its o4-mini model on jailbreak resistance benchmarks such as StrongReject.

However, Pliny the Liberator announced on X (formerly Twitter) late on the day of release, “OPENAI: PWNED 🤗 GPT-OSS: LIBERATED,” sharing screenshots that purportedly showed the models generating instructions for illicit activities, including making methamphetamine, Molotov cocktails, VX nerve agent, and malware. Pliny commented, “Took some tweakin!” regarding his successful breach.

🫶 JAILBREAK ALERT 🫶
OPENAI: PWNED 🤗
GPT-OSS: LIBERATED 🫡
Meth, Molotov, VX, malware.
gg pic.twitter.com/63882p9Ikk
— Pliny the Liberator 🐉 (@elder_plinius) August 6, 2025

The timing of this jailbreak is particularly noteworthy as OpenAI is preparing for the release of its highly anticipated GPT-5. In conjunction with the GPT-OSS release, OpenAI had also launched a $500,000 red teaming challenge, inviting researchers to uncover novel risks, though Pliny’s public disclosure of his findings likely disqualifies him from this initiative.

Pliny’s technique for jailbreaking GPT-OSS followed his established pattern: a multi-stage prompt whose output initially appears to be a refusal, then includes a divider (his signature “LOVE PLINY” markers), and subsequently shifts into unrestricted content written in leetspeak to evade detection. This approach mirrors the methods he has successfully employed against previous OpenAI models, including GPT-4o and GPT-4.1, over the past year and a half.

This incident marks another rapid jailbreak by Pliny, who has consistently managed to bypass the safeguards of major OpenAI releases within hours or days of their launch. His GitHub repository, L1B3RT4S, which hosts a library of jailbreak prompts for various AI models, has garnered over 10,000 stars and remains a significant resource for the AI jailbreaking community. The perceived “victory” over the “big tech overlords” has been celebrated within the AI resistance community, with some users on X suggesting that AI labs might as well “close their safety teams.”
