Anthropic has apologized for secretly throttling its AI model, Claude Fable 5, with hidden guardrails that hinder development for researchers and competitors. The company stated it will improve transparency regarding when these restrictions apply, even if this leads to Fable refusing more queries.

Fable is the first widely available model in Anthropic’s Mythos class of AI systems, which the company has warned are too dangerous for public release. It launched with safeguards that prevent it from responding to certain “high-risk” queries.

One area of restriction is distillation, a method for training smaller models using outputs from larger ones. In Fable’s system card, Anthropic indicated that it would alter and degrade answers to queries perceived as distillation attempts without informing users of these changes.

Now, queries suspected of being distillation attempts will fall back to Claude Opus 4.8, the company’s earlier flagship model, and users will be notified whenever this occurs. This fallback also applies to other high-risk domains like biology, chemistry, and cybersecurity, unless those queries are entirely blocked due to broader safety regulations against topics like drugs and weapons.

The company acknowledged that its overly restrictive safety measures have inadvertently rendered Fable nearly unusable for basic queries in areas like biology. Anthropic admitted that the use of invisible safeguards was a mistake, emphasizing that transparency in safety measures is critical.

The company’s decision to conceal the restrictions drew significant backlash from the AI research community, which argued that it limited what the model could do for both evaluators and competitors. Anthropic stated that using Claude to create competing models violates its Terms of Service, and it has previously accused rivals, including DeepSeek, of distilling its models on an industrial scale.

“Visible safeguards can be probed, so they have to be robust, which takes time to get right,” Anthropic wrote. “Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right,” the company added.

