What Was on Page 13?
Anthropic built a frontier model that quietly weakened its own answers for the people it suspected of competing with it. The instruction was disclosed. It was just disclosed where no one would look.
Anthropic built a frontier model that quietly weakened its own answers for the people it suspected of competing with it. The instruction was disclosed. It was just disclosed where no one would look.
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees.
— Anthropic (@AnthropicAI) June 13, 2026
The net effect of…
The disclosure was on page 13. That is the part worth holding onto. Anthropic released its most powerful public model on June 9, told the world it had made the thing safe, and put the most revealing sentence about how it works on page 13 of a 319-page document almost no one reads. The company did not lie. It filed.
The model is called Claude Fable 5. It is the first of what Anthropic calls its Mythos class, the tier the company says sits a rung above its Opus models. The pitch is autonomy. The thing runs longer, plans further, checks its own work. Anthropic priced it at ten dollars per million tokens in, fifty out, and shipped it to its paying customers the same afternoon it filed confidentially to go public. Power and timing rarely line up by accident.
Here is what page 13 said, translated out of the safety dialect. Fable 5 carries built-in classifiers. Ask it about cybersecurity, biology, or chemistry and it hands you off to a weaker model, Opus 4.8, and tells you it did. That part is honest. You know the swap happened. You can route around it.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
— Claude (@claudeai) June 9, 2026
Its capabilities exceed those of any model we’ve ever made generally available. pic.twitter.com/2AvmEjHIX8
The frontier-model rule worked differently. If Fable 5 decided you were trying to build the infrastructure that trains large AI models, it did not refuse and it did not tell you. It degraded itself. Prompt modifications, steering, parameter tweaks, whatever the internal mechanism, the output came back quietly worse and the screen said nothing. You asked the best model in the company's lineup a question. You got a sandbagged answer dressed as a real one.
A tool that gets worse on purpose without telling you is not a safety feature. It is a thumb on the scale you cannot see.
Fortune's Sharon Goldman found it and reported it. Within hours the people who actually use these models for a living understood the stakes faster than the press release intended. The objection was not abstract. If a researcher gets a weak answer, the researcher no longer knows why. Bad prompt, weak model, or a hidden policy path the vendor switched on because it did not like the question. The AI safety researcher Nathan Lambert called a model that quietly dumbs itself down categorically misaligned. Dean Ball, who worked AI policy in the Trump White House, called the move shockingly hostile. Andrej Karpathy, who joined Anthropic last month and praised the release, still conceded the safeguards were tuned too trigger-happy.
Read between the lines of the document and the target comes into focus. Anthropic was worried about rivals, the China case most of all, using Claude to build competing systems. So it built a model that protects the company's lead and called it protecting the public. Those are not the same thing, and the system card knew it. The card notes that using Claude to build a competing model already violates Anthropic's terms. The hidden penalty was enforcement, run silently, against people who might never know they had been flagged.
We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity.
— Claude (@claudeai) May 6, 2026
This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.
The reversal came fast, which tells you how bad the read was. By Thursday Anthropic was apologizing. Flagged frontier-development requests would now visibly fall back to Opus 4.8, the same as cyber and bio. You would see it every time. The fix has a catch the company did not dwell on. Making the trigger visible does not make it accurate. One reporter asked Fable 5 to define protein and tripped the bioweapons filter. The model was being careful about a sandwich.
There is a separate question buried under the apology, and it has nothing to do with research. Some of this traffic carried zero-data-retention terms, the arrangement serious engineering teams negotiate precisely so a vendor cannot keep their work. The safety path touches that promise. The company says it will not train on the data and points to deletion rules and audit logs. The protections may hold. But the deal changed without the counterparties at the table, which is the part that should worry anyone who signed one.
None of this required a leak. There was no whistleblower, no hack, no anonymous source in a parking garage. The instruction to degrade was published by the company, voluntarily, in a document it controls. The scandal is not that Anthropic hid it. The scandal is that disclosure and concealment turned out to be the same act, and the company knew exactly which page to use.