Claude Fable 5: a deep dive beyond the hype

Ethan Mollick has been testing frontier AI models since before ChatGPT launched. The Wharton professor, who writes the newsletter One Useful Thing, has long described using them as working with a wizard. You chant the spell and something happens.

This week he wrote that the spell has become powerful enough that he is no longer sure he is the wizard. After early access to Claude Fable 5, Anthropic's new public model, his verdict was that it represents a real leap over anything he had used before.

What stayed with him was how little he did to get the results. “I describe what I want, I pay for it, and I judge the result,” he wrote. “The conjuring happens somewhere I cannot watch.”

From Mythos to Fable

In April, Anthropic announced a model called Mythos that it said was too powerful to release, because it was good enough at finding holes in software security to cause real damage. It went to a small group of cyber defenders under a government programme called Project Glasswing.

Fable 5 is that same underlying model. The two share an architecture and are separated only by a set of safeguards. Even the naming makes the point: Fable comes from the Latin fabula, which sits close to the Greek mythos.

The safeguards hand certain requests to a less capable model. Ask Fable 5 about cybersecurity, biology or chemistry and the answer comes back from Claude Opus 4.8 instead. Anthropic says this happens in fewer than 5% of sessions, and admits harmless requests get caught too.

A week before launch, Anthropic warned that AI systems might soon be able to build their own successors and called for the major labs to agree a way to slow down. The company arguing for a brake pedal is also shipping its most powerful model and filing for an IPO that could value it around a trillion dollars.

What the benchmarks show

Mollick's clearest example is a map. He asked Fable 5 to build an isochronic map, which shows how far you can travel from a city in a given time. The model launched cheaper AI agents to retrieve over 2,200 flights and rail schedules, wrote the code while they ran, then set more agents to test it.
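For readers who want a feel for what an isochronic map computes, here is a minimal sketch. It is not the code Fable 5 wrote, which the article does not show. It assumes a toy network with fixed travel times between cities (the names and minutes are invented for illustration) and uses a standard shortest-path search to find everything reachable within a time budget. A real version would also have to handle timetables, connections and waiting times.

```python
import heapq

def reachable_within(graph, origin, budget_minutes):
    """Return {city: fastest_minutes} for every city reachable within the budget.

    graph maps a city to a list of (neighbour, travel_minutes) pairs.
    This is Dijkstra's shortest-path algorithm, cut off at the time budget.
    """
    best = {origin: 0}
    queue = [(0, origin)]
    while queue:
        elapsed, city = heapq.heappop(queue)
        if elapsed > best.get(city, float("inf")):
            continue  # stale queue entry; a faster route was already found
        for neighbour, minutes in graph.get(city, []):
            total = elapsed + minutes
            if total <= budget_minutes and total < best.get(neighbour, float("inf")):
                best[neighbour] = total
                heapq.heappush(queue, (total, neighbour))
    return best

# Hypothetical travel times in minutes, invented for illustration only.
toy_network = {
    "A": [("B", 60), ("C", 150)],
    "B": [("C", 45), ("D", 200)],
    "C": [("D", 90)],
}

three_hours = reachable_within(toy_network, "A", 180)
print(three_hours)
```

With a 180-minute budget, city C is reached faster via B (105 minutes) than directly (150), and D falls just outside the budget. Colouring each city by its fastest time is what turns this table into a map.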

The published numbers tell a consistent story. On SWE-Bench Pro, a test of agentic coding, Fable 5 scores 80.3%, against 69.2% for the previous Claude Opus 4.8, 58.6% for GPT 5.5 and 54.2% for Gemini 3.1 Pro. The gap widens as the tasks get harder and longer.

On FrontierCode, which measures whether a model can handle difficult coding work to the standard of a real production codebase, Fable 5 manages 29.3% where Opus reaches 13.4% and GPT 5.5 just 5.7%. On Humanity's Last Exam, a broad multidisciplinary reasoning test, it scores 64.5% with tools.
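The widening gap can be checked with simple arithmetic on the figures quoted above. The sketch below uses only those numbers; the function name is our own, and comparing ratios across two different benchmarks is a rough indication rather than a rigorous measurement.

```python
# Scores in percent, exactly as quoted in the article.
scores = {
    "SWE-Bench Pro": {"Fable 5": 80.3, "Opus 4.8": 69.2, "GPT 5.5": 58.6},
    "FrontierCode": {"Fable 5": 29.3, "Opus 4.8": 13.4, "GPT 5.5": 5.7},
}

def lead_over(benchmark, rival):
    """Return Fable 5's lead as (percentage points, ratio of scores)."""
    top = scores[benchmark]["Fable 5"]
    other = scores[benchmark][rival]
    return round(top - other, 1), round(top / other, 2)

for benchmark in scores:
    for rival in ("Opus 4.8", "GPT 5.5"):
        points, ratio = lead_over(benchmark, rival)
        print(f"{benchmark} vs {rival}: +{points} points, {ratio}x")
```

On the easier test Fable 5 scores about 1.16 times what Opus does; on the harder one it scores more than twice as much, even though every absolute score is lower.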

One caveat sits in the footnotes. The cybersecurity and biology figures reflect the unrestricted Mythos model. Ask the public Fable those questions and the safeguards drop you to the weaker Opus, so the starred scores describe capability most users will never touch.

Customer reports match. Anthropic says Stripe used Fable 5 to migrate a fifty-million-line codebase in a single day, work that would have taken a team over two months. The unit of work has changed: you describe an outcome and receive a finished thing. Mollick's word for the new role is ‘patron.’

The price of the power

What made Mollick uneasy was that the work was good and he had so little part in it. The model made hundreds of small choices about data, method and design, and simply made them. A more capable model gives you less control over how it works, because more is happening than a person can hold in their head.

There are practical drawbacks as well. Fable 5 costs twice as much as Opus, its safeguards trip too often, and the "jagged frontier" of AI competence remains: the model can excel at one task and stumble on another that looks just as easy.

The benefits are real too. The same capability is hardening critical software against attack, accelerating drug discovery, and building tools nobody could afford to make before. Mollick thinks we may end up needing more programmers, to keep up with the flood of new software that becomes possible.

What we don't yet know is whether being sidelined is a phase or a destination. If the tools improve, we go back to steering. If they don't, the most useful AI is also the least legible, and the open question is whether we are ready to commission work whose process we don't really understand.