In June 2026, Anthropic suspended Fable 5 — banned by a US export-control order and unavailable overnight. The reflex is to wait for a replacement, but you don’t have to: there’s now a way to get Fable 5–level output without Fable 5.
That Fable 5 alternative is OrcaRouter, and its Routing DSL is the layer where you reconstruct Fable 5–level output from the models you can still get. The rest of this piece walks through how it works — and, just as importantly, why the underlying technique is grounded engineering rather than a marketing slogan.
A Fable 5–level endpoint, out of the box
You don’t have to build any of this yourself to get the result. OrcaRouter ships the whole panel-composition strategy as a ready-made endpoint: set your model to orcarouter/fusion and you get Fable 5–level output out of the box — a pre-tuned Routing DSL that fans out to a panel of strong models and picks the best answer for you, with nothing to configure. It’s a drop-in, OpenAI-compatible endpoint — you point your app at OrcaRouter and use your OrcaRouter key, and your existing code keeps working unchanged across 200+ models, with no token markup.
A fan-out request is billed as the sum of its panel members plus the judge — only on the requests that actually fan out, and with zero markup.

The rest of this piece is what’s happening under that endpoint, and how to shape it yourself when you want to.
Capability you compose, not capability you wait for
For roughly two years, “stronger AI” has meant “the next bigger checkpoint.” Progress arrived as releases: you waited, a lab shipped, you upgraded, you waited again — a rhythm that trained the whole industry to treat capability as something it receives rather than something it constructs.
There’s a second line of progress, quieter and mostly out of the headlines: instead of chasing one larger breakthrough, you orchestrate the models you already have into a system that collaborates and checks its own work. A suspended or gated model is, from this angle, a supply problem — and orchestration is a supply answer. The interesting unit of progress stops being the checkpoint and becomes the topology.
The mechanism: why a panel beats its own members
The claim that “several models combined can outperform any one of them” sounds like marketing until you see why. The key word is decorrelated. Models trained on different data with different architectures have different blind spots; when they’re wrong, they tend to be wrong in different ways. Run them independently and then select the right answer — by a vote, a judge, or a passing test — and the errors don’t stack, they partially cancel, so combined accuracy climbs above any single member’s.

Why a panel of models beats its members: they make different mistakes, and a judge keeps the best answer
This isn’t one paper but a through-line in how the field spends inference-time compute to buy accuracy: self-consistency (sample many reasoning paths, take the majority), mixture-of-agents (layer models so each refines the last), LLM-as-a-judge (one model scores the others), and the broader compound AI systems thesis that frontier capability is migrating from single models to systems. The honest framing isn’t that composition makes a small model secretly large — it’s that composition turns disagreement between models into a higher-accuracy signal you can harvest.
How OrcaRouter expresses it: parallel fan-out plus an arbiter
Prefer to configure it yourself rather than leave everything on autopilot? You can — and this is where you do it. OrcaRouter turns the orchestration into something you declare in a YAML file, with conditions written in Google’s CEL (sandboxed, read-only, evaluated in microseconds). Rules match top to bottom; the first match wins. The move that reconstructs frontier-level quality is parallel (fan-out) plus an arbiter:
use:
parallel: # 2–5 models answer in parallel
– { model: “anthropic/claude-opus-4-8” }
– { model: “openai/gpt-5.5” }
– { model: “google/gemini-3.1-pro” }
arbiter:
strategy: best_of_n # a judge model ranks the candidates
model: “anthropic/claude-opus-4-8”

OrcaRouter routes by difficulty, then fans out the hard tail to a panel and a judge
Four arbiter strategies map to four ways of picking a winner — and a panel is only as good as its selector: first (race; lowest latency), majority (a free vote, no extra call), best_of_n (a judge ranks candidates; highest general quality), and tests_pass (run the code, whoever passes wins — execution-grounded, ideal for coding). Worried the panel itself stumbles? Add a confidence cascade: when a winning response trips a signal like patch_invalid (the patch won’t apply) or self_doubt (the model hedges), OrcaRouter automatically re-dispatches to a stronger, higher-effort leg — so you pay for the extra call only when there’s evidence you need it.
Intelligence bought with topology, not a higher price tier
Fan-out bills every leg — which is exactly why difficulty-gated fan-out matters. OrcaRouter scores each request’s difficulty, so the easy majority of traffic goes to a single cheap model while only the hard tail convenes a panel:
rules:
– id: trivial
when: difficulty < 0.3
use: { model: “google/gemini-3-flash” }
– id: standard
when: difficulty < 0.7
use: { model: “openai/gpt-5.5” }
– id: hard
when: difficulty >= 0.7
use:
parallel:
– { model: “anthropic/claude-opus-4-8” }
– { model: “openai/gpt-5.5” }
– { model: “google/gemini-3.1-pro” }
arbiter: { strategy: best_of_n, model: “anthropic/claude-opus-4-8” }
default:
delegate: balanced

Most traffic takes the cheap path; only the small hard tail uses a panel
Your blended cost is dominated by the cheap path, because that’s where the volume is; your quality ceiling is set by the panel, because that’s where the hard requests go. You spend frontier money only on requests that are genuinely frontier-hard.
Safe to ship
Changing routing is a high-stakes operation, so OrcaRouter wraps it in a safety net. lint checks the schema, CEL types, and model references on save. dry-run fires your rules against synthetic requests so you can see which one each hits. shadow mode evaluates the DSL on live traffic without adopting it, reporting the routing diff, the A/B quality delta, and the projected cost change. canary then ramps real traffic 5% → 25% → 100% with one-click rollback. You measure a new strategy against your own traffic before committing to it.
Build it, don’t wait
None of this requires a research lab. Out of the box you point your app at OrcaRouter and let it route; when you want to go further, you express the orchestration in a YAML file — route by difficulty and task, fan out to a panel on the hard tail, add a judge and a fallback cascade, and tune for cost, latency, or quality — then de-risk the rollout with lint, dry-run, shadow mode, and a canary slider. The frontier stops being a model you wait for and becomes a graph you author, reproduce, and control — one that doesn’t disappear when a single model does. Start with the Routing DSL docs: docs.orcarouter.ai/routing/routing-dsl.





