GPT-5.6 Sol Ultra Is Here. It Doesn’t Just Answer. It Delegates.

Three Tiers, One Winner

OpenAI released GPT-5.6 Sol on June 26 alongside Terra and Luna, a three-tier lineup that replaces the old Pro/Mini naming.

The benchmark numbers are impressive. Sol scored 88.8% on Terminal-Bench 2.1, a test that measures how well a model can work in a command-line environment: planning steps, running commands, interpreting errors, and recovering. That is closer to real software engineering than a multiple-choice exam. With Ultra Mode enabled, the score rose to 91.9%, surpassing Claude Mythos 5’s 88.0%.

On ExploitBench, Sol matched the performance of Anthropic’s Mythos Preview, the model that identified over 10,000 critical vulnerabilities, while using about one-third of the output tokens. On GeneBench v1, Sol outperformed GPT-5.5 while consuming fewer tokens. In capture-the-flag (CTF) challenges, Sol hit a 96.7% success rate.

The pricing is also notable. Sol costs $5 per million input tokens and $30 per million output tokens, roughly half the price of Claude Fable 5. Terra costs half as much as Sol, and Luna is 80% cheaper than Terra.
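To make the tier ratios concrete, here is a minimal sketch of the arithmetic. Only Sol’s prices are quoted above; the Terra and Luna figures are derived from the stated ratios rather than from a published price sheet, and the workload size is a made-up example.

```python
# Prices in USD per million tokens. Only SOL comes from the article;
# TERRA and LUNA are derived from the stated ratios (an assumption).
SOL = {"input": 5.00, "output": 30.00}
TERRA = {k: round(v * 0.5, 2) for k, v in SOL.items()}           # "half of Sol"
LUNA = {k: round(v * (1 - 0.80), 2) for k, v in TERRA.items()}   # "80% cheaper than Terra"


def job_cost(prices, input_tokens, output_tokens):
    """Return the USD cost of one job given its token counts."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
for name, prices in (("Sol", SOL), ("Terra", TERRA), ("Luna", LUNA)):
    print(f"{name}: ${job_cost(prices, 2_000_000, 500_000):.2f}")
```

Under these assumptions Luna comes out at one-tenth of Sol’s price, since half of Sol reduced by a further 80% is 0.5 × 0.2 = 0.1.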

Sol will also be available on Cerebras hardware in July, delivering inference speeds of up to 750 tokens per second.

Ultra Mode: The Model That Builds Its Own Team

The benchmark jump isn’t the story. Ultra Mode is.

Ultra Mode doesn’t just make the model “think longer”; it lets the model spin up multiple sub-agents to work on different parts of a problem in parallel. This differs from Anthropic’s Agent Teams, where humans design the collaboration workflow. In Ultra Mode, the model handles task decomposition and coordination itself.

The Terminal-Bench 2.1 score that beat Mythos was achieved with Ultra Mode enabled. That’s not a reasoning upgrade. That’s an architecture upgrade: a model that can orchestrate its own workforce.
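The general pattern of decomposing a task, running sub-agents in parallel, and collecting their results can be sketched in a few lines. This is purely illustrative and says nothing about how Ultra Mode is actually built: `decompose`, `run_subagent`, and `orchestrate` are hypothetical names, and the model calls are replaced with stubs.

```python
import asyncio


async def decompose(task: str) -> list[str]:
    """Stub: a real orchestrator would ask the model to split the task."""
    return [f"{task}: part {i}" for i in range(1, 4)]


async def run_subagent(subtask: str) -> str:
    """Stub: a real sub-agent would call a model and tools here."""
    await asyncio.sleep(0)  # yields control, standing in for model latency
    return f"done({subtask})"


async def orchestrate(task: str) -> list[str]:
    """Split a task, run all sub-agents concurrently, return their results."""
    subtasks = await decompose(task)
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(run_subagent(s) for s in subtasks))


results = asyncio.run(orchestrate("fix failing build"))
for line in results:
    print(line)
```

The point of the sketch is where the decomposition happens: in a human-designed workflow, the list of subtasks would be written by a person, whereas here it is produced by the (stubbed) model itself.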

The “Find It Yourself” Challenge

On July 2, SiliconANGLE co-founder and editor-in-chief John Furrier posted on X: “Can’t wait to see what people will do with GPT-5.6 Sol Ultra. Stash your hardest prompts somewhere.” The post invited the community to test the model and share the hardest prompts that break it.

That’s not a marketing line; it’s a direct challenge to the AI community to break the new model with its toughest questions.

There is also a regulatory dimension. At the request of the US government, the release was limited to about 20 trusted partners, with access approved case by case. OpenAI made clear this wasn’t its preferred approach: “We don’t believe this kind of government access process should become the long-term default.”

P.S. Sol Ultra is live — but only for a handful of partners. The benchmark gap over Mythos is real, but the bigger question is what happens when developers start pushing the sub-agent architecture beyond benchmarks. That’s what the “stash your hardest prompts” invitation is actually about — not validation, but discovery. The hardest prompts people find will become the next dataset.
