A Chinese Open-Source Model Just Beat Claude Fable 5 in a Blind Design Test

Design Arena isn’t a standard benchmark. There are no multiple-choice questions, no code completion tests, no math problems. It’s a crowdsourced blind test where real users vote on which AI-generated web page looks and feels better. In the AI evaluation community, it’s considered one of the most industry-relevant benchmarks for aesthetics and practical design capability.

On June 20, Design Arena announced on X that GLM-5.2 had taken the top spot on the single-round HTML web design leaderboard, surpassing Claude Fable 5, Opus 4.6, and Opus 4.7. The model ranks five positions higher than its predecessor, GLM-5.1, with an Elo score of approximately 1360. This wasn’t a win on “reasoning” or “coding accuracy.” It was a win on visual taste and execution. The result shows that an open-source model can produce web pages that human judges prefer over those from Anthropic’s flagship.
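For readers unfamiliar with Elo, the sketch below shows how a rating gap translates into an expected head-to-head win rate. It assumes the conventional Elo formula with a 400-point scale; the article does not say which variant Design Arena uses. The rival rating is a hypothetical value chosen only for illustration, and only the approximate 1360 figure comes from the announcement.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected win probability of A over B under the conventional Elo model.

    The 400-point scale is the textbook default, assumed here; it is not
    confirmed as Design Arena's setting.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


GLM_RATING = 1360          # approximate Elo reported for GLM-5.2
HYPOTHETICAL_RIVAL = 1330  # illustrative only, not a published score

win_probability = expected_score(GLM_RATING, HYPOTHETICAL_RIVAL)
print(f"Expected win rate vs. a {HYPOTHETICAL_RIVAL}-rated model: {win_probability:.1%}")
```

Under this model even a 30-point lead implies only a modest edge in any single blind vote, which is why a leaderboard position reflects many votes rather than a decisive gap.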


Why GLM-5.2 Won

Design Arena’s analysis identified several factors behind the win. The model excels at layout and visual structure. It handles external CDN images skillfully and delivers strong performance in typography, visual hierarchy, and animations. The model is particularly effective at using third-party libraries: sessions using chart.js or three.js saw a win rate 6.0 percentage points higher. GLM-5.2 uses TailwindCSS in 91% of sessions and Font Awesome in 51%, compared to Fable 5’s 57% Tailwind usage. That difference in tooling appears to give GLM-5.2 a practical edge.

The price difference is stark. GLM-5.2 costs $1.40 per million input tokens and $4.40 per million output tokens. Fable 5 charges $10 and $50, roughly 7 times more for input and 11 times more for output. For a model whose web pages voters preferred, that price gap isn’t incremental. It’s disruptive.
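To make the gap concrete, here is a minimal sketch of the per-page cost at the quoted prices. The token counts are hypothetical placeholders, not measurements from Design Arena, and the same counts are applied to both models for simplicity.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICES = {
    "GLM-5.2": {"input": 1.40, "output": 4.40},
    "Fable 5": {"input": 10.00, "output": 50.00},
}


def page_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one generation for the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: a short prompt and a long HTML page in response.
INPUT_TOKENS = 2_000
OUTPUT_TOKENS = 20_000

glm_cost = page_cost("GLM-5.2", INPUT_TOKENS, OUTPUT_TOKENS)
fable_cost = page_cost("Fable 5", INPUT_TOKENS, OUTPUT_TOKENS)

print(f"GLM-5.2: ${glm_cost:.4f} per page")
print(f"Fable 5: ${fable_cost:.4f} per page")
print(f"Ratio:   {fable_cost / glm_cost:.1f}x")
```

Because page generation is dominated by output tokens, the effective ratio for this kind of workload sits closer to the output-price multiple than to the input-price one.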

The Trade-Offs

GLM-5.2 is not a general-purpose winner. In game development, data visualization, and 3D design, Fable 5 still holds the top spots. In UI component generation, GLM-5.2 ranks only fourth. Many of its generated designs also look alike, while Fable 5 produces more variety.

The model is also slower. It takes about 305 seconds to generate a page, roughly twice as long as Fable 5. It also produces about 25% more code, which can be inefficient for production use. But for a model that’s open-source, MIT-licensed, and deployable locally, those trade-offs may be acceptable for many developers. The question isn’t whether GLM-5.2 is better than Fable 5 overall. It’s whether it’s good enough for the tasks that matter, at a price that makes sense.

The Timing

The announcement came on June 20, just over a week after the US government forced Anthropic to cut off non-US access to Fable 5 and Mythos 5. Zhipu open-sourced GLM-5.2 under the MIT license on June 13, and the timing was a deliberate choice.

Zhipu’s statement was explicit: “Frontier intelligence should not belong to a few, nor should it be withdrawn at any time by a few rules. It should be open, available, buildable, and serve every developer.”

The Design Arena win is one data point. But combined with the MIT license, the roughly tenfold lower price, and the timing of its release, it’s a data point that changes the conversation about what open-source AI can do, and who can use it.

P.S. The most telling detail in the Design Arena analysis isn’t the Elo score. It’s the 91% Tailwind usage and the 6-percentage-point boost from third-party libraries. GLM-5.2 didn’t win by being smarter. It won by using the right tools, consistently, in ways that other models haven’t learned yet. That’s not a breakthrough. It’s a signal. And it’s a signal that open-source models are learning faster than the closed-source incumbents expected.
