
Open-Weight Model GLM-5.2 Matches Frontier Models on Security Benchmarks at One-Sixth the Cost

GLM-5.2, a Chinese open-weight model, matches frontier models on IDOR detection at one-sixth the cost, but its free license also enables hackers to remove safety guardrails.

Jeff Editorial · 3 min read

Security firm Semgrep ran a set of open-source models against its internal IDOR (Insecure Direct Object Reference) benchmark, using the same dataset and prompt it uses to evaluate frontier coding agents. The result surprised the team: GLM-5.2, an open-weight model from Zhipu AI, scored 39% F1 on IDOR detection, beating Claude Code (32%) at a cost of roughly $0.17 per vulnerability found.

The setup matters here. The Semgrep Multimodal pipeline still led the benchmark, at 61% and 53% F1, but that pipeline runs in a purpose-built harness that enumerates endpoints and guides the model toward them. GLM-5.2, by contrast, was given nothing but a prompt and a codebase. Without any endpoint-discovery scaffolding, it found vulnerabilities on its own.

Among models given no endpoint guidance, the best open-weight option was no longer the obvious underdog: it beat Claude Opus 4.8. That’s a significant milestone for open-weight AI.
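For readers who want the arithmetic behind figures like “39% F1” and “$0.17 per vulnerability found,” here is a minimal sketch. It assumes F1 is the standard harmonic mean of precision and recall, and that cost per vulnerability found means total spend divided by true positives; Semgrep’s exact scoring method isn’t described in the article. The counts and spend are invented for illustration and are not Semgrep’s data.

```python
# A minimal sketch of two benchmark metrics: F1 and cost per finding.
# All counts and dollar amounts used below are hypothetical, not Semgrep's data.

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def cost_per_finding(total_cost_usd: float, tp: int) -> float:
    """Assumption: 'cost per vulnerability found' = total spend / true positives."""
    if tp == 0:
        return float("inf")  # money spent, nothing real found
    return total_cost_usd / tp


if __name__ == "__main__":
    tp, fp, fn = 20, 30, 32  # hypothetical: correct flags, false alarms, misses
    spend = 3.40             # hypothetical total API spend in USD
    print(f"F1: {f1_score(tp, fp, fn):.2f}")
    print(f"Cost per finding: ${cost_per_finding(spend, tp):.2f}")
```

Because F1 penalizes both false alarms and missed findings, a model that flags every endpoint scores poorly even with perfect recall.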

Graphistry separately assessed GLM-5.2 as the first open-weight model it could recommend for a “frontier-grade cybersecurity experience.”


$0.17 per Vulnerability vs. $10,000 a Month

The economics are stark. GLM-5.2’s pricing lands at around one-sixth that of comparable frontier models. At roughly $1.40 per million input tokens, a team paying close to $10,000 a month for a closed system can often handle comparable work on the open-weight model for a fraction of that cost.
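A rough way to translate per-token pricing into a monthly bill is sketched below. Only the $1.40 per million input tokens and the one-sixth ratio come from the article; the output-token price and the workload volumes are hypothetical placeholders to swap for real numbers.

```python
# Rough monthly API cost estimator, a sketch for comparing pricing tiers.
# INPUT_PRICE_PER_M is the article's figure; the output price is an assumption.

INPUT_PRICE_PER_M = 1.40                # USD per 1M input tokens (quoted in the article)
HYPOTHETICAL_OUTPUT_PRICE_PER_M = 4.00  # USD per 1M output tokens (placeholder, not a published price)


def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float = INPUT_PRICE_PER_M,
                     output_price_per_m: float = HYPOTHETICAL_OUTPUT_PRICE_PER_M) -> float:
    """Token volume times per-million-token price, summed over input and output."""
    return (input_tokens / 1e6) * input_price_per_m + (output_tokens / 1e6) * output_price_per_m


def scaled_cost(closed_monthly_usd: float, price_ratio: float = 1 / 6) -> float:
    """Same workload at a fixed price ratio (the article's 'about one-sixth')."""
    return closed_monthly_usd * price_ratio


if __name__ == "__main__":
    # Hypothetical workload: 1.5B input tokens and 100M output tokens per month.
    open_cost = monthly_cost_usd(1_500_000_000, 100_000_000)
    print(f"Estimated open-weight API bill: ${open_cost:,.0f}")
    print(f"A $10,000 closed budget at one-sixth the price: ${scaled_cost(10_000):,.0f}")
```

Output tokens are often priced higher than input tokens, so splitting them out keeps output-heavy workloads like code review from being underestimated.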

GLM-5.2, released by Z.ai under the MIT license, is a Mixture-of-Experts model with roughly 744B total parameters (40B active) and a 1M-token context window. According to published benchmarks, it is the first open-weight model to break 80% on Terminal-Bench 2.1, trails Claude Opus 4.8 by about one point on FrontierSWE, and matches or beats GPT-5.5 on several long-horizon coding tasks at a fraction of the cost.
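The Mixture-of-Experts figures also hint at what running the model locally involves. The sketch below uses the article’s approximate parameter counts to compute the share of parameters active per token and the storage needed for the weights alone at common precisions. It assumes all experts must be resident in memory, which is typical for MoE inference, and it ignores KV cache, activations, and runtime overhead.

```python
# Back-of-the-envelope sizing for a Mixture-of-Experts model, a sketch only.
# Parameter counts are the article's approximate figures. KV cache,
# activations, and runtime overhead are deliberately ignored.

TOTAL_PARAMS = 744e9   # roughly 744B total parameters (per the article)
ACTIVE_PARAMS = 40e9   # roughly 40B parameters active per token (per the article)

# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of parameters the MoE router activates for each token."""
    return active / total


def weight_memory_gb(params: float, fmt: str) -> float:
    """Storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    print(f"Active per token: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: ~{weight_memory_gb(TOTAL_PARAMS, fmt):,.0f} GB of weights")
```

The gap between the two numbers is the point of the design: per-token compute tracks the roughly 40B active parameters, while memory still has to hold all of the roughly 744B.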

Why Chinese Models Are Catching Up So Quickly

Graphistry researchers have raised the possibility that GLM-5.2 is the product of illicit distillation from GPT-5.5 and Opus 4.8. Z.ai has not commented on the matter.

If confirmed, it would explain why Chinese models are closing the gap so quickly—and would add another layer to the ongoing debate about AI intellectual property and export controls.

For Defenders, a Tool. For Attackers, a Gift.

Unlike Claude or ChatGPT, GLM-5.2 is open-weight. Anyone can download and modify it. Users can remove safety guardrails, fine-tune the model for specific tasks, and run it locally without any commercial provider oversight.

GuidePoint Security consultant Jason Baker told Axios that jailbreak methods for using GLM-5.2 in hacking work are already being shared on Russian-language hacker forums. Some posters have confirmed that the safety guardrails can be lifted simply by saying they want to “protect our company from brute-force attacks,” framing the request as defensive.

Armadin CTO Travis Lanham said GLM-5.2 can “automate lateral movement and exploit chaining after a system intrusion at an elite-hacker level.” Attackers can run it locally without safety guardrails, fine-tune it for a specific target, and operate without being exposed to any provider or defender.

Halcyon ransomware threat intelligence analyst Roye Bass said attackers can download GLM-5.2 and build their own tools to generate phishing emails, scam scripts, and other malicious content without risk of being blocked.

39% F1 at $0.17. That math changes decisions.

Mythos Was Blocked. GLM-5.2 Arrived Days Later.

Mythos was locked behind export controls in mid-June. GLM-5.2 was released days later. The timing was not subtle.

Zhipu founder Jie Tang said the company will release an open-source model comparable to Anthropic’s Fable before the end of this year. Elon Musk has predicted that Chinese models could reach true Fable-level usefulness by Q1 2027; Tang responded that it “won’t take that long.”


P.S. GLM-5.2 can match leading frontier models on security tasks at one-sixth the cost. It’s also open-weight, meaning anyone can download it and remove its guardrails. For defenders, it’s a breakthrough. For attackers, it’s a gift. The US restricted access to Anthropic’s models. The open alternative just got better. That’s not a flaw in the policy. It’s the predictable consequence of one.
