What is Z.ai's GLM-5.2 and when did it launch?
Z.ai launched GLM-5.2 on June 13, 2026 — a ~753-billion-parameter mixture-of-experts language model and the third major release in the GLM-5 line. It follows GLM-5 (February 11), GLM-5-Turbo (March 15), and GLM-5.1 (April 7), making four flagship-tier releases in roughly four months. GLM-5.2 is the only open-weights model in the current frontier tier. It is available as zai-org/GLM-5.2 on Hugging Face under an MIT license with no regional restrictions.
How does GLM-5.2 score on SWE-bench Pro compared to GPT-5.5?
GLM-5.2 scores 62.1 on SWE-bench Pro. GPT-5.5 scores 58.6. GLM-5.1, the previous version, scored 58.4. That means an open-weights model now leads a closed frontier model on a real software-engineering benchmark, according to VentureBeat.
The Terminal-Bench 2.1 result is also notable. GLM-5.2 scores 81.0, up from GLM-5.1's 62.0. That is a roughly 19-point jump in one generation on terminal-style agentic coding. Z.ai also reports GLM-5.2 as the top open-source model on FrontierSWE, PostTrainBench, and SWE-Marathon.
How does GLM-5.2 perform on agentic tool-use benchmarks?
On MCP-Atlas — a benchmark measuring Model Context Protocol tool orchestration — GLM-5.2 scores 77.0. GPT-5.5 scores 75.3. Claude Opus 4.8 leads at 77.8. GLM-5.2 sits less than one point behind Claude Opus 4.8 and ahead of GPT-5.5.
On Humanity's Last Exam with tools, Z.ai reports GLM-5.2 at 54.7 versus GPT-5.5's 52.2. GLM-5.2 supports OpenAI-compatible function and tool calling, plus an Anthropic-compatible coding endpoint. That lets it drop into agent harnesses built for Claude without changes to the harness itself.
What is the 1M-token context window and how does it compare to GLM-5.1?
GLM-5.2 ships with a 1,000,000-token context window, labeled glm-5.2[1m] in Z.ai's configuration. Each response can return up to 131,072 output tokens. That is roughly a 5x increase over GLM-5.1's ~200,000-token window, as MarkTechPost reports.
You might also like
A 1M-token window means a coding agent can hold an entire mid-sized repository in working memory — source files, tests, configuration, and conversation history — without constant summarization. Z.ai's docs list use cases including whole-repository refactors, long-horizon agent runs, and large-document analysis past 200K tokens.
What are GLM-5.2's two thinking-effort levels?
GLM-5.2 offers two reasoning modes: High and Max. Z.ai recommends Max effort for complex, multi-step coding work. In Claude Code, the /effort command controls this setting. The xhigh, max, and ultracode options all map to GLM-5.2's Max effort. Developers can also set reasoning_effort: "max" and thinking: {type: "enabled"} directly in the API, or disable thinking entirely for fast, low-cost responses.
How much does GLM-5.2 cost per token?
| Pricing metric | GLM-5.2 |
|---|---|
| API input | $1.40 / 1M tokens |
| API output | $4.40 / 1M tokens |
| Cached input | ~$0.26 / 1M tokens |
| Self-host | Yes (MIT license) |
Pricing is via OpenRouter, as cited by VentureBeat. The closed alternatives — GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro — are all priced higher, though their exact rates vary by tier and change frequently.
Here's what we know so far: the cost gap is substantial. VentureBeat describes GLM-5.2's pricing as roughly one-sixth the cost of GPT-5.5. For teams running high-volume coding agents, that difference compounds quickly.
What reasoning and math benchmarks has Z.ai published?
Z.ai reports GLM-5.2 at 99.2 on AIME 2026 and 91.2 on GPQA-Diamond. These are Z.ai's own launch numbers. No independent third-party replication of these scores had been published at the time of launch.
Z.ai also published no SWE-bench, Terminal-Bench, or Code Arena numbers at the initial announcement on June 13. The SWE-bench Pro and Terminal-Bench figures cited in this article come from Z.ai's subsequent head-to-head comparisons, not the launch announcement itself.
How does GLM-5.2 fit into the broader 2026 frontier model landscape?
The four models drawing the most attention in mid-2026 are GLM-5.2, GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro. GLM-5.2 is the only one with open weights. The other three are closed, API-only, and do not allow self-hosting.
| Model | Weights | SWE-bench Pro | MCP-Atlas | Input price |
|---|---|---|---|---|
| GLM-5.2 | Open (MIT) | 62.1 | 77.0 | $1.40/1M |
| GPT-5.5 | Closed | 58.6 | 75.3 | Higher |
| Claude Opus 4.8 | Closed | n/a | 77.8 | Higher |
| Gemini 3.1 Pro | Closed | n/a | n/a | Higher |
Claude Opus 4.8 leads on MCP-Atlas at 77.8. Gemini 3.1 Pro is the closest competitor on long-context document work. GPT-5.5 remains a strong generalist coder with tight integration into the OpenAI tooling ecosystem. None of that changes the SWE-bench Pro result.
Discussions about open-weights models like GLM-5.2 sit alongside broader policy conversations about AI access — the kind of access questions that surfaced when the US government weighed DeepSeek trade restrictions. Open licensing and self-hosting rights are increasingly part of how developers evaluate frontier models, not just benchmark scores.
The GLM-5.2 architecture uses "IndexShare" sparse attention, which reuses one indexer across every four sparse-attention layers. This cuts attention cost at long context — a structural advantage for agents that accumulate large tool-call histories.
For builders tracking how AI labs are positioning at the policy and industry level, the G7 AI Summit brought together leaders from OpenAI, Anthropic, and DeepMind — the companies behind GLM-5.2's closed competitors. And for context on how AI productivity claims translate to real economic outcomes, see our coverage of AI and the US deficit.
Z.ai confirmed that open weights for GLM-5.2 were pending release the week following the June 13 launch. That is the next confirmed milestone from the sources.

0 Comments
Log in to comment
Not a member yet? Join the community
Pick a meme
KlipyHave a great take?
Drop your email — we'll send a magic link so you can post it. No password.
Not a member of the community? Join today.
Join the community →