Skip to main content

Z.ai GLM-5.2 Beats GPT-5.5 on SWE-bench Pro

Z.ai launched GLM-5.2 on June 13, 2026 — an open-weights model that tops GPT-5.5 on SWE-bench Pro and costs roughly one-sixth as much per token.

Z.ai GLM-5.2 Beats GPT-5.5 on SWE-bench Proapidog.com

What is Z.ai's GLM-5.2 and when did it launch?

Z.ai launched GLM-5.2 on June 13, 2026 — a ~753-billion-parameter mixture-of-experts language model and the third major release in the GLM-5 line. It follows GLM-5 (February 11), GLM-5-Turbo (March 15), and GLM-5.1 (April 7), making four flagship-tier releases in roughly four months. GLM-5.2 is the only open-weights model in the current frontier tier. It is available as zai-org/GLM-5.2 on Hugging Face under an MIT license with no regional restrictions.

How does GLM-5.2 score on SWE-bench Pro compared to GPT-5.5?

GLM-5.2 scores 62.1 on SWE-bench Pro. GPT-5.5 scores 58.6. GLM-5.1, the previous version, scored 58.4. That means an open-weights model now leads a closed frontier model on a real software-engineering benchmark, according to VentureBeat.

The Terminal-Bench 2.1 result is also notable. GLM-5.2 scores 81.0, up from GLM-5.1's 62.0. That is a roughly 19-point jump in one generation on terminal-style agentic coding. Z.ai also reports GLM-5.2 as the top open-source model on FrontierSWE, PostTrainBench, and SWE-Marathon.

How does GLM-5.2 perform on agentic tool-use benchmarks?

On MCP-Atlas — a benchmark measuring Model Context Protocol tool orchestration — GLM-5.2 scores 77.0. GPT-5.5 scores 75.3. Claude Opus 4.8 leads at 77.8. GLM-5.2 sits less than one point behind Claude Opus 4.8 and ahead of GPT-5.5.

On Humanity's Last Exam with tools, Z.ai reports GLM-5.2 at 54.7 versus GPT-5.5's 52.2. GLM-5.2 supports OpenAI-compatible function and tool calling, plus an Anthropic-compatible coding endpoint. That lets it drop into agent harnesses built for Claude without changes to the harness itself.

What is the 1M-token context window and how does it compare to GLM-5.1?

GLM-5.2 ships with a 1,000,000-token context window, labeled glm-5.2[1m] in Z.ai's configuration. Each response can return up to 131,072 output tokens. That is roughly a 5x increase over GLM-5.1's ~200,000-token window, as MarkTechPost reports.

You might also like

A 1M-token window means a coding agent can hold an entire mid-sized repository in working memory — source files, tests, configuration, and conversation history — without constant summarization. Z.ai's docs list use cases including whole-repository refactors, long-horizon agent runs, and large-document analysis past 200K tokens.

What are GLM-5.2's two thinking-effort levels?

GLM-5.2 offers two reasoning modes: High and Max. Z.ai recommends Max effort for complex, multi-step coding work. In Claude Code, the /effort command controls this setting. The xhigh, max, and ultracode options all map to GLM-5.2's Max effort. Developers can also set reasoning_effort: "max" and thinking: {type: "enabled"} directly in the API, or disable thinking entirely for fast, low-cost responses.

How much does GLM-5.2 cost per token?

Pricing metric GLM-5.2
API input $1.40 / 1M tokens
API output $4.40 / 1M tokens
Cached input ~$0.26 / 1M tokens
Self-host Yes (MIT license)

Pricing is via OpenRouter, as cited by VentureBeat. The closed alternatives — GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro — are all priced higher, though their exact rates vary by tier and change frequently.

Here's what we know so far: the cost gap is substantial. VentureBeat describes GLM-5.2's pricing as roughly one-sixth the cost of GPT-5.5. For teams running high-volume coding agents, that difference compounds quickly.

What reasoning and math benchmarks has Z.ai published?

Z.ai reports GLM-5.2 at 99.2 on AIME 2026 and 91.2 on GPQA-Diamond. These are Z.ai's own launch numbers. No independent third-party replication of these scores had been published at the time of launch.

Z.ai also published no SWE-bench, Terminal-Bench, or Code Arena numbers at the initial announcement on June 13. The SWE-bench Pro and Terminal-Bench figures cited in this article come from Z.ai's subsequent head-to-head comparisons, not the launch announcement itself.

How does GLM-5.2 fit into the broader 2026 frontier model landscape?

The four models drawing the most attention in mid-2026 are GLM-5.2, GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro. GLM-5.2 is the only one with open weights. The other three are closed, API-only, and do not allow self-hosting.

Model Weights SWE-bench Pro MCP-Atlas Input price
GLM-5.2 Open (MIT) 62.1 77.0 $1.40/1M
GPT-5.5 Closed 58.6 75.3 Higher
Claude Opus 4.8 Closed n/a 77.8 Higher
Gemini 3.1 Pro Closed n/a n/a Higher

Claude Opus 4.8 leads on MCP-Atlas at 77.8. Gemini 3.1 Pro is the closest competitor on long-context document work. GPT-5.5 remains a strong generalist coder with tight integration into the OpenAI tooling ecosystem. None of that changes the SWE-bench Pro result.

Discussions about open-weights models like GLM-5.2 sit alongside broader policy conversations about AI access — the kind of access questions that surfaced when the US government weighed DeepSeek trade restrictions. Open licensing and self-hosting rights are increasingly part of how developers evaluate frontier models, not just benchmark scores.

The GLM-5.2 architecture uses "IndexShare" sparse attention, which reuses one indexer across every four sparse-attention layers. This cuts attention cost at long context — a structural advantage for agents that accumulate large tool-call histories.

For builders tracking how AI labs are positioning at the policy and industry level, the G7 AI Summit brought together leaders from OpenAI, Anthropic, and DeepMind — the companies behind GLM-5.2's closed competitors. And for context on how AI productivity claims translate to real economic outcomes, see our coverage of AI and the US deficit.

Z.ai confirmed that open weights for GLM-5.2 were pending release the week following the June 13 launch. That is the next confirmed milestone from the sources.

Frequently asked questions

**What score did GLM-5.2 get on SWE-bench Pro?**
GLM-5.2 scored 62.1 on SWE-bench Pro, according to Z.ai's published results. That places it ahead of GPT-5.5, which scored 58.6, and GLM-5.1, which scored 58.4. SWE-bench Pro is a software-engineering benchmark. Z.ai also reports GLM-5.2 as the top open-source model on FrontierSWE, PostTrainBench, and SWE-Marathon.
**How large is GLM-5.2's context window?**
GLM-5.2 supports a 1,000,000-token context window, labeled `glm-5.2[1m]` in Z.ai's configuration. Each response can return up to 131,072 output tokens. That is roughly five times larger than GLM-5.1's ~200,000-token window. The larger window allows a coding agent to hold an entire mid-sized repository in working memory without summarization.
**What does GLM-5.2 cost per million tokens?**
GLM-5.2 is priced at $1.40 per million input tokens and $4.40 per million output tokens via OpenRouter, as cited by VentureBeat. Cached input costs approximately $0.26 per million tokens. VentureBeat describes this as roughly one-sixth the cost of GPT-5.5. The model can also be self-hosted under its MIT license, which eliminates API costs entirely.
**Is GLM-5.2 open source?**
Yes. GLM-5.2 is released under an MIT license with no regional restrictions. It is available as `zai-org/GLM-5.2` on Hugging Face. Z.ai confirmed at launch that the open weights were pending release the week following the June 13 announcement. GLM-5.2 is the only open-weights model among the four frontier models compared in mid-2026.
**What are GLM-5.2's two thinking-effort levels?**
GLM-5.2 offers High and Max reasoning modes. Z.ai recommends Max for complex, multi-step coding tasks. In Claude Code, the `/effort` command controls the setting, and the `xhigh`, `max`, and `ultracode` options all map to Max effort. Developers can also configure reasoning directly in the API using `reasoning_effort: "max"` and `thinking: {type: "enabled"}`.

Sources

  1. according to VentureBeat venturebeat.com
  2. as MarkTechPost reports marktechpost.com

Keep reading

0 Comments

Log in to comment

Not a member yet? Join the community