
zai-org/GLM-4.5

Finally, an open-source model that can actually use tools without falling apart. GLM-4.5 is Zhipu AI's (Z.ai) flagship Mixture-of-Experts foundation model built from the ground up for agentic workloads -- and it nails tool calling at a 90.6% success rate, beating even Claude Sonnet 4 (89.5%).

The architecture packs 355 billion total parameters but activates only 32 billion per inference pass, making it roughly 8x more efficient than an equivalent dense model. That means you can run serious reasoning workloads without burning through your entire GPU budget.

What makes GLM-4.5 genuinely interesting is the dual-mode design. Flip it into thinking mode for complex multi-step reasoning and tool orchestration, or run non-thinking mode when you just need a fast, direct response. This is not a gimmick -- on AIME 2024 math competition problems, thinking mode scores 91.0%, blowing past Claude Opus 4's 75.7%. The smaller GLM-4.5-Air variant (106B total, 12B active) still hits 89.4% on the same benchmark, which is absurd for a model you can run on just two H200 GPUs.

The training pipeline is worth studying: Zhipu pretrained on 22 trillion tokens (15T text + 7T code/reasoning), then made three specialized copies of the base model -- one for reasoning, one for agentic tasks, one for general knowledge -- and distilled them back into a single unified model. The result is a model that scores 63.2 across 12 industry benchmarks, ranking 3rd globally behind only GPT-4 and Claude 4.

Deployment is straightforward with SGLang or vLLM. FP8 quantization cuts hardware requirements in half (8x H100 instead of 16x), and the 128K context window handles long document workflows without chunking headaches. Everything ships under MIT license -- full commercial use, no restrictions, no catch. Weights are on both Hugging Face (43K+ monthly downloads) and ModelScope.
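The parameter and memory figures above are easy to sanity-check with back-of-envelope arithmetic. A minimal sketch -- the parameter counts and GPU counts are from the article, while the bytes-per-parameter figures (2 for BF16, 1 for FP8) are standard simplifications that ignore KV cache and activation memory:

```python
# Back-of-envelope check on the MoE sparsity and FP8 memory claims.
total_params = 355e9   # GLM-4.5 total parameters
active_params = 32e9   # parameters activated per inference pass

# Fraction of the model doing work on any one token
active_ratio = total_params / active_params
print(f"~{active_ratio:.1f}x more total than active parameters")

# Weight memory at different precisions (weights only)
bf16_gb = total_params * 2 / 1e9  # 2 bytes per param
fp8_gb = total_params * 1 / 1e9   # 1 byte per param
print(f"BF16 weights: ~{bf16_gb:.0f} GB -> needs more than 8x H100 (640 GB)")
print(f"FP8 weights:  ~{fp8_gb:.0f} GB -> fits 8x H100 with room for KV cache")
```

The ~710 GB BF16 footprint exceeds eight 80 GB H100s, which is why halving the weights to FP8 halves the node count.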
The repo itself has 4.3K stars and includes inference code, deployment guides for Ascend NPUs and AMD GPUs, plus fine-tuning recipes for LLaMA-Factory and SWIFT.
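Serving follows the standard vLLM workflow. A hedged sketch of the launch command -- the model ID matches the Hugging Face repo, but the flag values are illustrative and should be tuned to your vLLM version and hardware:

```shell
# Serve the smaller Air variant with tensor parallelism across 2 GPUs.
# --tensor-parallel-size and --max-model-len are standard vLLM options;
# 131072 matches the advertised 128K context window.
vllm serve zai-org/GLM-4.5-Air \
  --tensor-parallel-size 2 \
  --max-model-len 131072
```

Once up, the server exposes an OpenAI-compatible API, so existing client code points at it with only a base-URL change.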


Why It Matters

GLM-4.5 fills a gap that has frustrated developers building AI agents: most open-source models either reason well OR follow tool-calling protocols well, but rarely both. GLM-4.5 does both in a single model under MIT license, which changes the economics of agentic AI deployment entirely. You no longer need to route between separate specialized models or pay per-token API fees to closed providers.

The competitive positioning is striking. Coming out of Zhipu AI in China, GLM-4.5 directly challenges both DeepSeek and Western frontier models on their home turf -- tool use, coding, and mathematical reasoning. The 90.6% tool-calling accuracy puts it ahead of Claude Sonnet 4, and the AIME math scores embarrass models costing 10x more to run. For teams building autonomous coding agents, customer support bots, or data analysis pipelines, GLM-4.5 offers frontier-adjacent performance at open-source prices.

The broader GLM family (4.5, 4.6V for vision, 4.7 for coding, and the newer GLM-5 at 744B) gives you an upgrade path without switching ecosystems. That kind of model family coherence used to be exclusive to OpenAI and Anthropic.

Repository Stats

Stars: 4.3k
Forks: 441
Last Commit: 2/1/2026
