Z.ai, formerly known as Zhipu AI, released the GLM-5.1 model on Tuesday. This open-source flagship model is designed for agentic engineering and can autonomously handle a single coding task for up to eight hours, performing planning, execution, testing, and optimization in a continuous loop.
GLM-5.1 scored 58.4 on the SWE-Bench Pro benchmark, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. The model is a post-training refinement of GLM-5, which was introduced in February as a 744-billion-parameter Mixture-of-Experts model that activates approximately 40 billion parameters per token and was trained entirely on Huawei Ascend chips, without Nvidia hardware.
According to Z.ai’s documentation, GLM-5.1 improves coding and agentic capabilities through techniques such as multi-task supervised fine-tuning and reinforcement learning stages. The model can sustain eight hours of autonomous execution, completing a full “experiment–analyze–optimize” loop. In demonstrations, it built a complete Linux desktop system from scratch within eight hours and, in a vector database optimization task, ran 655 iterations that raised query throughput 6.9-fold.
GLM-5.1 features a context window of 200,000 tokens and supports up to 128,000 output tokens. It has been optimized for agentic coding workflows and is compatible with tools like Claude Code and OpenClaw. The model achieved a 3.6x geometric mean speedup on real machine learning workloads in the KernelBench Level 3 optimization benchmark.
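For readers unfamiliar with the metric, a geometric mean speedup averages per-workload speedup ratios multiplicatively, so a single outsized gain does not dominate the headline figure. The sketch below is illustrative only: the function name and the sample timings are hypothetical and are not taken from KernelBench or from Z.ai's results.

```python
from statistics import geometric_mean


def geometric_mean_speedup(baseline_s: list[float], optimized_s: list[float]) -> float:
    """Geometric mean of per-workload speedups (baseline time / optimized time)."""
    if len(baseline_s) != len(optimized_s) or not baseline_s:
        raise ValueError("need two non-empty timing lists of equal length")
    # One speedup ratio per workload; a value above 1.0 means the optimized run is faster.
    ratios = [b / o for b, o in zip(baseline_s, optimized_s)]
    return geometric_mean(ratios)


# Hypothetical timings in seconds for two workloads: speedups of 2x and 8x.
example = geometric_mean_speedup([10.0, 16.0], [5.0, 2.0])
print(f"geometric mean speedup: {example:.2f}x")
```

With speedups of 2x and 8x, the arithmetic mean would be 5x, while the geometric mean is the square root of their product, which is lower. That is why the geometric mean is the more conservative way to summarize ratios.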
GLM-5.1 is available to all GLM Coding Plan subscribers, with its weights published under an MIT license. Z.ai, which went public on the Hong Kong Stock Exchange in January with a valuation of $31.3 billion, offers API access for GLM-5.1 at a cost of $1.00 per million input tokens and $3.20 per million output tokens.
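To put the listed API prices in concrete terms, the following minimal sketch estimates the cost of a single request from the per-million-token rates quoted above. The function name and the token counts in the example are hypothetical, and the calculation assumes flat pricing with no caching discounts or tiering, which the article does not address.

```python
from decimal import Decimal

# Rates as quoted in the article (USD per one million tokens).
INPUT_USD_PER_MILLION = Decimal("1.00")
OUTPUT_USD_PER_MILLION = Decimal("3.20")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost in USD, assuming flat per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


# Hypothetical request: 150,000 input tokens and 20,000 output tokens.
cost = estimate_cost_usd(150_000, 20_000)
print(f"estimated cost: ${cost}")
```

In this example the input side contributes $0.15 and the output side $0.064. Output tokens are priced at 3.2 times the input rate, so they can make up most of the bill in long agentic sessions.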
The launch escalates competition in the open-source coding model space, placing GLM-5.1 ahead of its closed-source counterparts on SWE-Bench Pro. While Z.ai claims the model’s capabilities are aligned with Claude Opus 4.6, independent evaluations show that it reaches approximately 94.6 percent of Opus 4.6’s broader coding score, which indicates remaining gaps in reasoning and creative tasks.








