OpenAI announces GPT‑5.1-Codex-Max, a new coding model built for long-running tasks

Last week, OpenAI released the GPT-5.1 series of models for ChatGPT users and developers. The GPT‑5.1 Instant model is warmer, more intelligent, and better at following instructions, while GPT‑5.1 Thinking is an advanced reasoning model for complex tasks. OpenAI also released GPT-5.1-Codex, a version of GPT-5.1 optimized for agentic coding tasks in Codex and other developer environments.

Today, OpenAI announced GPT‑5.1-Codex-Max, a new frontier agentic coding model built for long-running tasks. The model is trained to work across multiple context windows through a technique called compaction, and OpenAI claims it can work reliably over millions of tokens in a single task. OpenAI also claims that the new model is not only more capable but also faster and more token-efficient than the regular GPT-5.1 model.

The OpenAI team wrote the following regarding this new coding model:

"GPT‑5.1-Codex-Max was trained on real-world software engineering tasks, like PR creation, code review, frontend coding, and Q&A, and outperforms our previous models on many frontier coding evaluations."

GPT-5.1-Codex scored 73.7% on SWE-Bench Verified (n=500), 66.3% on SWE-Lancer IC SWE, and 52.8% on TerminalBench 2.0. GPT-5.1-Codex-Max scores higher on all three, reaching 77.9% on SWE-Bench Verified, 79.9% on SWE-Lancer IC SWE, and 58.1% on TerminalBench 2.0.
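To put these numbers side by side, here is a small Python snippet that computes the percentage-point gain on each benchmark, using only the scores quoted above. The names `scores` and `point_gains` are illustrative and not part of any OpenAI tooling.

```python
# Benchmark scores quoted in the article, in percent:
# (GPT-5.1-Codex, GPT-5.1-Codex-Max)
scores = {
    "SWE-Bench Verified (n=500)": (73.7, 77.9),
    "SWE-Lancer IC SWE": (66.3, 79.9),
    "TerminalBench 2.0": (52.8, 58.1),
}


def point_gains(table):
    """Return the percentage-point gain of the newer model on each benchmark."""
    # Round to one decimal place to hide floating-point noise in the subtraction.
    return {name: round(new - old, 1) for name, (old, new) in table.items()}


if __name__ == "__main__":
    for name, gain in point_gains(scores).items():
        print(f"{name}: +{gain} points")
```

The largest jump, 13.6 points, is on SWE-Lancer IC SWE. The gains on SWE-Bench Verified (4.2 points) and TerminalBench 2.0 (5.3 points) are smaller.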

Previous Codex models were optimized mainly for Unix-based environments, but GPT-5.1-Codex-Max is also trained to operate in Windows environments.

During complex refactors and long-running agent loops, most coding models currently on the market eventually fail because they run out of context window. GPT‑5.1-Codex-Max instead compacts its session automatically as it approaches its context window limit, which lets it work independently for hours at a time. OpenAI says that in internal testing it observed GPT‑5.1-Codex-Max working on tasks for more than 24 hours.
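OpenAI has not published how compaction is implemented, so the following is only a toy Python sketch of the general idea as described here: when a session nears its context limit, older history is folded into a summary so that work can continue in the same window. The `Session` class, the word-count "tokenizer", the 90% threshold, and the placeholder `summarize` function are all hypothetical stand-ins, not OpenAI's method.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: OpenAI has not published how compaction works.


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def summarize(entries: list[str]) -> str:
    """Placeholder summarizer; a real agent would ask the model for a summary."""
    return f"[summary of {len(entries)} earlier entries]"


@dataclass
class Session:
    context_limit: int       # maximum tokens the model can attend to
    threshold: float = 0.9   # compact once usage reaches this fraction of the limit
    keep_recent: int = 2     # number of most recent entries kept verbatim
    messages: list[str] = field(default_factory=list)

    def tokens_used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        """Add a message, compacting first if the session is near its limit."""
        self.messages.append(message)
        if self.tokens_used() >= self.threshold * self.context_limit:
            self.compact()

    def compact(self) -> None:
        """Fold older history into one summary, keeping recent entries intact."""
        if len(self.messages) <= self.keep_recent:
            return  # nothing old enough to summarize
        old = self.messages[:-self.keep_recent]
        recent = self.messages[-self.keep_recent:]
        self.messages = [summarize(old)] + recent


if __name__ == "__main__":
    session = Session(context_limit=30)
    for step in range(6):
        session.append(f"step {step}: ran tests and edited files")
    # Older steps have been folded into a summary; the last two remain verbatim.
    print(session.messages)
```

In a real agent the summary would come from the model itself and token counts from its actual tokenizer. This simple version also cannot help if the most recent entries alone exceed the limit.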

Finally, GPT‑5.1-Codex-Max is more token-efficient, which OpenAI attributes to more effective reasoning. On SWE-Bench Verified, it used 30% fewer thinking tokens than GPT-5.1-Codex to achieve the same results. In addition, a new Extra High (‘xhigh’) reasoning effort setting lets the model think longer on complex tasks.

GPT‑5.1-Codex-Max is available now in the Codex CLI, the IDE extension, Codex cloud, and code review for ChatGPT Plus, Pro, Business, Edu, and Enterprise subscribers, and OpenAI says it is coming to the API soon. OpenAI is also replacing GPT‑5.1-Codex with GPT‑5.1-Codex-Max as the default model in Codex.
