Date: 2025-05-01

Today, Microsoft announced Phi-4-reasoning, a 14B-parameter small reasoning model that is said to deliver strong performance on complex reasoning tasks. Microsoft trained this new model via supervised fine-tuning of Phi-4 on a curated set of "teachable" prompts paired with reasoning demonstrations generated using o3-mini. Microsoft also introduced Phi-4-reasoning-plus, a 14B-parameter variant of Phi-4-reasoning that delivers even better performance by generating longer reasoning traces.

According to Microsoft's whitepaper, these new Phi-4-reasoning models outperform several larger open-weight models, such as DeepSeek-R1-Distill-Llama-70B, and even match the performance levels of the full DeepSeek-R1 model on certain benchmarks. They are also said to outperform Anthropic's Claude 3.7 Sonnet and Google's Gemini 2 Flash Thinking models on all tasks except GPQA and Calendar Planning.

The claimed performance of Phi-4-reasoning suggests that careful data curation for supervised fine-tuning (SFT) is effective for reasoning language models, and that performance may be further improved using reinforcement learning (RL).

Phi-4-reasoning also has several limitations. First, the model primarily works with English text. Second, its coding ability was mainly trained on Python using common coding packages, so results with other languages or less common packages may be weaker. Third, it has a context length of just 32k tokens. Additional limitations can be found in the whitepaper.

Microsoft stated that these new Phi-4-reasoning models are designed to accelerate research on language models. They are expected to be useful for developing AI applications in memory- or compute-constrained environments, latency-bound scenarios, and reasoning-intensive tasks.

Interested developers can check out these new models at Hugging Face and Azure AI Foundry.