
Microsoft: Our Phi-4-reasoning takes on larger models, matches DeepSeek-R1 performance


Today, Microsoft announced Phi-4-reasoning, a 14B-parameter small reasoning model that is said to deliver strong performance on complex reasoning tasks. Microsoft trained the new model via supervised fine-tuning of Phi-4 on a curated set of "teachable" prompts, with reasoning demonstrations generated using o3-mini. Microsoft also introduced Phi-4-reasoning-plus, a variant of Phi-4-reasoning that is further trained with reinforcement learning and delivers even better performance by generating longer reasoning traces.
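To make the training recipe concrete, here is a minimal, generic supervised fine-tuning sketch using Hugging Face's TRL library. This is not Microsoft's actual pipeline: the dataset contents are hypothetical stand-ins for the curated prompts and teacher-generated reasoning traces, and the base-model hub ID should be verified against the model card.

```python
# Generic SFT sketch with Hugging Face TRL -- illustrative, not Microsoft's pipeline.
# The prompt/completion pairs below are hypothetical stand-ins for the curated
# "teachable" prompts and o3-mini-generated reasoning traces the article describes.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

train_data = Dataset.from_list([
    {
        "prompt": "A train leaves at 9:00 travelling 60 km/h...",              # curated prompt
        "completion": "<think>step-by-step reasoning...</think> Answer: ...",  # teacher trace
    },
])

trainer = SFTTrainer(
    model="microsoft/phi-4",              # assumed base-model hub ID; verify on the model card
    train_dataset=train_data,
    args=SFTConfig(output_dir="phi4-reasoning-sft"),
)
trainer.train()
```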

According to Microsoft's whitepaper, the new Phi-4-reasoning models outperform several larger open-weight models, such as DeepSeek-R1-Distill-Llama-70B, and even match the performance of the full DeepSeek-R1 model on certain benchmarks. They are also said to outperform Anthropic's Claude 3.7 Sonnet and Google's Gemini 2.0 Flash Thinking on all tasks except GPQA and Calendar Planning.


The impressive claimed performance of Phi-4-reasoning suggests that careful data curation for supervised fine-tuning (SFT) is effective for reasoning language models, and that performance may be further improved using reinforcement learning (RL).

Phi-4-reasoning has several limitations as well. First, like the base Phi-4 model, it primarily supports English text. Second, its coding ability is mainly trained on Python using common packages. Third, its context length is limited to 32k tokens. Additional limitations are detailed in the whitepaper.

Introducing Phi-4-reasoning, adding reasoning models to the Phi family of SLMs.

The model is trained with both supervised finetuning (using a carefully curated dataset of reasoning demonstration) and Reinforcement Learning.

📌Competitive results on reasoning benchmarks with… pic.twitter.com/p2FkjD4qfu

— Ahmed Awadallah (@AhmedHAwadallah) May 1, 2025

Microsoft stated that these new Phi-4-reasoning models are designed to accelerate research on language models. They are expected to be useful for developing AI applications in memory- or compute-constrained environments, latency-bound scenarios, and reasoning-intensive tasks.

Interested developers can check out these new models at Hugging Face and Azure AI Foundry.
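For a quick local test, the sketch below loads the model with Hugging Face transformers. The hub ID microsoft/Phi-4-reasoning is an assumption based on Microsoft's naming; verify the exact ID on the model card before running.

```python
# Minimal inference sketch using Hugging Face transformers.
# The model ID is assumed from Microsoft's naming; verify it on the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning"  # assumption: the actual hub ID may differ
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Reasoning models emit a long chain of thought before the final answer,
# so allow a generous max_new_tokens within the 32k-token context window.
messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```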
