Date: 2025-03-03

In January, Microsoft announced plans to bring NPU-optimized versions of the DeepSeek-R1 model directly to Copilot+ PCs powered by Qualcomm Snapdragon X processors. In February, DeepSeek-R1-Distill-Qwen-1.5B was first made available in the AI Toolkit for VS Code.

Today, Microsoft announced the availability of DeepSeek R1 7B and 14B distilled models for Copilot+ PCs via Azure AI Foundry. The ability to run 7B and 14B models locally on Copilot+ PCs will enable developers to build new kinds of AI-powered applications that were not possible before.

Because the models run on the NPU, users can expect sustained AI compute with less impact on battery life and thermal performance. The CPU and GPU also remain available for other tasks.

Microsoft said it used Aqua, an internal automatic quantization tool, to quantize all the DeepSeek model variants to int4 weights. Token throughput is still low, however: Microsoft reports only 8 tokens per second for the 14B model and close to 40 tokens per second for the 1.5B model. The company says it is working on further optimizations to improve speed, and the impact of these models on Copilot+ PCs is expected to grow as performance improves.
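To put the reported rates in perspective, here is a rough back-of-the-envelope sketch in Python. The 400-token response length is a hypothetical figure chosen for illustration, and the calculation assumes a steady decode rate and ignores prompt processing time, so treat the results as approximate.

```python
# Decode rates reported in the article (tokens per second).
# "1.5B" is described as "close to 40", so 40.0 is an approximation.
REPORTED_TOK_PER_SEC = {"1.5B": 40.0, "14B": 8.0}


def seconds_to_generate(num_tokens: int, tok_per_sec: float) -> float:
    """Time to emit num_tokens at a steady decode rate (prompt processing ignored)."""
    if tok_per_sec <= 0:
        raise ValueError("tok_per_sec must be positive")
    return num_tokens / tok_per_sec


if __name__ == "__main__":
    response_tokens = 400  # hypothetical response length, not from the article
    for model, rate in REPORTED_TOK_PER_SEC.items():
        print(f"{model}: about {seconds_to_generate(response_tokens, rate):.0f} s")
```

Under these assumptions, a 400-token answer would take roughly 10 seconds on the 1.5B model and roughly 50 seconds on the 14B model, which is why the 14B figure matters for interactive use.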

Interested developers can download and run the 1.5B, 7B, and 14B variants of the DeepSeek models on Copilot+ PCs via the AI Toolkit VS Code extension. The models are optimized in the ONNX QDQ format and are downloaded directly from Azure AI Foundry. Microsoft says the models will also come to Copilot+ PCs powered by Intel Core Ultra 200V and AMD Ryzen processors in the future.
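For readers unfamiliar with the term, QDQ refers to the quantize/dequantize operator pairs (QuantizeLinear and DequantizeLinear) that an ONNX graph uses to represent quantized tensors. The Python sketch below illustrates the general idea with simple symmetric per-tensor int4 quantization. It is an illustration only: Microsoft has not published how Aqua chooses scales or groups weights, and the function names here are hypothetical.

```python
INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer


def quantize_int4(weights):
    """Map floats to int4 codes using one shared scale (symmetric, per tensor)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    codes = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from int4 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [7.0, -3.0, 0.0, 1.2]  # made-up example values
    codes, scale = quantize_int4(weights)
    print(codes, scale)
    print(dequantize(codes, scale))
```

Each weight is stored in 4 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale per value. At half a byte per weight, the weights of a 14B-parameter model occupy roughly 7 GB before scales and other overhead are counted.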

This move from Microsoft signifies a push towards more powerful on-device AI capabilities, opening up new possibilities for AI-driven applications.