
Microsoft already supports local AI apps on Windows through Windows Copilot Runtime, which offers various AI features via Windows AI APIs and Windows Machine Learning (ML). On Copilot+ PCs, the models behind Windows Copilot Runtime run locally and continuously in the background.
At Build 2025, Microsoft is introducing Windows AI Foundry, a unified platform for local AI development on Windows that brings together Windows Copilot Runtime and several new capabilities. Windows AI Foundry will offer ready-to-use AI APIs powered by built-in models, tools to customize those built-in models, the ability to bring open-source models from Azure AI Foundry, and an inference runtime that lets developers bring their own models.
App developers depend on a wide array of AI models from various vendors, so Windows AI Foundry will integrate models from Foundry Local as well as other model catalogs such as Ollama and NVIDIA NIM. Microsoft's own Foundry Local catalog will offer AI models optimized to run across CPUs, GPUs, and NPUs. Developers can install Foundry Local with the "winget install Microsoft.FoundryLocal" command, then browse, download, and test models based on device compatibility. Once a model is selected, they can use the Foundry Local SDK to integrate it into their app.
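Once a model is running, Foundry Local serves it from a local, OpenAI-compatible endpoint that apps can call over HTTP. As a minimal sketch only: the endpoint URL and model alias below are illustrative assumptions, not documented values, and the helper simply builds the kind of chat-completion payload an app would POST to that endpoint.

```python
import json

# Hypothetical local endpoint and model alias -- check the Foundry Local
# CLI output on your machine for the real values.
FOUNDRY_LOCAL_URL = "http://localhost:5273/v1/chat/completions"

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat-completion payload for a locally hosted model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

payload = build_chat_request("phi-3.5-mini", "Summarize today's meeting notes.")
body = json.dumps(payload)  # what an HTTP client would POST to FOUNDRY_LOCAL_URL
```

Because the server speaks the OpenAI wire format, existing OpenAI client libraries can often be pointed at the local URL instead of the cloud.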
Windows ML is the built-in AI inferencing runtime in Windows that enables simple, efficient model deployment across CPUs, GPUs, and NPUs. It is based on DirectML and works on silicon from various providers, including AMD, Intel, NVIDIA, and Qualcomm. Developers building on Windows ML do not have to worry about future silicon changes: the runtime keeps the required dependencies up to date and adapts to new silicon under the hood.
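Windows ML's app-facing APIs are in C# and C++; purely as a conceptual sketch in Python (all names here are invented, not Windows ML API values), this is the kind of preference-ordered device selection a runtime performs so that apps never have to hard-code which silicon they run on:

```python
def pick_execution_device(available: set,
                          preference: tuple = ("NPU", "GPU", "CPU")) -> str:
    """Return the most preferred compute device present on this machine.

    The preference order mirrors the idea that the runtime targets the
    best accelerator it finds and falls back to CPU. The names and
    order are illustrative only, not Windows ML identifiers.
    """
    for device in preference:
        if device in available:
            return device
    raise RuntimeError("no supported execution device found")

# A Copilot+ PC exposes an NPU; an older machine might expose only a CPU.
pick_execution_device({"CPU", "NPU"})   # -> "NPU"
pick_execution_device({"CPU"})          # -> "CPU"
```

The app states a preference once; as new accelerators ship, only the runtime's view of `available` changes, not the app code.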
Microsoft also announced LoRA (Low-Rank Adaptation) support for the Phi Silica model. LoRA fine-tunes a small subset of the model's parameters with custom data, efficiently improving performance on the tasks it is tuned for. LoRA is now available in public preview with Windows App SDK 1.8 Experimental 2 on Snapdragon X Series NPUs and will come to Intel and AMD Copilot+ PCs in the coming months.
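To see why LoRA is efficient, compare parameter counts: instead of updating a full d×d weight matrix, LoRA freezes it and learns two thin matrices A (d×r) and B (r×d) whose product is the update, so only 2·d·r values are trained. A small pure-Python illustration (the dimensions are arbitrary examples, not Phi Silica's actual sizes):

```python
def lora_trainable_params(d: int, r: int) -> tuple:
    """Compare full fine-tuning vs LoRA for one d x d weight matrix.

    Full fine-tuning updates all d*d weights; LoRA trains only a
    d x r matrix A and an r x d matrix B, and the learned update
    is the low-rank product A @ B.
    """
    full = d * d          # every weight is trainable
    lora = 2 * d * r      # only A and B are trainable
    return full, lora

full, lora = lora_trainable_params(d=4096, r=8)
# full = 16,777,216 trainable values; lora = 65,536 -- a 256x reduction.
```

The smaller the rank r, the cheaper the fine-tune, at the cost of how much the adapted behavior can diverge from the base model.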
Finally, Microsoft announced new Semantic Search APIs that let developers build AI-powered search experiences over their app data. This search can run locally, and it supports retrieval-augmented generation (RAG). The Semantic Search APIs are available in private preview on all Copilot+ PCs.
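The Semantic Search APIs themselves are in private preview, so purely as a conceptual sketch: semantic search ranks documents by embedding similarity rather than keyword match, and a RAG pipeline then feeds the top results to a model as grounding context. A minimal pure-Python version with toy embeddings (the vectors, file names, and helper names are invented for illustration):

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_top_k(query_vec, docs, k=1):
    """Rank (doc_id, embedding) pairs by similarity to the query embedding.

    In a real RAG pipeline, the top documents would be appended to the
    model's prompt as context before generating an answer.
    """
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, d[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 3-dimensional embeddings standing in for a real embedding model.
docs = [
    ("expense_report.docx", [0.9, 0.1, 0.0]),
    ("vacation_photos.txt", [0.0, 0.2, 0.9]),
]
retrieve_top_k([0.8, 0.2, 0.1], docs)  # -> ["expense_report.docx"]
```

Running this on-device means neither the documents nor the query embeddings ever leave the machine, which is the privacy argument for local semantic search.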