Azure OpenAI introduces GPT-4o Mini Audio models for real-time speech AI

Microsoft's Azure OpenAI service expands with GPT-4o-Mini-Realtime and Audio Preview models, enabling developers to build advanced speech AI applications.

Omer Dursun Neowin · Feb 6, 2025 03:56 EST

Microsoft has announced the availability of GPT-4o-Mini-Realtime-Preview and GPT-4o-Mini-Audio-Preview for Azure OpenAI Service. According to the company, these two new additions to the Azure OpenAI Service family are positioned to revolutionize how voice-driven interactions and AI-powered content creation are imagined.

The GPT-4o-Mini-Realtime-Preview model introduces a transformative approach to real-time voice interactions. Developers can now unlock voice-based experiences for their applications, such as customer service chatbots and virtual assistants. This model's advanced audio capabilities enable natural and intuitive interactions, reducing response times.

Apart from the capability for real-time, the GPT-4o-Mini-Audio-Preview model yields high-quality audio interactions at less than a fraction of the price of the already existing GPT-4o audio models. The cost-effective model will make it much more accessible for businesses to leverage AI-powered audio capabilities in their applications-from sentiment analysis to text-to-audio content creation.

Chat Completions API with GPT-4o-Audio Preview model is designed to transform the way users interact with AI by incorporating natural audio elements, adding depth to applications that require nuanced understanding and response generation.

Allan Carranza, senior product manager of Azure OpenAI, claims that both will be integrated with the existing Realtime API and Chat Completion API to provide continuity in the experience of model families on Azure's OpenAI service.

Carranza also stated that the applications for these new models span a wide variety of industries— on-premise voice bots and virtual assistants will be able to answer questions more effectively, increasing overall customer satisfaction. Content creators can transform their workflows in speech generation for video games, podcasts, and film studios. He also says healthcare and legal services will be able to provide real-time audio translation and break down language barriers with this technology.

The GPT 4o models associated with Realtime API and Chat Completions API both support audio and speech capabilities, each offering unique functionalities for AI-driven user experiences.

The new GPT-4o-Mini-Realtime-Preview and GPT-4o-Mini-Audio-Preview models are now available in the Azure AI Foundry public preview.