Microsoft has unveiled its Live Interpreter API, a new capability within Azure Speech Translation. It is designed to provide effortless, real-time, multilingual communication without requiring a user to set an input language. Its key features include automated and continuous language identification (LID), support for 76 input languages and 143 locales, latency improvements that put it on par with human interpreters, and the ability to use a personal voice that preserves the speaker's style and tone. It is now in public preview.
The new API has many use cases: for example, it can be used in multilingual contact centers, online meetings and events, multilingual classrooms, and social commerce live streaming. With it, software can eliminate language-switching menus, allowing speakers to switch languages seamlessly mid-conversation.
One of the more interesting features of the API is personal voice, which preserves the style and tone of a voice so that the translated speech sounds like the original speaker. It also maintains intonation and pacing, and comes with enterprise-grade consent controls.
The Live Interpreter API is built on Azure Speech Translation to deliver continuous language identification, full language coverage, and low-latency speech-to-speech translation. Microsoft has partnered with Anker Innovations to provide a real-world example of what’s possible with the new API. Here’s what Anker Innovations said:
“We’re excited to partner with Microsoft and to demonstrate what’s possible when AI meets every day tech. Built on the Azure Speech Translation Live Interpreter capability, we’re able to deliver smarter, more intuitive, and truly immersive audiovisual experiences for users around the world.”
Microsoft has provided a QuickStart Guide for developers to begin using the new API. End users won’t interact with this API directly; instead, they’ll encounter it through apps or websites that choose to integrate it.