Microsoft achieves lowest word error rate in speech recognition

In a new blog post, Microsoft has announced that it has achieved the lowest word error rate (WER) in speech recognition in the industry. The company's chief speech scientist reports that it has achieved a WER of 6.3%, which is 0.3 percentage points lower than the figure IBM reported last week.

In a research paper, the company states that:

Our best single system achieves an error rate of 6.9% on the NIST 2000 Switchboard set. We believe this is the best performance reported to date for a recognition system not based on system combination. An ensemble of acoustic models advances the state of the art to 6.3% on the Switchboard test data.
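
For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that general definition only; it is not Microsoft's evaluation code, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name), using word-level edit distance.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word ("sat" -> "sit") and one dropped word ("the").
sample_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

In this made-up example the recognizer makes two errors against a six-word reference, so the WER is about 33%. A 6.3% WER corresponds to roughly six such errors per hundred reference words.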

This news certainly bodes well for Microsoft, given that it uses speech recognition technologies in various services, such as the personal assistant Cortana on Windows 10 and Skype Translator. It is also a significant step towards the company's goal of improving AI to the point where humans can interact with computers as they would with other humans, making computing more personal. Xuedong Huang, Microsoft's chief speech scientist, states that:

This new milestone benefited from a wide range of new technologies developed by the AI community from many different organizations over the past 20 years.

Microsoft also credited deep neural networks, which are inspired by the biological processes of the brain, for this milestone. It further praised the deep learning algorithms of its Computational Network Toolkit (CNTK), as well as GPU clusters, for this advancement in speech recognition, stating that Cortana can now absorb ten times more data in the same amount of time.

Geoffrey Zweig, principal researcher and manager of Microsoft's Speech & Dialog research group, says that the milestone is also significant for the company's goal of providing the best AI solutions to its customers, a key component of Microsoft's conversation as a platform (CaaP) strategy.

Source: Microsoft