Back in September 2016, Microsoft stated that it had achieved the lowest word error rate (WER) in speech recognition, at 6.3%. Prior to that, IBM held the record at 6.9%. A couple of months later, Microsoft announced that it had lowered its WER further, to 5.9%. The company believed this to be equivalent to human parity.
Now, a few months after Microsoft's claim, IBM has struck back with a WER of 5.5%, which it says is almost on par with humans.
IBM says that, along with lowering its word error rate to 5.5%, it has also determined that human parity sits at an even lower threshold, 5.1%. The company stated:
Reaching human parity – meaning an error rate on par with that of two humans speaking – has long been the ultimate industry goal. Others in the industry are chasing this milestone alongside us, and some have recently claimed reaching 5.9 percent as equivalent to human parity…but we’re not popping the champagne yet. As part of our process in reaching today’s milestone, we determined human parity is actually lower than what anyone has yet achieved — at 5.1 percent.
It went on to say that it achieved the 5.5% WER by combining Long Short-Term Memory (LSTM) and WaveNet language models with three strong acoustic models. The word error rate was measured using the SWITCHBOARD and CallHome corpora; the former has been used as a benchmark for the past couple of decades.
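For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not IBM's or Microsoft's scoring pipeline, the function name `word_error_rate` is made up for this example, and real benchmark scoring also applies text normalization rules that are omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length.

    Illustrative only: splits on whitespace and does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)
```

By this definition, a WER of 5.5% corresponds to roughly 55 word errors for every 1,000 words in the reference transcript. Because insertions count as errors, the value can exceed 100% for very poor output.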
While the company now holds the record for the lowest WER in the industry, it says that it plans to keep improving until it achieves human parity. It also noted that developments in speech recognition are built on the foundations of decades of research, and that IBM will continue to work to "match the complexity of how the human ear, voice and brain interact".