Lyrebird is launching an API to copy anyone's voice from a one-minute audio recording

Lyrebird's speech synthesis API promises to create a realistic, 'emotional' copy of anyone's voice after hearing them speak for just one minute, raising some interesting ethical issues.

Andy Weir Senior News Editor Neowin @gcaweir · Apr 24, 2017 08:32 EDT

The magnificent lyrebird is best known for its astonishing ability to mimic the calls of other birds, and many other sounds that it comes across, from a car alarm to the sound of a camera shutter. That makes it an ideal namesake for Canadian company Lyrebird, which is launching a new API that it says will enable developers to recreate any person's voice from just one minute of recorded audio.

As TNW points out, Lyrebird's API is conceptually similar to Adobe's Project VoCo technology - which was revealed last year - although there appear to be some key differences. While VoCo seems to rely on local system resources to create digital voices, Lyrebird's API relies on cloud resources. "Our GPU clusters generate 1000 sentences in less than half a second," the company claims.

VoCo also needs to 'hear' at least 20 minutes of original audio for its speech synthesis, but Lyrebird says that from a single minute, it can "compress voice DNA into a unique key [and] use this key to generate anything with its corresponding voice." The company even says that it will include the ability to "control the emotion of the generated voice", infusing it with anger, sympathy, stress, and other emotions, with corresponding inflections to make the voice sound more natural.

Lyrebird has posted numerous samples of synthesized audio recordings on its website, but makes it clear that the API is "still under development". As you can hear for yourself, the samples aren't perfect, but this is just the start, and it's easy to envisage how the technology will be refined to eventually enable the creation of digital voices that sound realistic enough to fool the listener into believing that they're hearing a real person.

That raises some significant issues, and Lyrebird isn't shying away from them. There is obvious potential for tools like this one to be abused in order to mislead others, perhaps even for criminal purposes. And at a time when people are quick to share content without first questioning its authenticity or accuracy, the dissemination of 'falsified' recordings of public figures across the web could lead to unimaginable consequences.

On the Ethics page of its website, Lyrebird stated:

Our technology questions the validity of such evidence as it allows to easily manipulate audio recordings. This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else.

By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks. We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future.

Lyrebird is hoping that developers will put its technology to better use, "for personal assistants, for reading of audio books with famous voices, for connected devices of any kind, for speech synthesis for people with disabilities, for animation movies or for video game studios." While its artificial intelligence and machine learning capabilities remain in development, the company hasn't yet revealed when it will make its API generally available.

Source: Lyrebird via TNW