Grandma, is that you? (Photo: Shutterstock)
Amazon AI scientists have taught the company's Alexa voice assistant to speak in the voice of a specific human, even one who has died, after training on just a few short audio clips.
Demonstrated for the first time at Amazon's recent re:MARS 2022 event, the new feature was shown in a video in which a young boy asks Alexa, "Can grandma read me The Wizard of Oz?"
The voice assistant proceeds to synthesize the dead woman's voice and read the text aloud as the boy follows along in the book.
The technology is still in development, but Rohit Prasad, Amazon senior vice president and chief scientist for Alexa AI, positioned it as a way to add personality and warmth to the generic voices of today’s AI voice assistants.
“One thing that has surprised me the most about Alexa is the camaraderie we have with it,” Prasad explained, noting that “in this companionship role, human qualities of empathy and affect are key to building trust.”
“These traits have become even more important in these times of the ongoing pandemic,” he continued, “when so many of us have lost someone we love.”
“While AI can’t take away that pain of loss, it can certainly make their memories last.”
Building the technology required stepping back from conventional text-to-speech (TTS) engines, which let voice assistants speak fluently using voices trained on many hours of studio recordings by voice actors.
Instead, Prasad explained, the engineers approached the problem as a voice conversion task, analyzing the prosody of the target voice (the non-linguistic aspects of the way we speak) to feed a personal voice filter that allows Alexa to speak in the target's voice rather than its own.
“This required invention where we had to learn to produce a high-quality voice with less than a minute of audio versus hours of audio in the studio,” Prasad said.
Your voice is their password
Amazon may be positioning the voice-mimicry technology as a heartwarming feature that makes AI-powered assistants more humane, but the technology is sure to find favour with criminals, who have already experimented with voice deepfakes to commit major fraud.
In 2019, for example, a British CEO was tricked into sending €220,000 (US$243,000) to a scammer who used AI technology to mimic the voice of the chief executive of his firm's German parent company.
Such tactics are likely only going to become more common over time as better voice-mimicking technology reaches the mainstream.
Using the techniques Prasad described, malicious actors could create a convincing synthetic voice of a business executive, political dignitary, or celebrity simply by training the system on part of a speech from an annual general meeting, corporate function, or other public event.
The synthesized voice could then be made to utter all manner of persuasive statements, which could be strung together to facilitate fraud on a whole new level.
Ever-improving technology means such problems are not far away, with companies like Aflorithmic combining 'synthetic voice cloning' with increasingly convincing visual deepfakes to produce synthetic people who can, as in the case of the 'Digital Dom' synth launched last year, emulate real people with uncanny accuracy.
This could have implications in the metaverse, where faked voices could eventually be ported into new environments, letting fraudsters pretend to be just about anyone.
Ultimately, deepfake researcher and BT Young Scientist & Technologist of the Year winner Greg Tarr told Cybercrime Magazine, audio and video deepfakes are getting so good that internet users will simply have to remain skeptical of everything they hear and cannot verify in the real world.
“Because these technologies are becoming more and more available to the public, you don’t need technical experience to create convincing fakes,” Tarr said.
“The playing field is going to flatten out so much that we won’t be able to detect deepfakes anymore, and that’s something we’re going to have to get used to.”
“At that point,” he said, “we need to mature as a society rather than maturing the technology, because there’s a limit to that; we need to be less dependent on the information we consume from unreliable sources.”