Lyrebird seeks its fortune in fake voices
Imagine this: You are a news reporter and your source plays you the recording of a top-secret meeting. You recognize the voices. You feel the adrenaline rush as you smell a big story.
You decide to cross-check with one of the people in the meeting, and he tells you that no such conversation took place. He denies every bit of it. So did your normally reliable source try to sell you a lemon? But what about the recording? You heard the voices yourself. You are in a dilemma.
Does that sound over the top?
A Montreal-based tech start-up, Lyrebird, has developed software that it claims can impersonate anyone’s voice: make it say whatever you want it to say, and even add the emotion, stress and intonation you want in the voice. All it requires is a 60-second recording of that person’s voice, and the software, called a “voice imitation algorithm”, will do the rest. Its website (Lyrebird.ai) has an audio clip of three people discussing the technology, and you can clearly make out the voices of US President Donald Trump, former US president Barack Obama and former US secretary of state Hillary Clinton. All fake. The voices are a bit metallic, robot-like, but the technology is a work in progress. This is just the beginning.
The artificial intelligence (AI) company is the brainchild of a three-member team, including a dual-degree holder from the Indian Institute of Technology, Kanpur, at the Montreal Institute for Learning Algorithms lab at the University of Montreal. They have taken the name Lyrebird from the real-life Australian bird that can mimic any sound it hears, whether the call of another bird or something like a mobile phone ringtone. The naturalist David Attenborough’s video of the lyrebird in action has been viewed some 17 million times on YouTube.
I don’t know how the technology works, but a simplistic definition of AI is making a computer function like the human brain. Try “Google AI Experiments”, click “Quick, Draw!” and a “neural network” tries to guess what you are drawing. The site has several interesting experiments that give you a general idea.
Lyrebird says, “Users will be able to generate entire dialogs with the voice of their choice, or design from scratch completely new and unique voices tailored for their needs.” It will offer “a large catalog of different voices and let the user design their own unique voices tailored for their needs.” You will be able to control the emotion of the generated voice. Alexandre de Brébisson, Lyrebird co-founder, told Scientific American, “We can generate thousands of sentences in one second, which is crucial for real-time applications. Lyrebird also adds the possibility of copying a voice very fast and is language agnostic.”
While the technology is being talked about as “a big leap forward”, it raises concerns about privacy and security. Just recently, I got an email from my bank giving me the option of switching to voice ID for phone banking. There are banks abroad that have started using voice-recognition systems for online banking. And I wonder if someone can use Lyrebird’s technology to fake a customer’s voice and hack into a bank account. How on earth are we going to track fake voices, and in real time?
I watch a lot of science fiction, and my most recent memory of AI in movies is the humanoid robot Ava in Ex Machina. She is devious. Come to think of it, so was HAL, the computer in 2001: A Space Odyssey.
Lyrebird says it is aware of the dangers. “This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else,” it says in the Ethics section of its website. The company hopes that once it releases its technology to the public, “everyone will soon be aware that such technology exists and that copying the voice of someone else is possible”.
So its solution is this: the more people use the software, the greater the awareness of its dangers.
The company says its technology “can be used for personal assistants, for reading of audio books, for connected devices, speech synthesis for people with disabilities, for animation movies or for video-game studios”.
That’s the good side of this technology, but imagine this flip side: You receive a recording in which you can clearly hear your spouse/partner in an intimate conversation with another person. What do you do? How would you react? It’s an Othello question.
Shekhar Bhatia is a science buff and a geek at heart.