Voice is already being used as a form of biometric for smart speakers. For instance, the voice match mechanism in Google Home allows users to teach Google Assistant to recognize their voice, after which they can ask for personal information such as calendar updates or messages. Among banks, HSBC offers a voice biometric system that, when creating a voice ID, looks for 100 identifiers such as accent, pronunciation, cadence, size of larynx, vocal tract, and the role of the tongue and nasal cavity, which can be used to verify future log-ins by a customer.
“Voice is emerging as an additional channel for communication with banking apps and chatbots. However, banking requires continuous authentication, and that is why identifying the speaker is important. The idea is to use voice as an additional form of authentication,” said Jaideep Dhok, general manager for banking, financial services and insurance at Persistent Systems.
Persistent Systems and ValidSoft have jointly developed a secure digital voice authentication system for continuous user validation. It will be integrated with Persistent’s banking solutions.
Voice biometric systems first process the voice input to extract characteristics that are specific to a speaker and use them to build a statistical model, also known as a voice print or voice signature. To verify a new input against an enrolled voice, the same process is repeated and a similarity measure is obtained by matching the two patterns.
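As a rough illustration of the enrol-then-verify flow described above, the Python sketch below treats a voice print as the average of several feature vectors and uses cosine similarity as the similarity measure. The feature values, the function names and the 0.8 threshold are assumptions made for illustration only; commercial engines rely on far richer statistical models.

```python
import math

def build_voice_print(feature_vectors):
    """Average several per-utterance feature vectors into one reference vector."""
    count = len(feature_vectors)
    dims = len(feature_vectors[0])
    return [sum(vec[i] for vec in feature_vectors) / count for i in range(dims)]

def cosine_similarity(a, b):
    """Similarity between two feature vectors, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def verify(voice_print, new_features, threshold=0.8):
    """Accept the speaker if the new input is close enough to the enrolled print.

    The threshold is a hypothetical value chosen for this example.
    """
    return cosine_similarity(voice_print, new_features) >= threshold

# Made-up feature vectors from three enrolment utterances.
enrolled = build_voice_print([[0.9, 0.1, 0.4], [1.1, 0.2, 0.5], [1.0, 0.0, 0.3]])
same_speaker = verify(enrolled, [1.0, 0.1, 0.45])   # close to the enrolled print
other_speaker = verify(enrolled, [0.1, 1.0, 0.2])   # points in a different direction
```

In this toy setup the first attempt is accepted and the second is rejected; raising the threshold would make the check stricter at the cost of rejecting more genuine users.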
According to ValidSoft, voice input includes characteristics that relate to the way air flows from the lungs to the mouth when a person is talking, and more precisely to how this air flow is affected by the shape of the vocal tract. The information processed by voice biometric systems closely relates to a physical characteristic of the speaker’s vocal tract.
Also, voice biometric engines are designed to deal with variability. So, if a user has a cold and sounds different to a human listener, an advanced biometric engine will still recognize the voice as the same.
Although voice is one of the most natural forms of biometrics, its adoption has been limited by the poor quality of voice references, which are often marred by background noise.
Researchers at HSE University and Nizhny Novgorod State Linguistic University have developed an artificial intelligence (AI)-based voice recognition system, which can reduce the error rate of such systems to 2% at a signal-to-noise ratio of 10 dB or higher.
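To put the 10 dB figure in context, signal-to-noise ratio in decibels is conventionally ten times the base-10 logarithm of the ratio of signal power to noise power, so 10 dB means the speech carries about ten times the power of the background noise. The short Python sketch below computes it from lists of samples; the function names and sample values are illustrative assumptions and are not taken from the researchers' system.

```python
import math

def mean_power(samples):
    """Average power of a sampled waveform (mean of squared amplitudes)."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))

# Made-up samples: the noise amplitude is one tenth of the speech amplitude.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]
ratio = snr_db(speech, noise)  # roughly 20 dB, since power scales with amplitude squared
```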
Voice biometrics is improving rapidly, but the big question is whether it is as reliable as other forms of biometrics.
“We can expect accuracy to continue to improve. One of the biggest challenges in voice biometrics, though, is the ability of bad actors to replicate a person’s voice. This can be done synthetically, or by using voice samples from video or other recordings,” said Pam Dixon, executive director at the World Privacy Forum.
Studies have shown that basic voice recognition systems can be duped through voice impersonation or a replay attack, in which the attacker plays back a recorded voice to trick the system, with the help of efficient voice synthesizers and advanced audio tools.
Venkat Krishnapur, vice president of engineering and managing director at McAfee India, pointed out that, as the technology evolves, deployments will become increasingly capable of repelling most infiltration attempts by using vocal qualities that are unique to every individual, such as accent, pronunciation, speech rate, tone and pitch.
AI and natural language processing can be leveraged to make voice authentication even more secure. “Harnessing the power of AI engines, voice biometrics and natural language understanding (NLU) could be leveraged to authenticate individuals to a much higher degree of accuracy, making voice biometrics also a reasonably viable fingerprint of the future,” added Krishnapur. In addition to identifying voices based on the role of the vocal tract, advanced recognition engines can also detect replay attacks by looking for the highest and lowest frequencies or by detecting distortion caused by the replaying of audio. Though the tech has evolved, a lot more needs to be done to improve not just its security but also its efficiency.
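The frequency-based replay check mentioned above can be pictured with a small Python sketch. It assumes, purely for illustration, that a recording played back through a loudspeaker loses much of its high-frequency content, and it flags audio whose share of energy above a cut-off falls below a threshold. The 6,000 Hz cut-off, the 1% threshold and the function names are hypothetical, and only the high band is checked here; real engines combine many more cues.

```python
import cmath
import math

def band_energy_fraction(samples, sample_rate, low_hz, high_hz):
    """Share of spectral energy between low_hz and high_hz, using a naive DFT."""
    n = len(samples)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2):  # skip the DC bin, stop below the Nyquist bin
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        energy = abs(coeff) ** 2
        freq = k * sample_rate / n
        total += energy
        if low_hz <= freq <= high_hz:
            in_band += energy
    return in_band / total if total else 0.0

def looks_replayed(samples, sample_rate, min_high_fraction=0.01):
    """Flag audio with almost no energy above 6 kHz (hypothetical rule)."""
    high = band_energy_fraction(samples, sample_rate, 6000, sample_rate / 2)
    return high < min_high_fraction

# Synthetic test tones standing in for real recordings.
rate, n = 16000, 256
live = [math.sin(2 * math.pi * 500 * t / rate)
        + 0.5 * math.sin(2 * math.pi * 7000 * t / rate) for t in range(n)]
replayed = [math.sin(2 * math.pi * 500 * t / rate) for t in range(n)]

live_flagged = looks_replayed(live, rate)          # keeps its 7 kHz component
replayed_flagged = looks_replayed(replayed, rate)  # high band is missing
```

With these synthetic tones only the second signal is flagged. The same function could be pointed at the lowest frequencies by changing the band limits.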
Dixon feels voice is best used in combination with other forms of authentication, such as knowing a passphrase or having an ID. Used by itself, voice carries risks because of the spoofing issues.