Bangalore: By October, the Indian arm of the Burlington, Massachusetts-based speech solutions provider Nuance Communications Inc. will launch an extensive song search service where customers can speak in Telugu, Tamil, Bengali or Oriya to find and download songs of their choice.
Nuance will launch the service with two telecom operators in 13 languages, shrugging off the lacklustre adoption of earlier voice-based mobile services in India.
US companies and educational institutions started research on speech recognition software for computers in the 1960s, but the concept never caught on, except in niches such as medical transcription services, where it is used to convert physicians’ dictated notes into text.
The major hiccup is that the software, built on a series of algorithms, can never fully understand human accents, diction, dialects and turns of phrase. Such software understands English best, with a success rate of 90%, as most of the research has been done on English, in the US.
Nuance says its software has a speech recognition success rate of 85% in Hindi, and 70-72% in other regional languages, which will still leave it struggling to understand three out of 10 voice-based searches by people speaking in Bengali, Oriya or other local languages.
India has at least 1,500 recognized languages, each with several distinct dialects.
Despite this, Sumit Goswami, head of marketing (India/Asean regions) at Nuance, expects 8-10 million customers to try the service once it is launched. “…at least half of them will come back to be repeat users.” Goswami declined to name the two operators his company has tied up with.
Mobile value-added service firms have been focusing on voice as they chase the new rural, semi-literate or illiterate cellphone user. According to Madhusudan Gupta, senior research analyst at research firm Gartner Inc., out of at least 384 million mobile phone users in India at the end of 2008, 28% were from rural areas.
Over the past two years, telecom operators have been launching an array of voice-based services ranging from song downloads to astrology to cricket commentary.
Vodafone Essar Ltd says that, out of its 78 million subscribers, one million are regular users of the voice-based song search service it launched at the end of 2007. “The voice recognition works as long as the customer speaks clearly and there is no ambient noise,” says a Vodafone Essar spokesman.
Voice recognition software won’t work when “people call and say ‘yeh gana sunao (Hindi for play this song)’”, says Jyotirmoy Chakraborty, CEO of Bangalore-based start-up Ubona Technologies Pvt. Ltd, which provides voice recognition software to telecom operators. The software has a restricted vocabulary in Indian languages, limited to names of movies and songs, and cannot follow whole sentences.
Chakraborty thinks a success rate of 75% is good enough for commercial deployment in entertainment-based services as “customers are ready to retry”.
So why should speech recognition technology that failed to take off on computers and laptops fly on mobile phones?
“The natural behaviour with a mobile phone is to talk,” says Rutvik Doshi, product manager at the Indian arm of the world’s largest Internet search company, Google Inc. “Typing on a mobile phone is also clumsy.”
Doshi thinks four things have held such software back on computers: the lack of microphones on computers, the awkwardness of talking to a machine, the ease of typing on a keyboard and the inaccuracy of voice recognition.
Google is counting on such services catching on with phone users. In July, it launched a voice-based mobile Internet search application, drawing on lessons from a voice-based directory service it had launched in four Indian cities.
But accuracy remains a problem, whether on a computer or a mobile phone, particularly with open-ended, unstructured speech, says Ashok Jhunjhunwala, professor, department of electrical engineering at the Indian Institute of Technology, Madras (IIT-M).
Jhunjhunwala, who also leads the Telecommunications and Computer Networking Group, or TeNet Group, a research-focused coalition of faculty from IIT-M’s engineering departments, says voice recognition-based mobile phone services will work only to a limited extent. “For instance, a software can never follow this conversation.”