2018-12-02

Voice interaction seems to be an idea whose time has finally come.

Ever since 2001: A Space Odyssey, voice recognition has been the holy grail for computer geeks. Speech and language are the first communication technologies, and the main driver of human evolution. But the idea of a voice-activated machine, for information, advice, transactions and, maybe, even friendship, has been a mirage, not a reality, given that speech recognition has been a major challenge.

But voice interaction seems to be an idea whose time has finally come. Why now? Advances in AI, built on deep neural networks and the graphics processing unit (GPU) hardware that supports them, have made it possible to train speech engines on large amounts of audio data and reach high accuracy levels.

And nowhere is this more relevant than in India. From as long ago as the Mahabharata, India has been an oral society, without the West’s history of “type to search” on PCs, which powered the online revolution. Illiteracy, numerous languages and a lack of familiarity with multilingual keyboards mean that other ways of interacting with the digital world are necessary. Affordable smartphones and very cheap data make India mobile-first. And mobiles are perfect for using voice as a UI!

Last year, 30% of Google searches in India were voice-driven. Hindi search grew 400% in a single year, a testament to the voracious appetite for online tools and content in local languages. Now, if you are not in the vernacular, you are not in India!

Recognising the incredible potential for speech technologies in India, Interspeech, the world’s foremost speech research conference, took place in India for the first time in September. Its theme: speech research in multilingual societies in emerging markets! Global leaders in speech discussed the huge potential for voice in India. Hundreds of researchers presented how their flavour of deep neural networks, activation functions and model hyper-parameters advanced speech research. India’s numerous dialects, accents and languages are a researcher’s utopia: challenges that push the boundaries of speech recognition. Priyanka Chopra advertises hair oil on TV, speaking Hindi and English in a single sentence, or code switching, as it’s known in technical parlance. For Indians, it makes perfect sense, but it is impossible for the monolingual British or Americans to understand!

All the global technology behemoths at Interspeech, from Baidu to Google to Facebook and Microsoft, acknowledged the importance of local language speech recognition to reach the next 300 million Indians. E-commerce giants, such as Amazon and Walmart/Flipkart, already know that to realise the Indian market’s potential, targeting the top 10% of English-speaking Indians is not enough.

The local-language Indian audience is the real market! And the race to reach multilingual India has started. Last month, Flipkart acquired Liv.ai, a speech tech start-up, to compete with Amazon’s Alexa, which has been five years in the making. Amazon released a Hindi website last week. Google and Microsoft are rolling out their own initiatives in Indian languages.

In supporting Indian users, there is another opportunity—the potential for India to build global speech giants, fuelled by its many languages, dialects and noisy environments. China built its tech giants behind the Great Firewall to exclude American competitors.

India has been open to global technology companies. But with voice, India’s unique challenges (barriers for the faint-hearted) could be the “opportunity” for fostering home-grown giants. India has the talent. Indians, in India and globally, are some of the world’s finest speech researchers, and Indian tech entrepreneurs are among the world’s best. Panini, the Sanskrit linguist (approximately 500 BCE), was arguably the world’s first computational linguist.

Indian product companies recognise the opportunity. Some, such as Slang.ai, are making speech tech usable for developers. Others, such as Liv.ai (Flipkart) and Voxta, are building their own recognition engines. Even large Indian corporates are throwing capital at the opportunity and hiring experienced individuals, hoping to short-circuit the learning curve for building speech products.

Either way, the Indian consumer will be the winner! Western online users once helped fuel the development of internet giants that became household names. Now India, with its incredible diversity of users (languages, dialects, accents), could be the hothouse where researchers, entrepreneurs and capital come together to build global speech tech giants and a voice internet!

Sirish Reddi is co-founder and director of Voxta