
Artpark-IISc, Google to bring innovation to India’s diverse languages

Vaani, launched at the Google for India 2022 event, brings together high-quality datasets that reflect the true diversity of natural spoken language and transcribed text from every district of India.

Artpark-IISc, Google to bring innovation to India’s diverse languages. (Photo: Reuters)

Bengaluru-based Artpark (AI & Robotics Technology Park), a non-profit set up by the Indian Institute of Science (IISc) to promote technology innovation in artificial intelligence (AI) and robotics, has teamed up with Google to unveil an all-India inclusive language data initiative for open-sourcing datasets.

The new initiative, dubbed ‘Vaani’ and launched at the “Google for India 2022” event in New Delhi, “brings together high-quality datasets that reflect the true diversity of natural spoken language and transcribed text from every district of India”.

With this launch, Vaani joins Bhāshā AI, the umbrella for Artpark and IISc’s pan-India language initiatives, which include SYSPIN (Synthesizing Speech in Indian languages) and RESPIN (Recognizing Speech in Indian languages). Together these cover nine languages, including Magadhi and Maithili.

“To propel research and innovation, these datasets are being open-sourced via Vaani’s website (vaani.iisc.ac.in) and in the future may also be available through other platforms like ‘Bhashini’ of MeitY (Ministry of Electronics and Information Technology),” according to a statement.

Globally, there is a lot of hype about large language models like GPT-3, but they require huge text corpora and humongous computing power to train. Indian languages, by contrast, have far less text and far more variation. As Prasanta Kumar Ghosh of IISc, who leads these initiatives, put it, “In our work, we found at least 50 varieties of ‘Bengali’, and some that even I, as a native Bengali speaker, had difficulty understanding.” Even Hindi, with its more than four dozen dialectal variations, does not have nearly as much text data. “Machines have no hope! So research and innovation for inclusive language AI require capturing this diversity in our datasets,” he said.

Also, because Indians primarily communicate by speech, machines need very different approaches and breakthroughs to transcribe, understand, or translate it, while also accounting for language variations every few kilometres. In such a context, technologies like automatic speech recognition (ASR) and natural language processing (NLP) can only be unleashed through open-source and mission-mode efforts.

Raghu Dharmaraju, president of Artpark, added, “Over the past decade, most apps for frontline health and agriculture workers have failed because digital interfaces feel alien to them. More than 1 billion Indians still cannot speak or type in English… So, if citizens can communicate with digital services in their mother tongue… over the next decade, that will be key to India’s economic growth and for a more equitable distribution of its benefits.”

The initiative, currently focused on 80 districts across 10 states, will expand to every district over the next couple of years. Artpark and IISc will also launch challenges for researchers and startups to build applications in areas like health, agriculture, and financial inclusion using these datasets.


ABOUT THE AUTHOR
Sohini Bagchi
Sohini Bagchi is a senior assistant editor with TechCircle with over 15 years of experience in technology journalism. She has previously worked with IDG Media and Trivone Digital Services. Sohini is also a published author of fiction and non-fiction books. Her debut novel 'Road to Cherry Hills' enjoyed critical acclaim worldwide. Her second book 'Techtonic Shift' traces the history and evolution of computers and the Internet. Sohini has a master's degree in communications from Manipal Institute of Communication, Karnataka. She is trained in karate and enjoys blogging and stargazing when she is not working.
Published: 19 Dec 2022, 04:07 PM IST