Voice for the blind
Bengaluru: Siddalingeshwar Ingalagi was left blind at the age of four after his retina detached. Though the 24-year-old is literate, a simple thing such as reading news in his native language, Kannada, was not easy for him. While any other 24-year-old would have accessed news on a smartphone, Ingalagi had to rely on TV or radio broadcasts. Though there is assistive technology to help the blind, it was mostly made for English speakers. For regional-language speakers like Ingalagi, there was no good solution available.
It was the predicament of people like Ingalagi that A.G. Ramakrishnan, professor at the department of electrical engineering and chairman of the Medical Intelligence and Language Engineering Laboratory at the Indian Institute of Science, tried to solve along with H.R. Shivakumar, a PhD student at the institute.
After nearly 15 years of development, at a cost of about Rs.5.3 lakh raised from the Karnataka State Council for Science and Technology and the Tamil Software Development Fund of the Tamil Nadu government, the duo built a text-to-speech (TTS) engine from scratch that is able to read out e-text in two regional languages, Kannada and Tamil.
“All the tools that exist for the blind are in English, there are not many good tools for other languages,” said 56-year-old Ramakrishnan. “We wanted to change that.”
Their aim was not only to develop a tool for regional-language speakers but also to make it sound as close as possible to a real human voice. “Most of the tools out there sound robotic. We wanted to make it sound as natural as possible and in a way that is intelligible,” says Shivakumar, who was the chief architect and programmer of the speech engine, which has been named Madhura.
Ramakrishnan spent nearly 15 years developing the software. He relied on the support of doctoral research students to work on this project. Most of the time, the students wanted to work on fancier projects than one that solves a problem for the blind, so few people stuck on to see it through.
Almost 15 students and project staff worked on creating this, but it was brought to life when Shivakumar, an IBM employee, joined Ramakrishnan’s team in 2007 and pursued this problem for his doctoral research.
While existing tools are built on top of available open-source software, Ramakrishnan and his team wanted to build theirs from scratch, writing their own code. For this, they created a corpus of 1,000 sentences that contains all possible combinations of words and phonemes (units of speech). They had a native speaker speak those phonemes so that any combination of words can be created from that corpus. In addition, to make the output sound natural, the software was programmed to pause after commas and full stops and to apply other intonation patterns.
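The general idea of stitching speech together from recorded units, with pauses at punctuation, can be sketched in a few lines of Python. This is only a toy illustration and not Madhura's code: the function name, the pause lengths, the greedy longest-match rule and the Latin-script stand-ins for recorded units are all assumptions made for the example.

```python
# Toy sketch of unit concatenation with punctuation pauses.
# Every name and number here is hypothetical, not taken from Madhura.

# Assumed pause lengths in milliseconds for a comma and a full stop.
PAUSE_MS = {",": 200, ".": 400}


def plan_utterance(text, unit_inventory):
    """Turn text into a list of steps: ("unit", name) or ("pause", ms).

    unit_inventory is the set of units for which a recording exists.
    Each word is covered left to right by the longest recorded unit
    that matches at the current position.
    """
    steps = []
    for token in text.split():
        word = token.rstrip(",.")      # the word without trailing marks
        punct = token[len(word):]      # the trailing marks, if any
        i = 0
        while i < len(word):
            # Try the longest candidate first, then shorter ones.
            for j in range(len(word), i, -1):
                if word[i:j] in unit_inventory:
                    steps.append(("unit", word[i:j]))
                    i = j
                    break
            else:
                raise KeyError(f"no recorded unit covers {word[i:]!r}")
        for mark in punct:
            steps.append(("pause", PAUSE_MS[mark]))
    return steps


if __name__ == "__main__":
    inventory = {"na", "ma", "ska", "ra"}
    for step in plan_utterance("namaskara, na.", inventory):
        print(step)
```

A real engine would then fetch the audio for each unit and smooth the joins between them; the sketch stops at the plan because the article does not describe those steps.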
Building it on their own has resulted in a clearer and more natural voice, says Shivakumar.
With most TTS engines, people get bored with the voice as it is mechanical and drones on, says Ramakrishnan, who did not want that in the TTS they were building. “When you speak a sentence today and say the same thing tomorrow, you may have a different way of saying it. Our software is also programmed to change the way a sentence is spoken each time,” explains Ramakrishnan.
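The article does not say how Madhura varies its delivery. One simple way such variation could be produced, shown here purely as an assumption, is to nudge each pause length by a small random amount every time a sentence is read. The function name and the 20% spread below are hypothetical.

```python
import random


def vary_delivery(pause_ms, rng, spread=0.2):
    """Scale each pause by a random factor in [1 - spread, 1 + spread].

    pause_ms is a list of pause lengths in milliseconds; rng is a
    random.Random instance. This is an illustrative guess, not the
    method used by Madhura.
    """
    return [round(p * rng.uniform(1 - spread, 1 + spread)) for p in pause_ms]


if __name__ == "__main__":
    rng = random.Random()
    # Two readings of the same sentence will usually get different pauses.
    print(vary_delivery([200, 400], rng))
    print(vary_delivery([200, 400], rng))
```

The same approach could in principle be applied to pitch or speaking rate, though that goes beyond what the article reports.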
What is innovative about their software?
Ramakrishnan says the method by which a sentence is split into basic units is unique, as is the way those units are put together in the context of surrounding units. The software, once installed on a computer, is recognized by screen readers, which then use the speech engine to read out any text.
For Ingalagi, listening to news is now a ritual. He goes to a few Kannada news websites, listens to the headlines, and copies the audio of whichever stories interest him onto his mobile phone. “I listen to the news when I travel now,” he says.
Besides users such as Ingalagi, agencies such as Nabard too are exploring the use of this software, said Ramakrishnan. With initiatives such as the Jan Dhan Yojana, banks send text messages with the account balance and other information to account holders, and a speech engine could read these messages aloud. Besides the disabled, even the illiterate can use this software.
The software is being given free to any non-profit organization that works for the blind. Ramakrishnan says a company will be set up by the end of this year to sell it commercially to companies that could also use it for building navigation tools in regional languages. They will also expand their offering to 14 regional languages from the current two.
The duo is also exploring partnering with companies so that they could reach out to the blind with this software as part of their corporate social responsibility initiatives.
Mint has a strategic partnership with Digital Empowerment Foundation, which hosts the Manthan and mBillionth awards.