Listen to the voice
Updated: 30 Nov 2010, 08:39 PM IST
There’s this hilarious video of two Scots stuck in a lift that I stumbled upon while looking for easy-to-use speech-recognition software. One man enters the lift, looks around, and asks the other nervously: “But where are the buttons?” The other points to the roof of the lift and says, “Voice-recognition technology.”
So the first passenger looks at the roof and says, softly, “Eleven.” An electronic voice says, “Could you please repeat that?” He says it again, and gets the same response. Then, angrily, one after the other, they go “elleven, eee-leven, eh-ley-ven, leven, eee-ley-when,” only to be told, “If you want to get out of the elevator, simply say, ‘open the doors please’.”
I like the idea of speech or voice recognition. I’m into sci-fi movies and I would like to “speak” with my computer like Captain Kirk in Star Trek. I would like to wake up in the morning and command my laptop to “please read out my emails”, or maybe just say, “emails, please”. Every morning I check my iPhone calendar and make notes about things to do, appointments, bills to pay, and so on. Keying in this information is not difficult, but it is a bit of a chore. I would love to be able to dictate to my smartphone: “Friday, 8pm, flight.” At the moment, however, the only voice commanding I do, and only occasionally, is on my MacBook: just for fun I ask it to “switch to Firefox” or “open Skype”. But no one’s impressed.
But since then this technology has come a long way. I now use Google Voice Search, a mobile app for iPhone and BlackBerry, and voice my query instead of keying it in. It’s an amazingly useful piece of technology. I say “Pizza Delhi” and in an instant I get the phone numbers and locations of Pizza Hut outlets near where I live. It is what they call “location aware”: it uses your location to answer your queries. I say “movie show times” and I get the number of the nearest PVR theatre. It saves me the bother of keying in these words.
Unlike many voice-activated technologies, Google Voice Search doesn’t have to be “trained”. In all my months of use, it has rarely got my Indian accent wrong. And so I decided to try out Dragon Dictation, a speech-to-text iPhone app created by Nuance, the company that makes the very popular speech-recognition programs Dragon NaturallySpeaking and Dragon Dictate.
The app is really simple: You tap a button on the screen, speak your lines, and tap the “Done” button. That’s all there is to it. It instantly converts your speech into text; you edit any errors, then email the text, send it as an SMS or post it on Facebook. The company says that the software requires no training, and is up to five times faster than typing on the keyboard.
But my initial experience was not trouble-free. It worked nicely for short sentences (“now is the time for all good men to come to the aid of the party”), but when I read out two paragraphs from a newspaper, I got some errors. I get the feeling that it has more to do with the way I dictate (I tend to pause and hesitate); I think the trick is to speak clearly.
Nevertheless, compared with my frustrating experience of some years ago, this is a quantum leap in technology, and I’ve decided to try out the desktop versions of speech-recognition technology. I can think of two options: Nuance’s Dragon NaturallySpeaking and the Windows Vista version of speech recognition. The Dragon, according to reviews, is top of the line, but the problem is the price: The Premium package costs $200, or Rs9,200 (the Home version is half the price, but in all software packages, Premium is always more fun).
So I am starting with the Windows Vista version of speech recognition. I only hope I don’t have to go through a long and painful training session.
Shekhar Bhatia is a former editor, Hindustan Times, a science buff and a geek at heart.
Write to Shekhar at firstname.lastname@example.org