Tech Tips: Are You Talking To Your Computer Again?


by DO-IT Staff, Terry Thompson

I'm writing this installment of Tech Tips while sitting in the airport waiting for a flight. The man sitting next to me says "The airport sure is crowded today." I agree with him, and then I realize he's not talking to me: He's talking on his wireless phone. As I glance around at my fellow travelers at the gate, I notice that over half are carrying on conversations, but none of these are with people next to them. Wireless phones have done much to reduce people's inhibitions about speaking openly in public. This leads me to wonder if someday people will be equally comfortable operating their computers by voice and verbally composing documents using speech recognition technology.

Speech recognition technology has been around for decades. Researchers in AT&T's Bell Labs began trying to get computers to transcribe human speech in 1936. The first company to release a commercial speech recognition product was Covox in 1982. That same year, Dragon Systems was founded by James and Janet Baker, two former IBM researchers who had worked on speech recognition there. In 1990, they released Dragon Dictate™, the first large-vocabulary speech-to-text system for general-purpose dictation. Their primary market for Dragon Dictate™ was individuals with disabilities, particularly those with mobility impairments who otherwise had difficulty typing text into the computer. Since these early days, speech recognition has caught on in mainstream markets, and people with disabilities are rarely even mentioned in product marketing materials. Nevertheless, speech recognition technology can benefit people with limited use of their hands or limited dexterity, people with repetitive stress injuries such as carpal tunnel syndrome, and people with learning disabilities who have difficulty writing. It allows people to speak naturally, and transcribes what they say, or at least, what it thinks they say.

Accuracy and ease of use are the greatest obstacles to speech recognition being a perfect solution for everyone. Speech recognition products make mistakes: you say "It's right for us", and "his arthritis" appears on the screen. Fortunately, speech recognition products can be trained to understand their users' pronunciation. I have personally known people with severely compromised speech who have successfully trained their computers to understand them, but it required extraordinary patience and time.

To be a successful user of speech recognition, you need to be able to identify when the computer has made a mistake, and you have to correct it. Otherwise its mistake gets reinforced, and it learns incorrectly. Think of speech recognition as an infant: it's preprogrammed to understand language, but doesn't understand anything yet, and won't understand anything until its parent (you) works with it, teaches it, and corrects its mistakes.

With each new version, speech recognition products become more accurate. Given the potential consumer market for speech recognition, and the potential boost to productivity (we speak 150 words per minute, and very few people can type that fast), the federal government and many private companies are dedicating extensive time and money to continued research and development. Someday, everyone will be talking to their computers, and their computers will understand them, or will be intelligent enough to ask for clarification.

Until then, we still have to put in considerable time and effort to get speech recognition products to work for us, but doing so can save considerable time in the end, and allow many individuals with disabilities to create documents much more quickly than they otherwise could.

Twenty-three years after it was founded, Dragon continues to be the leading consumer speech recognition product, though it has bounced around a bit over the years. In 2000, Dragon Systems was acquired by Lernout & Hauspie (L&H). In 2001, L&H's speech recognition products (including their recently acquired Dragon products) were acquired by ScanSoft. On October 18, 2005, ScanSoft changed its name to Nuance. Despite the name change, speech recognition continues to be at the forefront of Nuance's product offerings. The latest release of Dragon NaturallySpeaking is version 8, which comes in three primary editions: Standard, Preferred, and Professional. The price goes up with each edition, but so does the functionality. The Nuance website includes a Feature Comparison so you can see which edition is right for you (http://www.nuance.com/naturallyspeaking/matrix/).

Speech recognition is also available in Microsoft Office 2002 and higher. It's not as feature-rich as Dragon NaturallySpeaking, but it allows you to dictate text into any Office program, as well as select menus and other Office program features. It isn't installed by default, but to try it, select Tools > Speech from the Microsoft Word menu. You will be prompted to install the speech feature and to train Microsoft Speech to recognize your voice.

For Mac users, Mac OS X provides speech recognition abilities out of the box, though it is only capable of understanding spoken commands for controlling applications. It doesn't do dictation. For dictation you have to buy a separate product such as iListen™ (http://www.macspeech.com/) or IBM ViaVoice™ for Mac OS X (http://www.nuance.com/viavoice/).

In his keynote speeches, Bill Gates is known to hype the Conversational User Interface (CUI), pronounced "cooey". Like many technological visionaries, he believes that someday we will all be interacting conversationally with our computers as if they were human beings. Today's speech recognition products are not particularly good at conversation. But they are useful tools, which, given an investment of time and patience, can make a significant difference in the ability of many individuals to use the computer and efficiently compose documents.