Speech Interface

It seems to me…

“Computers are getting smarter all the time.  Scientists tell us that soon they will be able to talk to us.  (And by ‘they’, I mean ‘computers’.  I doubt scientists will ever be able to talk to us.)”  ~ Dave Barry.

All currently available methods of interaction with any type of computing device remain primitive and in need of considerable improvement.  The recent gesture and application-oriented user interfaces on notepad computers represent only a partial improvement over more traditional WIMP (window, icon, menu, and pointing device) interfaces.  In case no one noticed, humans normally communicate by talking; only the hearing-impaired communicate using their hands (though I’ve noticed some people use their hands while driving to communicate with other drivers, which probably implies mental rather than hearing impairment).

Voice control is slowly being integrated into phones and notepads, permitting map searches, location navigation, calling people and businesses, sending texts and e‑mails, and browsing to Web sites.  Voice input is significantly faster than typing since most people are able to speak more rapidly than they can type (unfortunately, especially when they have nothing to say…).

The average user of recent versions of Microsoft Windows may not even realize that Speech Recognition comes as part of the operating system since it is not on the Start menu.  The voice-to-text engine in Windows Speech Recognition, first included with Vista, is powerful, intuitive, and extensive.

Apple is heavily integrating voice control into its new iPhone after an acquisition last year made it possible.  An intelligent assistant, known as “Siri”, built into the iPhone allows you to make things happen simply by asking your phone to do something.  Android phones provide a similar but much less capable voice function.

Speech recognition is the process of capturing spoken words using a sound input device such as a microphone or telephone and converting them into digitized words or commands.  The first speech recognizer appeared in 1952 and consisted of a device able to recognize only single spoken digits, but this capability has improved significantly over the years.  Many electronic systems and software packages now accept full conversational-speed speech input, which is then translated into computer-recognizable form.
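For readers who would like to try this capture-and-convert process themselves, here is a minimal sketch.  It uses the open-source Python SpeechRecognition package and Google’s free web recognition service, both chosen purely for illustration; neither is tied to any product mentioned in this post.

    # Minimal speech-to-text sketch using the open-source Python
    # "SpeechRecognition" package (pip install SpeechRecognition pyaudio).
    # Illustration only; not tied to any product mentioned in this post.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # Capture a short utterance from the default microphone.
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # compensate for room noise
        print("Say something...")
        audio = recognizer.listen(source)

    # Hand the digitized audio to a recognition engine (Google's free web API here).
    try:
        print("You said:", recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Sorry, I could not understand that.")
    except sr.RequestError as err:
        print("Recognition service unavailable:", err)

Running it requires only a working microphone and an Internet connection for the recognition step; everything else is a few lines of glue.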

More than just speech recognition, speech understanding is the ultimate interface but will have to wait for further advances in artificial intelligence.  ELIZA, a computer program written at MIT by Joseph Weizenbaum prior to 1966, was an early example of primitive natural language processing and one of the first chatterbots ever written (and still available for download).  Though ELIZA used typed text input, it provided an elementary illusion of a conversational capability.
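To give a flavor of just how little machinery that illusion required, here is a small sketch of the pattern-matching-plus-canned-response trick ELIZA relied on.  It is my own toy simplification in Python, not Weizenbaum’s original code.

    # A tiny ELIZA-style "chatterbot": match a pattern in the user's typed
    # input and echo back a canned response built from the matched text.
    # A toy simplification for illustration, not Weizenbaum's original program.
    import re

    RULES = [
        (r"\bI need (.+)", "Why do you need {0}?"),
        (r"\bI am (.+)", "How long have you been {0}?"),
        (r"\bmy (mother|father)\b", "Tell me more about your {0}."),
        (r"\bbecause\b", "Is that the real reason?"),
    ]

    def respond(sentence: str) -> str:
        """Return the first matching canned response, or a generic prompt."""
        for pattern, template in RULES:
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match:
                return template.format(*match.groups())
        return "Please go on."

    if __name__ == "__main__":
        print(respond("I need a vacation"))     # Why do you need a vacation?
        print(respond("I am tired of typing"))  # How long have you been tired of typing?

A handful of such rules is enough to keep a casual conversation going for a surprisingly long time, which is exactly why ELIZA made such an impression.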

Current improvements in full natural language (speech) recognition and wider device availability provide increased incentive for research leading to eventual support for at least some form of limited conversational interaction.  Some day, hopefully in the not too distant future, we will have our own personal digital assistant able to respond to our spoken commands and request additional clarification when necessary.

On second thought, this might not be such a great idea.  Now when you walk into a Starbucks, there is only the occasional person on a computer talking to someone over Skype.  It is always obvious what they are doing since they are sitting by themselves, no one at the adjoining tables, talking loudly enough that the person to whom they are talking probably could hear them if they stepped outside.  Everyone has to listen to every word said: “Did you hear about the dumb thing that…”.  No, and we really do not want to.

Think what it would be like to have 100 or so Starbucks customers sipping their tall half-skinny half-1 percent extra hot split quad shot (two shots decaf, two shots regular) latte with whip while telling their computer what to do (or, more likely, where to go).

Ah, yeh!  On second thought, let’s think about this for a while…

That’s what I think, what about you?


About lewbornmann

Lewis J. Bornmann has his doctorate in Computer Science. He became a volunteer for the American Red Cross following his retirement from teaching Computer Science, Mathematics, and Information Systems, at Mesa State College in Grand Junction, CO. He previously was on the staff at the University of Wisconsin-Madison campus, Stanford University, and several other universities. Dr. Bornmann has provided emergency assistance in areas devastated by hurricanes, floods, and wildfires. He has responded to emergencies on local Disaster Action Teams (DAT), assisted with Services to Armed Forces (SAF), and taught Disaster Services classes and Health & Safety classes. He and his wife, Barb, are certified operators of the American Red Cross Emergency Communications Response Vehicle (ECRV), a self-contained unit capable of providing satellite-based communications and technology-related assistance at disaster sites. He served on the governing board of a large international professional organization (ACM), was chair of a committee overseeing several hundred worldwide volunteer chapters, helped organize large international conferences, served on numerous technical committees, and presented technical papers at numerous symposiums and conferences. He has numerous Who’s Who citations for his technical and professional contributions and many years of management experience with major corporations including General Electric, Boeing, and as an independent contractor. He was a principal contributor on numerous large technology-related development projects, including having written the Systems Concepts for NASA’s largest supercomputing system at the Ames Research Center in Silicon Valley. With over 40 years of experience in scientific and commercial computer systems management and development, he worked on a wide variety of computer-related systems from small single embedded microprocessor based applications to some of the largest distributed heterogeneous supercomputing systems ever planned.

7 Responses to Speech Interface

  1. Pingback: End of Computers | Lew Bornmann's Blog

  2. mercadee says:

    Speech recognition (also known as automatic speech recognition, computer speech recognition, speech to text, or just STT) converts spoken words to text. The term “voice recognition” is sometimes used to refer to recognition systems that must be trained to a particular speaker—as is the case for most desktop recognition software. Recognizing the speaker can simplify the task of translating speech.

    • lewbornmann says:

      Speech recognition technology has rapidly advanced in recent years but significant limitations remain in just how far it can be extended without comparable advances in artificial intelligence. I’m amazed at just how far the technology has advanced since I wrote several text interpretation programs in the 1960s while in graduate school.

  3. Pingback: My Computer Understands Me… | Lew Bornmann's Blog
