Glove-Talk II: A Neural Network Interface Which Maps
  Gestures to Parallel Formant Speech Synthesizer Controls
  Sidney Fels and Geoffrey Hinton 
  Department of Computer Science 
  University of Toronto
  Toronto, ON, Canada, M5S 1A4
  
  Abstract
  Glove-Talk II is a system that translates hand gestures to speech
  through an adaptive interface. Hand gestures are mapped continuously to 10 control
  parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as
  an artificial vocal tract that produces speech in real time. This gives an unlimited
  vocabulary and multiple languages, in addition to direct control of fundamental frequency and
  volume. Currently, the best version of Glove-Talk II uses several input devices (including
  a Cyberglove, a Contact Glove, a Polhemus sensor, and a foot pedal), a parallel formant
  speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel
  and consonant production by using a gating network to weight the outputs of a vowel and a
  consonant neural network. The gating network and the consonant network are trained with
  examples from the user. The vowel network implements a fixed, user-defined relationship
  between hand position and vowel sound, and does not require any training examples from the
  user. Volume, fundamental frequency and stop consonants are produced with a fixed mapping
  from the input devices. One subject has trained for about 100 hours to speak intelligibly
  with Glove-Talk II. He passed through eight distinct stages while learning to speak. He
  speaks slowly, with speech quality similar to that of a text-to-speech synthesizer but with far
  more natural-sounding pitch variations.
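
  The vowel/consonant split amounts to a small mixture of experts: the gating
  network produces a weight that blends the control vectors of the two expert
  networks. The sketch below (in Python with NumPy, not the authors' code)
  illustrates that blending; the input dimensionality, hidden-layer sizes, and
  the sigmoid gate are assumptions for illustration, and the networks are left
  untrained.

    import numpy as np

    rng = np.random.default_rng(0)

    N_INPUTS = 16     # assumed input size: glove joint angles + Polhemus position
    N_CONTROLS = 10   # the 10 parallel formant synthesizer control parameters

    def mlp(n_in, n_hidden, n_out):
        """Tiny one-hidden-layer network with random (untrained) weights."""
        w1 = rng.standard_normal((n_in, n_hidden)) * 0.1
        w2 = rng.standard_normal((n_hidden, n_out)) * 0.1
        return lambda x: np.tanh(x @ w1) @ w2

    vowel_net = mlp(N_INPUTS, 11, N_CONTROLS)      # stands in for the fixed vowel mapping
    consonant_net = mlp(N_INPUTS, 15, N_CONTROLS)  # trained on user examples in the system
    gate_net = mlp(N_INPUTS, 5, 1)                 # scalar vowel-vs-consonant decision

    def gesture_to_controls(x):
        """Blend the two experts with the gate's output g in [0, 1]."""
        g = 1.0 / (1.0 + np.exp(-gate_net(x)))     # sigmoid squashing of the gate
        return g * vowel_net(x) + (1.0 - g) * consonant_net(x)

    # One frame of synthetic gesture data -> one frame of synthesizer controls.
    frame = rng.standard_normal(N_INPUTS)
    print(gesture_to_controls(frame).shape)        # (10,)

  In the system described above, the gating and consonant networks are trained
  on user-provided examples, while the vowel path is a fixed, user-defined
  mapping from hand position rather than a trained network.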