A basic problem in speech recognition
We cannot identify phonemes perfectly in noisy speech.
The acoustic input is often ambiguous: there are
several different words that fit the acoustic signal
equally well.
People use their understanding of the meaning of the
utterance to hear the right word.
We do this unconsciously.
We are very good at it.
This means speech recognizers have to know which
words are likely to come next and which are not.
Can this be done without full understanding?
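
As a rough illustration of knowing which words are likely to come next without any understanding of meaning, the sketch below simply counts which word follows which in a small body of text and uses those counts to choose between acoustically similar candidates. The toy corpus, the function names next_word_prob and pick_word, and the add-alpha smoothing are all assumptions made for this example; they are not taken from the slides, and a real recognizer would combine such scores with acoustic evidence.

```python
from collections import Counter

# Toy corpus, invented purely for illustration (an assumption, not real data).
corpus = (
    "it is hard to recognize speech . "
    "we recognize speech with a computer . "
    "a storm can wreck a nice beach . "
    "people recognize speech easily ."
).split()

# Count how often each word follows each other word, and how often
# each word appears as a preceding context.
bigram_counts = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])
vocab_size = len(set(corpus))


def next_word_prob(prev, word, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev) from the counts."""
    numerator = bigram_counts[(prev, word)] + alpha
    denominator = context_counts[prev] + alpha * vocab_size
    return numerator / denominator


def pick_word(prev, candidates):
    """Choose the candidate most likely to follow prev.

    The candidates stand in for words that fit the acoustic
    signal equally well.
    """
    return max(candidates, key=lambda w: next_word_prob(prev, w))


# "speech" and "beach" are treated here as acoustically confusable.
print(pick_word("recognize", ["speech", "beach"]))
print(pick_word("nice", ["speech", "beach"]))
```

The smoothing constant keeps unseen word pairs from getting probability zero, so a candidate that never followed the previous word in the corpus is still possible, just less likely.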