Statistical language modelling
Goal: Model the distribution of the next word in
a sentence, given the words that precede it.
N-grams are the most widely used statistical
language models.
They are simply conditional probability tables
estimated by counting n-tuples of words.
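As a minimal sketch of the counting approach described above, the Python snippet below builds a conditional probability table from relative frequencies. The function name train_ngram and the toy corpus are illustrative assumptions, not part of the original material, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n):
    """Estimate P(word | previous n-1 words) by relative frequency.

    Hypothetical helper for illustration: unsmoothed maximum-likelihood
    estimates obtained purely by counting n-tuples.
    """
    ngram_counts = Counter()    # counts of (context, word) pairs
    context_counts = Counter()  # counts of each (n-1)-word context
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        word = tokens[i + n - 1]
        ngram_counts[(context, word)] += 1
        context_counts[context] += 1

    # Normalise each count by the count of its context.
    table = defaultdict(dict)
    for (context, word), count in ngram_counts.items():
        table[context][word] = count / context_counts[context]
    return dict(table)


# Toy corpus (made up for this example).
corpus = "the cat sat on the mat the cat ran".split()
bigram = train_ngram(corpus, 2)
print(bigram[("the",)])
```

Here the context ("the",) is followed by "cat" twice and "mat" once, so the table assigns them probabilities 2/3 and 1/3. Any word pair that never occurs in the corpus gets no entry at all, which is the weakness that motivates smoothing.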
Curse of dimensionality: the number of possible n-tuples
grows exponentially with n, so lots of data is needed if n
is large.
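To make the growth concrete, the short sketch below counts how many distinct n-tuples a table would have to cover. The vocabulary size of 10,000 is an assumed round figure chosen for illustration, and table_size is a hypothetical helper name.

```python
def table_size(vocab_size, n):
    """Number of distinct n-tuples over a vocabulary of the given size."""
    return vocab_size ** n


VOCAB_SIZE = 10_000  # assumed vocabulary size, for illustration only

for n in (1, 2, 3, 5):
    print(f"n={n}: {table_size(VOCAB_SIZE, n):.0e} possible n-tuples")
```

Under this assumption a trigram table already has 10^12 possible entries, far more than the number of trigrams that a corpus of typical size contains, so most entries are never observed.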