Steve's Explanation of Shrinkage

Shrinkage is a technique used to help train a Hidden Markov Model (HMM) when training data is sparse. Under such conditions, it can be difficult to train an HMM to emit tokens (such as words in a sentence or sounds in a speech signal), since the model has not seen enough data to build up a good representation of what to expect.

Shrinkage remedies this by grouping certain states in the HMM together, based on their similar functions or expected tokens, and representing each grouping with its own parameters. An example of such a grouping is the set of states that form the prefix to a name (e.g. "The Right Honourable Member of Parliament Sir blah blah blah..."). Several groupings can be represented in the model at once, and the resulting set of parameters forms a sort of hierarchy, with the HMM states as the leaves and the grouping parameters as the parent nodes. Parent nodes closer to the root of this hierarchy represent the more general groupings of states, while nodes near the leaves correspond to more specific groupings.
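The hierarchy described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual structure: the state and grouping names ("title_prefix", "name_field", "all_states", etc.) are hypothetical, chosen only to show leaves linked to progressively more general parents up to a single root.

```python
# Hypothetical shrinkage hierarchy: each leaf HMM state maps to its
# chain of ancestor groupings, most specific first. Nodes nearer the
# root ("all_states") are more general; nodes nearer the leaves are
# more specific.
HIERARCHY = {
    "title_prefix": ["name_field", "all_states"],
    "first_name":   ["name_field", "all_states"],
    "last_name":    ["name_field", "all_states"],
    "background":   ["non_name", "all_states"],
}

def ancestors(state):
    """Return the leaf state followed by its parent groupings, leaf first."""
    return [state] + HIERARCHY[state]

# ancestors("first_name") -> ["first_name", "name_field", "all_states"]
```

Here every leaf shares the root, so even two states with nothing else in common are tied together by the most general grouping.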

The way this helps is that when a token (word, speech fragment, etc.) is observed during training, the training is distributed over the HMM state and all of its parents. A separate set of parameters determines how much each level of the hierarchy contributes to the final estimate. The result of this training is that the grouping parameters will have been trained on more cases than any individual HMM state, and this pooled "experience" helps the HMM states classify the tokens they encounter during normal operation. This is analogous to a manager who knows the work of several workers, each of whom knows only a particular task. The manager can lend expertise to a worker who doesn't know how to handle a situation that another worker has encountered before.
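The mechanism above can be sketched as follows. This is a hedged illustration under simplifying assumptions: counts are accumulated by plain maximum likelihood at every node, and the mixture weights are fixed by hand, whereas in the paper the weights are themselves learned (e.g. with EM on held-out data). The hierarchy and all names are hypothetical.

```python
from collections import defaultdict

# Illustrative hierarchy: leaf state -> ancestor groupings, leaf side first.
HIERARCHY = {
    "first_name": ["name_field", "all_states"],
    "last_name":  ["name_field", "all_states"],
}

def train(observations):
    """Count each (state, token) pair at the leaf AND at every parent
    grouping, so the more general nodes accumulate more training cases."""
    counts = defaultdict(lambda: defaultdict(float))
    for state, token in observations:
        for node in [state] + HIERARCHY[state]:
            counts[node][token] += 1.0
    return counts

def emit_prob(counts, state, token, weights=(0.5, 0.3, 0.2)):
    """Shrinkage estimate of P(token | state): a weighted mixture of the
    maximum-likelihood estimates at the leaf and at each ancestor.
    `weights` are assumed fixed here and must sum to 1."""
    prob = 0.0
    for w, node in zip(weights, [state] + HIERARCHY[state]):
        total = sum(counts[node].values())
        if total > 0:
            prob += w * counts[node][token] / total
    return prob
```

For example, after training on [("first_name", "Alice"), ("last_name", "Smith")], the leaf "last_name" has never emitted "Alice", yet emit_prob(counts, "last_name", "Alice") is still nonzero because the parent groupings, which did see "Alice", contribute their share of the mixture. That is the manager lending expertise to the worker.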


This explanation is derived from my interpretation of "Information Extraction with HMMs and Shrinkage" by Dayne Freitag and Andrew McCallum.