Learning Stochastic Grammars

Stephen M. Omohundro
NEC Research Institute, Inc.

The last decade has seen a remarkable convergence in the approaches used in such diverse fields as neural networks, speech understanding, machine learning, control theory, computational learning theory, statistics, decision analysis, and image analysis. In each case, the underlying models can be interpreted as probability distributions and the fundamental operations can be interpreted in statistical terms. There are now powerful formalisms (such as Bayesian, Markov, and neural networks) for representing distributions with conditional independence relationships over fixed sets of random variables. Many important application domains (e.g., vision, speech, theorem proving, and compilation) have state spaces that don't naturally decompose into a fixed set of random variables. To apply probabilistic techniques to these domains, we must develop more powerful classes of computationally tractable models. In this talk, I'll present stochastic grammars as an example of such a richer class of models.

"Hidden Markov Models" are the probabilistic analog of regular grammars. They have been widely used in cryptography, speech recognition, and computational biology. The "forward-backward" algorithm efficiently answers statistical inference questions about them using dynamic programming. For a fixed model structure, the Baum-Welch algorithm can be used to learn effective parameters from a set of training strings. "Stochastic Context-free Grammars" are the probabilistic analog of context-free grammars. The "inside-outside" algorithm can be used for inference, and for a fixed grammar structure, Baum-Welch-style re-estimation can again be used to set the parameters.

Ultimately, we would like machine learning systems to automatically discover the important abstractions in a domain and the structure of the relationships between those abstractions. For stochastic grammars, this corresponds to finding the appropriate set of non-terminal symbols and the structure of the grammar productions which relate them. I'll describe an algorithm by Andreas Stolcke and myself, called "model merging", for learning the structure of Hidden Markov Models and stochastic regular grammars from training strings. It works by initially treating the training data itself as a grammar and then repeatedly merging states to form the non-terminal structure of the grammar. I'll discuss connections between stochastic grammars and graphical probability models and argue for the need for richer structures which encompass certain properties of both classes.

--------------------------------------------------------------------------

Stephen M. Omohundro received his Ph.D. in Physics from U. C. Berkeley in 1985. He then went to Thinking Machines Corporation, where he codesigned Star Lisp, a parallel programming language for the Connection Machine. In 1986 he joined the faculty of the computer science department at the University of Illinois at Urbana-Champaign, where he also wrote the three-dimensional graphics code for Mathematica. In 1988 he moved to the International Computer Science Institute in Berkeley, where his research projects included efficient geometric learning algorithms, learning stochastic grammars, a vision system for lip reading, and the object-oriented language Sather. In 1995 he joined the NEC Research Institute in Princeton, where he is applying machine learning to problems in software engineering and vision.
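--------------------------------------------------------------------------

As a concrete illustration of the dynamic programming mentioned in the abstract, below is a minimal sketch (not code from the talk) of the forward pass for a small discrete hidden Markov model. The state and symbol counts, variable names, and toy parameters are all assumptions chosen for illustration.

import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Return P(obs | model) for a discrete HMM.

    pi  : (n_states,)            initial state distribution
    A   : (n_states, n_states)   A[i, j] = P(next state j | current state i)
    B   : (n_states, n_symbols)  B[i, k] = P(emit symbol k | state i)
    obs : list of integer symbol indices
    """
    alpha = pi * B[:, obs[0]]                # P(first symbol, state) for each state
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]   # advance one step, absorb the next symbol
    return float(alpha.sum())                # marginalize out the final hidden state

if __name__ == "__main__":
    # Toy two-state model over a binary alphabet (all numbers made up).
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
    B = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
    print(forward_likelihood(pi, A, B, [0, 1, 1, 0]))

The backward pass and the Baum-Welch re-estimation step build on the same recurrence, accumulating expected transition and emission counts instead of a single likelihood.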
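In the same spirit, here is a hedged sketch of the two basic moves behind the model merging idea described in the abstract: building an initial model that generates exactly the training strings, and collapsing pairs of states by pooling their counts. The Stolcke-Omohundro algorithm scores each candidate merge (for example, by the posterior probability of the merged model) before accepting it; this toy version simply accepts a fixed list of candidates, and every name in it is an assumption.

from collections import defaultdict

def initial_model(strings):
    """Build a model that generates exactly the training strings:
    one dedicated state per symbol occurrence, chained from 'S' to 'END'."""
    transitions = defaultdict(lambda: defaultdict(float))  # counts, not probabilities
    emissions = {}                                         # state -> the one symbol it emits
    state = 0
    for s in strings:
        prev = 'S'
        for symbol in s:
            emissions[state] = symbol
            transitions[prev][state] += 1.0
            prev = state
            state += 1
        transitions[prev]['END'] += 1.0
    return transitions, emissions

def merge_states(transitions, emissions, keep, drop):
    """Collapse state `drop` into state `keep`, pooling transition counts."""
    for outs in transitions.values():
        if drop in outs:
            outs[keep] += outs.pop(drop)       # redirect incoming counts
    if drop in transitions:
        for dst, count in transitions.pop(drop).items():
            transitions[keep][dst] += count    # pool outgoing counts
    emissions.pop(drop, None)

# Toy run: with two copies of "ab", merging states that emit the same
# symbol collapses the model to a single two-state chain.  A real search
# would score each candidate merge and stop when further merges hurt.
T, E = initial_model(["ab", "ab"])
candidates = [(i, j) for i in E for j in E if i < j and E[i] == E[j]]
for keep, drop in candidates:
    if keep in E and drop in E:                # skip states already merged away
        merge_states(T, E, keep, drop)
print(dict(T['S']), {s: dict(T[s]) for s in E})

Starting from the data-as-grammar model and merging generalizes gradually: each merge lets the model accept strings beyond the training set, and the scoring criterion decides how far that generalization should go.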