Learning Stochastic Grammars

Stephen M. Omohundro
NEC Research Institute, Inc.

The last decade has seen a remarkable convergence in the approaches used in such diverse fields as neural networks, speech understanding, machine learning, control theory, computational learning theory, statistics, decision analysis, and image analysis. In each case, the underlying models can be interpreted as probability distributions and the fundamental operations can be interpreted in statistical terms. There are now powerful formalisms (such as Bayesian, Markov, and neural networks) for representing distributions with conditional independence relationships over fixed sets of random variables. Many important application domains (e.g., vision, speech, theorem proving, and compilation) have state spaces that don't naturally decompose into a fixed set of random variables. To apply probabilistic techniques to these domains, we must develop more powerful classes of computationally tractable models. In this talk, I'll present stochastic grammars as an example of such a richer class of models.

"Hidden Markov Models" are the probabilistic analog of regular grammars. They have been widely used in cryptography, speech recognition, and computational biology. The "forward-backward" algorithm efficiently answers statistical inference questions about them using dynamic programming. For a fixed model structure, the Baum-Welch algorithm can be used to learn effective parameters from a set of training strings. "Stochastic Context-free Grammars" are the probabilistic analog of context-free grammars. The "inside-outside" algorithm can be used for inference, and for a fixed grammar structure, Baum-Welch-style re-estimation can again be used to set the parameters.

Ultimately, we would like machine learning systems to automatically discover the important abstractions in a domain and the structure of the relationships between those abstractions. For stochastic grammars, this corresponds to finding the appropriate set of non-terminal symbols and the structure of the grammar productions which relate them. I'll describe an algorithm by Andreas Stolcke and myself, called "model merging", for learning the structure of Hidden Markov Models and stochastic regular grammars from training strings. It works by initially treating the training data itself as a grammar and then repeatedly merging states to form the non-terminal structure of the grammar. I'll discuss connections between stochastic grammars and graphical probability models and argue for the need for richer structures which encompass certain properties of both classes.

--------------------------------------------------------------------------

Stephen M. Omohundro received his Ph.D. in Physics from U. C. Berkeley in 1985. He then went to Thinking Machines Corporation, where he codesigned Star Lisp, a parallel programming language for the Connection Machine. In 1986 he joined the faculty of the computer science department at the University of Illinois at Urbana-Champaign, where he also wrote the three-dimensional graphics code for Mathematica. In 1988 he moved to the International Computer Science Institute in Berkeley, where his research projects included efficient geometric learning algorithms, learning stochastic grammars, a vision system for lip reading, and the object-oriented language Sather. In 1995 he joined the NEC Research Institute in Princeton, where he is applying machine learning to problems in software engineering and vision.
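--------------------------------------------------------------------------

As a concrete illustration of the dynamic programming mentioned in the abstract, below is a minimal sketch (not code from the talk) of the forward pass for a small discrete hidden Markov model. The state and symbol counts, variable names, and toy parameters are all assumptions chosen for illustration.

import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Return P(obs | model) for a discrete HMM.

    pi  : (n_states,)            initial state distribution
    A   : (n_states, n_states)   A[i, j] = P(next state j | current state i)
    B   : (n_states, n_symbols)  B[i, k] = P(emit symbol k | state i)
    obs : list of integer symbol indices
    """
    alpha = pi * B[:, obs[0]]                # P(first symbol, state) for each state
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]   # advance one step, absorb the next symbol
    return float(alpha.sum())                # marginalize out the final hidden state

if __name__ == "__main__":
    # Toy two-state model over a binary alphabet (all numbers made up).
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
    B = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
    print(forward_likelihood(pi, A, B, [0, 1, 1, 0]))

The backward pass and the Baum-Welch re-estimation step build on the same recurrence, accumulating expected transition and emission counts instead of a single likelihood.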
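In the same spirit, here is a hedged sketch of the two basic moves behind the model merging idea described in the abstract: building an initial model that generates exactly the training strings, and collapsing pairs of states by pooling their counts. The Stolcke-Omohundro algorithm scores each candidate merge (for example, by the posterior probability of the merged model) before accepting it; this toy version simply accepts a fixed list of candidates, and every name in it is an assumption.

from collections import defaultdict

def initial_model(strings):
    """Build a model that generates exactly the training strings:
    one dedicated state per symbol occurrence, chained from 'S' to 'END'."""
    transitions = defaultdict(lambda: defaultdict(float))  # counts, not probabilities
    emissions = {}                                         # state -> the one symbol it emits
    state = 0
    for s in strings:
        prev = 'S'
        for symbol in s:
            emissions[state] = symbol
            transitions[prev][state] += 1.0
            prev = state
            state += 1
        transitions[prev]['END'] += 1.0
    return transitions, emissions

def merge_states(transitions, emissions, keep, drop):
    """Collapse state `drop` into state `keep`, pooling transition counts."""
    for outs in transitions.values():
        if drop in outs:
            outs[keep] += outs.pop(drop)       # redirect incoming counts
    if drop in transitions:
        for dst, count in transitions.pop(drop).items():
            transitions[keep][dst] += count    # pool outgoing counts
    emissions.pop(drop, None)

# Toy run: with two copies of "ab", merging states that emit the same
# symbol collapses the model to a single two-state chain.  A real search
# would score each candidate merge and stop when further merges hurt.
T, E = initial_model(["ab", "ab"])
candidates = [(i, j) for i in E for j in E if i < j and E[i] == E[j]]
for keep, drop in candidates:
    if keep in E and drop in E:                # skip states already merged away
        merge_states(T, E, keep, drop)
print(dict(T['S']), {s: dict(T[s]) for s in E})

Starting from the data-as-grammar model and merging generalizes gradually: each merge lets the model accept strings beyond the training set, and the scoring criterion decides how far that generalization should go.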