\bio{D. J. C. MacKay}{was born in Britain in 1967. He received the B.A. degree in
Natural Sciences (Physics) from Trinity College, Cambridge, in
1988. He then studied as a Fulbright scholar at Caltech for a
doctorate which was awarded in 1992. After spending
four years as the Royal
Society Smithson Research Fellow at Darwin College,
Cambridge, he became a lecturer in the Department of
Physics at the University of Cambridge. He was promoted to
a readership in 1999.
He has published papers on a wide variety of topics
including Bayesian methods, adaptive models and error-correcting codes.}
\begin{biography}{David J.C. MacKay}
was born in Britain in 1967. He received the B.A. degree in
Natural Sciences (Physics) from Trinity College, Cambridge, in
1988. He then studied as a Fulbright scholar at Caltech for a
doctorate which was awarded in 1992. After spending
four years as the Royal
Society Smithson Research Fellow at Darwin College,
Cambridge, he became a lecturer in the Department of Physics of
the University of Cambridge in 1995, where he is now a Professor.
He has published papers on Bayesian
methods for adaptive models, on the application of neural
network methods to industrial data modelling problems,
on language modelling and protein sequence modelling, on cryptanalysis
and coding theory, and on Hebbian learning. He is currently writing
a textbook on information theory, inference and learning algorithms.
\end{biography}
Name: David J. C. MacKay
Title: Dr.
Affiliations: Does this mean membership of learned societies? If so, none.
If this means `whom do I work for' then it's
Department of Physics
University of Cambridge
I am also a Fellow of Darwin College, Cambridge.
address: Cavendish Laboratory,
Madingley Road,
Cambridge CB3 0HE. U.K.
email: mackay@mrao.cam.ac.uk
www: http://www.inference.phy.cam.ac.uk/mackay/
tel: +44 1223 339852 fax: 354599 home: 276411
My first research work was in 1986 at RSRE Malvern, where I was given the task of testing high-precision digitizers statistically, by feeding in uniformly distributed random voltages and looking at the distribution of digital read-outs.
That summer I sent a contribution to an Institute of Physics magazine giving a solution to the problem of constructing a spiral mirror. Other topics that amused me at that time were
I presented this work as a poster at a meeting at RSRE late in 1987. I also showed a poster on the idea of exploring constant-energy surfaces in weight space for neural networks with more parameters than data points.
In 1988 I went to Caltech. My first research project was to investigate what was going on in Linsker's models of Hebbian learning. After a couple of months of work I teamed up with Ken Miller and published a series of papers on this.
I also did some work with Seymour Benzer on the expression of beta-galactosidase in Drosophila strains that had a P-element inserted in the genome. I looked at polka-dot expression patterns in the brains of larvae and adults and found a few interesting patterns, including one strain in which the stained cells appeared to be associated with the optic chiasm of the adult.
Someone in Benzer's group showed me the paper by Delbrück on estimation of a mutation rate in an exponentially growing bacterial population. Their estimator was clearly unreliable, but at the time I had still not fully absorbed Bayesian ideas, and all I did was suggest a collection of alternative estimators, some of which had better sampling properties.
I thought about Bayesian methods and Maximum Entropy on and off. I was aware of a difficulty with setting `alpha' and I came up with a derivation of `alpha=1' which Skilling rightly shot down as nonsense.
I came up with a maximum entropy / Bayesian viewpoint for Hopfield networks and solved a special case (a cycle-free network); I erroneously believed I had solved a more general case for a time.
At this time I did know that Bayesian methods could help with complexity control and model comparison. In 1990, and maybe earlier, I decided to write a review paper explaining how minimum description length and Bayesian methods were equivalent (if MDL was done carefully), so I wrote a paper on Occam factors. I also wrote software to do mixture-of-Gaussians modelling, and density modelling with some other distributions. For each model I fitted a Gaussian approximation to the posterior and computed the evidence.
Attending Maxent 90 was a key moment for me; there I consolidated what I knew about Bayesian complexity control, and David Robinson explained to me many details of hierarchical modelling for image reconstruction.
At the end of 1990 I decided to implement Bayesian methods for neural networks (something I had been telling MLP users to do for some time). By April 1991 I had written two papers on regression problems (`Bayesian Interpolation' and `A Practical Bayesian Framework...'), which I presented at Snowbird (my first talk).
I decided to do classifiers next, and a friend encouraged me to do something on active data selection. My data selection paper, which I rattled out rather hastily, contained errors and failures to cite the literature which were fortunately caught (with the help of a referee) before publication of my thesis.
Radford Neal got in touch with me on April 17th 1990 and we started our long-lasting discussion of Bayesian methods for neural nets, and many other topics. I visited Toronto for the first time in November just before NIPS.
Maxent 91 in Seattle was an enjoyable meeting, with stimulating discussions with John Skilling. I gave a talk at NIPS in 1991, and organized a workshop with Steve Nowlan.
In 1992, back in Cambridge, I worked on defending my thesis against the attacks of Wolpert and others. I did this by writing review papers, writing a paper on optimization versus `integrating out' hyperparameters, and entering the ASHRAE prediction competition using my software. I also looked into the surprisingly large chromatic aberration of the human eye.
At Snowbird in 1992, Peter Brown told me how IBM's `smoothing' method works for language modelling, and I worked out an alternative based on a hierarchical Bayesian model.
I also started working on latent-variable models for proteins (density networks).
In 1993, my paper on the ASHRAE prediction competition (which I won), and my paper with Radford on `Automatic Relevance Determination' were rejected by NIPS.
In 1993 I was asked to look at a cryptanalysis problem; this brought me into information theory and coding theory. I came up with a variational free energy minimization algorithm motivated by Cheeseman's description of solving the colouring problem. This marked the start of my work on ensemble adapting methods, which Radford had taught me about.
Radford Neal and I then discussed how to use this solution to solve new problems. We were interested in the challenge of getting closer to the Shannon limit for error-correcting codes. We invented some new codes, of which MN codes seemed especially interesting, because they use a sparse source to introduce redundancy instead of extra parity bits. In July we came up with the belief propagation decoding method. In November 1994 we came up with Gallager's codes. In December we realised we had rediscovered Gallager.
In 1995, I wrote a paper on Ensemble Learning and Evidence Maximization.
In 1996, Sejnowski spoke about `ICA'. Shortly thereafter, I wrote down a (much simpler?) maximum-likelihood derivation of ICA. In Erice in 1996, I had the idea of combining Jordan and Jaakkola's variational methods with Gaussian process classifiers and got Mark Gibbs to implement it.
In 1997, I wrote a paper on Ensemble Learning for Hidden Markov Models.
Meanwhile, back in the error-correcting code business, my work with Radford led to a revival of interest in Gallager's codes, and in 1999 Gallager received a gold medal at the information theory symposium. Matthew Davey and I came up with some enhancements to Gallager codes that turned them into record-breakers. We demonstrated that Gallager codes could outperform Reed-Solomon codes, and IBM are now considering using Gallager codes in disc drives.
Since 1999, a major new research project in my group has been the development of Dasher, an information-efficient alternative to the keyboard. Dasher makes more efficient use of human gestures (so that only one finger or one eye is needed to communicate rapidly), and it uses integrated language models that exploit the predictability of one's language.
In December 2000, my research group won Hopfield and Brody's `mouse brain' competition.
In 2001, I started working with Graeme Mitchison on transferring successful ideas from the field of classical error-correcting codes to the neighbouring field of quantum error correction.
In 2002, the Gatsby Charitable Foundation gave me a Senior Research Fellowship to allow me to devote more time to research. Fingers crossed! New interests include computation using spike timing in networks of spiking neurons, and Go-playing algorithms.
In 2003 I finished my textbook, Information Theory, Inference, and Learning Algorithms.