This page contains the abstracts and accompanying material for the talks in the final programme, as well as Larry Wasserman's talk, which was unfortunately cancelled at the last minute (listed at the bottom).

SATURDAY MORNING SESSION

7:30 - 7:40

Opening Remarks - Matt Beal and Yee Whye Teh

7:40 - 8:10

Introductory Tutorial on Nonparametric Methods and Infinite Models
Radford M. Neal (Toronto)

Talk slides: [pdf]

8:15 - 8:35

Hierarchical models with multiple mixtures of Dirichlet processes
Michael Escobar, George Tomlinson and Christine McLaren (Toronto, Toronto, UCI)

This talk will discuss the use of hierarchical models with multiple layers of mixtures of Dirichlet processes. For example, with modern diagnostic equipment, one can measure the individual cell sizes in a sample of one's blood. One can then use a mixture of Dirichlet processes to model the distribution of cell sizes for each subject. This model is roughly equivalent to putting a distribution on the family of kernel density estimates: Escobar and West (1995) showed how the kernel density estimator approximates a Bayesian method of estimating densities based on a mixture of Dirichlet processes (MDP). However, since the MDP method is a proper Bayesian model, one can use hierarchical priors and calculate posterior distributions of functionals of interest.

Using these techniques, we develop a highly flexible hierarchical model in the space of distributions. This model allows us to model samples of densities and to find outliers in the space of distributions. This talk will discuss the methods used to compute these models and to assess outliers. These techniques will be used to identify diseased subjects based on the distribution of the size of the subject's red blood cells.
Extended abstract: [pdf]
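A minimal sketch of the generative side of such a model, assuming a normal kernel with fixed standard deviation and a normal base measure (neither is specified in the abstract): data from a Dirichlet process mixture can be simulated with the Pólya-urn (Chinese restaurant) representation. The function name, the per-subject hyperprior and all numbers are hypothetical illustrations, not the authors' settings or data.

```python
import numpy as np

def sample_dp_mixture(n, alpha, base_mean, base_sd, kernel_sd, rng):
    """Draw n observations from a Dirichlet process mixture of normals.

    Polya-urn (Chinese restaurant) representation: observation i joins an
    existing cluster k with probability n_k / (i + alpha) or opens a new
    cluster with probability alpha / (i + alpha). A new cluster's mean is
    drawn from the base measure N(base_mean, base_sd^2); observations are
    N(cluster mean, kernel_sd^2).
    """
    means, counts, labels = [], [], []
    for i in range(n):
        w = np.array(counts + [alpha], dtype=float)  # existing clusters, then "new"
        k = rng.choice(len(w), p=w / w.sum())
        if k == len(means):  # open a new cluster
            means.append(rng.normal(base_mean, base_sd))
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    labels = np.array(labels)
    y = rng.normal(np.array(means)[labels], kernel_sd)
    return y, labels

# Hypothetical hierarchy: each subject has its own DP mixture, and the subjects'
# base-measure means share a common normal hyperprior (values are made up).
rng = np.random.default_rng(0)
subject_means = rng.normal(90.0, 5.0, size=3)
samples = [sample_dp_mixture(200, 1.0, m, 5.0, 2.0, rng) for m in subject_means]
```

Posterior inference (e.g. Gibbs sampling over the cluster labels) would run this urn scheme in reverse, conditioning on the observed cell sizes; that step is not shown.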

8:40 - 9:00

The Hierarchical Dirichlet Process
Yee Whye Teh, Michael I. Jordan, Matthew J. Beal and David M. Blei (Berkeley x2, Toronto, Berkeley)

Certain interesting data sets can naturally be grouped into subsets, each of which has its own distinct features. Some of these data sets are well characterized as having arisen from a model consisting of several mixture models, and the aim is to extract features (mixture components) that explain the data well.
Dirichlet processes (DPs) are often used as nonparametric replacements for finite mixture models, but naively combining independent DPs as the components of a multiple-mixture model produces a probabilistically unsound model. We describe the hierarchical Dirichlet process (HDP), a new Bayesian nonparametric model that is well suited to this modelling task and solves this problem. An HDP consists of a local Dirichlet process mixture model for each subset of the data and a global Dirichlet process that describes the properties of the whole data set and ties together the different local Dirichlet processes. We explore the properties of this model, give Markov chain Monte Carlo sampling methods for posterior inference, and demonstrate it on several problems.
Talk slides: [ppt]
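The way an HDP ties the groups together can be illustrated with the Chinese restaurant franchise, a standard representation of HDP draws. The sketch below simulates only the prior over component ("dish") assignments; the parameter names alpha (local concentration) and gamma (global concentration) are our assumptions, and this is not the authors' MCMC sampler.

```python
import numpy as np

def sample_crf(group_sizes, alpha, gamma, rng):
    """Chinese restaurant franchise: the HDP's sharing mechanism.

    Within group j, a customer sits at an existing table with probability
    proportional to its occupancy, or at a new table with probability
    proportional to alpha. Each new table orders a dish (mixture component):
    an existing dish k with probability proportional to the number of tables
    serving it across *all* groups, or a new dish with probability
    proportional to gamma. Returns the dish index of every customer.
    """
    dish_table_counts = []  # m_k: tables serving dish k, franchise-wide
    dishes = []
    for n_j in group_sizes:
        table_counts, table_dish, group_dishes = [], [], []
        for _ in range(n_j):
            w = np.array(table_counts + [alpha], dtype=float)
            t = rng.choice(len(w), p=w / w.sum())
            if t == len(table_counts):  # new table: choose its dish globally
                v = np.array(dish_table_counts + [gamma], dtype=float)
                k = rng.choice(len(v), p=v / v.sum())
                if k == len(dish_table_counts):  # brand-new dish
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            group_dishes.append(table_dish[t])
        dishes.append(np.array(group_dishes))
    return dishes
```

Because the dish counts are pooled across groups, all groups draw from one shared set of components. With independent DPs and a continuous base measure, different groups would almost surely share no components at all.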

9:05 - 9:15

Break

9:15 - 9:35

Application of nonparametric Bayesian methods in genetic inference
Eric P. Xing, Roded Sharan, Michael I. Jordan (Berkeley)

The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. In this paper we present a novel statistical model for haplotype inference. Our model is a Bayesian model based on a prior known as the Dirichlet process, a nonparametric prior which provides control over the size of the unknown pool of population haplotypes. The model also incorporates a likelihood that allows statistical errors in the haplotype/genotype relationship, trading off these errors against the size of the pool of haplotypes. We describe an algorithm based on Markov chain Monte Carlo for posterior inference. The overall result is a flexible Bayesian model that is reminiscent of parsimony methods in its preference for small haplotype pools. We apply this new approach to the analysis of both simulated and real genotype data, and compare to extant methods.
Extended abstract: [ps]
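A hypothetical generative sketch of the kind of model described above: a Dirichlet process (Pólya-urn) prior over a pool of haplotypes, genotypes formed as per-site allele sums, and a simple error model. The uniform base measure over binary haplotypes and the symmetric error model are assumptions for illustration; the authors' actual likelihood and MCMC sampler are not reproduced here.

```python
import numpy as np

def sample_genotypes(n_people, n_snps, alpha, eps, rng):
    """Hypothetical generative sketch of DP-based haplotype modelling.

    Each chromosome reuses pool haplotype k with probability proportional to
    its count, or adds a fresh haplotype (uniform over {0,1}^n_snps, an
    assumed base measure) with probability proportional to alpha. A genotype
    is the per-SNP allele sum (0, 1 or 2) of an individual's two haplotypes;
    each entry is then changed to one of the other two values with
    probability eps (an assumed error model).
    """
    pool, counts, pairs = [], [], []
    genotypes = np.empty((n_people, n_snps), dtype=int)
    for p in range(n_people):
        pair = []
        for _ in range(2):  # two chromosomes per individual
            w = np.array(counts + [alpha], dtype=float)
            k = rng.choice(len(w), p=w / w.sum())
            if k == len(pool):
                pool.append(rng.integers(0, 2, size=n_snps))
                counts.append(0)
            counts[k] += 1
            pair.append(k)
        g = pool[pair[0]] + pool[pair[1]]  # new array: allele sums
        err = rng.random(n_snps) < eps
        # shift an erroneous entry by 1 or 2 (mod 3) to a different value
        g[err] = (g[err] + rng.integers(1, 3, size=err.sum())) % 3
        genotypes[p] = g
        pairs.append(tuple(pair))
    return genotypes, pairs, np.array(pool)
```

Smaller values of alpha make reuse of existing haplotypes more likely and so favour small pools, which is the parsimony-like preference mentioned in the abstract.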

9:40 - 10:00

Approximate Solutions to Nonparametric Bayesian Hierarchical Modelling with Applications to Information Filtering
Volker Tresp, Kai Yu and Anton Schwaighofer (Siemens AG, Munich, Graz U. of Technology)

In some cases, for example in applications involving hierarchical Bayes, one learns a "prior" parameter distribution from repeated experiments. It often happens that the "learned prior" does not correspond to a family of distributions that can easily be specified. In this paper, we present a solution to this problem by formulating our prior in terms of an infinite-dimensional Dirichlet process. This use of a Dirichlet process in a statistical model is sometimes referred to as Dirichlet enhancement and is a main approach in nonparametric hierarchical Bayesian modeling. Typically, nonparametric hierarchical modeling relies on an efficient implementation of Gibbs sampling. We demonstrate how Gibbs sampling can be used in our context. In addition, we present two novel approximations that do not rely on sampling: the first is a MAP approximation computed with an EM algorithm, and the second is a variational approximation. We apply a Dirichlet-enhanced hierarchical model to information filtering, where nonparametric hierarchical modeling allows the principled combination of content filtering and collaborative filtering. We demonstrate the effectiveness of our approximations in two applications: the retrieval of art images and the information filtering of Reuters news data.
Extended abstract: UAI paper [pdf]
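To show how Dirichlet enhancement can combine a content-based prior with collaborative information, the sketch below computes a new user's predictive preferences exactly, under simplifying assumptions that are ours rather than the paper's: binary ratings, existing users' parameters treated as known, and an item-wise Beta base measure G0 standing in for the content-based prior. It does not implement the paper's Gibbs, EM/MAP or variational approximations.

```python
import numpy as np

def predict_new_user(thetas, a, b, observed, ratings, alpha):
    """Predictive 'like' probabilities for a new user under a DP prior.

    thetas:   (M, I) like-probabilities of M existing users (treated as known)
    a, b:     (I,) Beta parameters of the base measure G0 (assumed content prior)
    observed: indices of items the new user has rated
    ratings:  0/1 ratings for those items
    Under G ~ DP(alpha, G0), the new user's parameter equals an existing
    theta_j with prior weight 1/(alpha+M), or is a fresh draw from G0 with
    prior weight alpha/(alpha+M); the common factor 1/(alpha+M) cancels.
    """
    r = np.asarray(ratings, dtype=float)
    # log-likelihood of the new user's ratings under each existing user
    t = thetas[:, observed]
    log_w = (r * np.log(t) + (1 - r) * np.log1p(-t)).sum(axis=1)
    # marginal likelihood under a fresh draw from G0 (items independent)
    m = a[observed] / (a[observed] + b[observed])
    log_w0 = np.log(alpha) + (r * np.log(m) + (1 - r) * np.log1p(-m)).sum()
    log_all = np.append(log_w, log_w0)
    w = np.exp(log_all - log_all.max())
    w /= w.sum()  # posterior weights: existing users, then "new from G0"
    # G0 prediction: prior mean, or Beta posterior mean for rated items
    g0_pred = a / (a + b)
    g0_pred[observed] = (a[observed] + r) / (a[observed] + b[observed] + 1)
    return w[:-1] @ thetas + w[-1] * g0_pred
```

For example, with two existing users whose tastes are opposite, a new user who likes item 0 is pulled towards the first user's profile, while the G0 term keeps some weight on the content-based prior.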

10:05 - 10:25

Extended Bayesian Statistical Inference and Renormalization Group
Toshiaki Aida (Okayama, Japan)

This work discusses the improvement and generalization of the predictive distribution in Bayesian statistical inference in a very general setting.
(1) I clarify the origin of the effectiveness of Bayesian statistical inference by applying a field-theoretical argument to a few examples of nonparametric models; this also reveals the connection between the Bayesian framework and the renormalization group.
(2) Building on this, the predictive distribution in Bayesian statistical inference is generalized from the point of view of the renormalization group. The generalization is similar (but not identical) to modifying the prior distribution, and it leads to a general prescription for obtaining faster decay of the prediction error.
(3) These points are discussed without assuming any specific model, so the argument applies to any type of model: nonparametric, parametric, linear, nonlinear, and so on.
Extended abstract: [ppt]

SATURDAY AFTERNOON SESSION

4:00 - 4:10

Welcome back

4:10 - 4:30

Nonparametric Bayesian models for semi-supervised learning
Charles Kemp, Sean Stromsten, Tom Griffiths, Josh Tenenbaum (MIT, Stanford)

We describe nonparametric Bayesian approaches to generalizing class labels from a few labeled examples, guided by a much larger set of unlabeled examples. We posit some (potentially infinite) latent structure underlying both the observed features and the unobserved class labels, which allows the unlabeled examples to influence how class labels are generalized from the labeled examples. In one approach, we assume that the domain has a latent tree structure. The tree (or a distribution over trees) may be inferred from the unlabeled data. A prior over concepts generated by a mutation process on the inferred tree(s) allows efficient computation of the optimal Bayesian classification function from the labeled examples. This approach performs well on real-world datasets and extends naturally to two difficult problems: learning from very sparse data, and learning from positive examples only. Time permitting, we will also discuss an approach to Bayesian semi-supervised learning with text data, based on an infinite version of the Latent Dirichlet Allocation (LDA) topic model.
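A tree-based mutation prior of this kind can be sketched as follows, assuming a symmetric two-state mutation process with rate lam along each branch and a uniform label at the root; this parameterization and the function name are our assumptions. For clarity the sketch enumerates every labeling of a tiny tree; an efficient implementation would use sum-product message passing on the tree instead.

```python
import itertools
import math

def leaf_posterior(parent, length, labels, query, lam=1.0):
    """P(label[query] = 1 | observed leaf labels) under a tree mutation prior.

    parent: dict node -> parent node (the root maps to None)
    length: dict node -> branch length to its parent
    labels: dict leaf -> observed 0/1 label
    The root label is 0 or 1 with probability 1/2; along a branch of length t
    the label flips with probability (1 - exp(-2 * lam * t)) / 2, i.e. a
    symmetric two-state Markov chain with rate lam. Brute-force enumeration,
    so only suitable for small trees.
    """
    nodes = list(parent)
    num = den = 0.0
    for assignment in itertools.product((0, 1), repeat=len(nodes)):
        z = dict(zip(nodes, assignment))
        if any(z[leaf] != y for leaf, y in labels.items()):
            continue  # inconsistent with the labeled examples
        p = 1.0
        for v in nodes:
            if parent[v] is None:
                p *= 0.5  # uniform root label
            else:
                flip = (1 - math.exp(-2 * lam * length[v])) / 2
                p *= flip if z[v] != z[parent[v]] else 1 - flip
        den += p
        if z[query] == 1:
            num += p
    return num / den

# Hypothetical tree ((a, b), (c, d)) with equal branch lengths.
parent = {"root": None, "L": "root", "R": "root",
          "a": "L", "b": "L", "c": "R", "d": "R"}
length = {v: 0.5 for v in parent if parent[v] is not None}
p_b = leaf_posterior(parent, length, {"a": 1, "c": 0}, "b")  # b is a's sibling
```

Because b sits next to the positively labeled leaf a, its posterior probability of label 1 exceeds one half, while d, next to the negatively labeled c, falls below one half.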

4:35 - 4:55

Some Remarks about Bayesian Infinite Regression and Gaussian Processes
Carl Edward Rasmussen (Max Planck, Tübingen)

Talk slides: [pdf]

5:00 - 5:10

Break

5:10 - 5:30

Expectation Propagation for Infinite Mixtures
Thomas P. Minka and Zoubin Ghahramani (CMU, Gatsby)

I will describe a method for approximate inference in infinite models that uses deterministic Expectation Propagation instead of Monte Carlo. For infinite Gaussian mixtures, it provides cluster parameter estimates, cluster memberships, and model evidence. Model parameters, such as the expected size of the mixture, can be efficiently tuned via EM with EP as the E-step. The same approach can apply to other infinite models such as infinite HMMs.
Extended abstract: [pdf]
Talk slides: [pdf]
Tom Minka's page for this work.
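The basic Expectation Propagation cycle (remove a site to form a cavity, match moments with the exact likelihood term, update the site) can be shown on Minka's one-dimensional clutter problem, in which a Gaussian signal is observed amid background clutter. This is a deliberate simplification: the clutter model is a two-component mixture likelihood, not the infinite Gaussian mixture of the talk, and the parameter values (w, a, b) and function names are assumptions for illustration.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def ep_clutter(x, w=0.2, a=10.0, b=100.0, n_sweeps=20):
    """EP for the 1-D clutter problem: x_i ~ (1-w) N(theta, 1) + w N(0, a),
    with prior theta ~ N(0, b). Returns the Gaussian approximation (mean, var).
    """
    tau_site = np.zeros(len(x))  # site precisions
    nu_site = np.zeros(len(x))   # site precision * mean
    tau, nu = 1.0 / b, 0.0       # approximate posterior in natural parameters
    for _ in range(n_sweeps):
        for i, xi in enumerate(x):
            # 1. cavity: remove site i from the current approximation
            tau_c, nu_c = tau - tau_site[i], nu - nu_site[i]
            if tau_c <= 0:
                continue  # skip updates that would give an improper cavity
            m_c, v_c = nu_c / tau_c, 1.0 / tau_c
            # 2. mean and variance of the tilted distribution (cavity * exact term)
            inlier = (1 - w) * normal_pdf(xi, m_c, v_c + 1)
            r = inlier / (inlier + w * normal_pdf(xi, 0.0, a))
            m = m_c + r * v_c * (xi - m_c) / (v_c + 1)
            v = (v_c - r * v_c ** 2 / (v_c + 1)
                 + r * (1 - r) * v_c ** 2 * (xi - m_c) ** 2 / (v_c + 1) ** 2)
            # 3. moment-match, then store the site implied by the new posterior
            tau, nu = 1.0 / v, m / v
            tau_site[i], nu_site[i] = tau - tau_c, nu - nu_c
    return nu / tau, 1.0 / tau
```

Here r is the responsibility of the signal component for observation i; an outlying point receives r close to zero, so its site stays nearly flat and the estimate follows the remaining data. Extending this to an infinite mixture requires tracking cluster memberships as well, as in the talk.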

5:35 - 5:55

Variational Approximations for the Truncated Dirichlet Process
David M. Blei and Michael I. Jordan (Berkeley)

The truncated Dirichlet process is a finite approximation to the full Dirichlet process based on its stick-breaking construction (Sethuraman, 1994). Ishwaran and James (2001) use this distribution to develop a blocked Gibbs sampling algorithm for nonparametric Bayesian inference. We develop a deterministic mean-field variational approximation algorithm for the same model. This approach is easily applied to nonparametric mixture models where the prior distribution on the mixture components is conjugate to the data distribution. We demonstrate that, in high dimensions, this technique is faster than both the blocked and full Dirichlet process Gibbs samplers.
Extended abstract: [pdf]
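The truncated stick-breaking construction, and the mean-field update for the stick-length factors q(V_t) = Beta(g1_t, g2_t) in a truncated DP mixture, can be sketched as follows. The update assumes the responsibilities phi (variational cluster-membership probabilities) are already available; the function names are ours, and the rest of the coordinate-ascent loop (updating phi and the component parameters) is omitted.

```python
import numpy as np

def truncated_stick_breaking(alpha, T, rng):
    """Mixture weights from a DP truncated at level T.

    V_t ~ Beta(1, alpha) for t < T and V_T = 1, so the weights sum to one;
    pi_t = V_t * prod_{s<t} (1 - V_s).
    """
    v = rng.beta(1.0, alpha, size=T)
    v[-1] = 1.0  # truncation: the last stick takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def update_stick_params(phi, alpha):
    """Mean-field update for q(V_t) = Beta(g1_t, g2_t), t = 1..T-1, given
    responsibilities phi (N x T):
    g1_t = 1 + sum_n phi[n, t],  g2_t = alpha + sum_n sum_{j > t} phi[n, j].
    """
    counts = phi.sum(axis=0)
    tail = np.cumsum(counts[::-1])[::-1]  # tail[t] = sum_{j >= t} counts[j]
    g1 = 1.0 + counts[:-1]
    g2 = alpha + tail[1:]
    return g1, g2
```

Fixing V_T = 1 is what makes the truncated model a proper finite mixture, which is why both the blocked Gibbs sampler and the variational method can work with a fixed number of components.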

6:00 - 7:00

Panel Discussion

including:
  • Michael Escobar (Toronto)
  • Zoubin Ghahramani (Gatsby, London UK)
  • Michael Jordan (Berkeley)
  • Radford Neal (Toronto)
  • Carl Rasmussen (Max Planck, Tübingen)

7:00

Closing Remarks - Matt Beal and Yee Whye Teh

CANCELLED TALKS

4:35 - 4:55

Frequentist Properties of Infinite Dimensional Bayes
Larry Wasserman (CMU)

Nonparametric Bayesian methods involve placing a prior on an infinite dimensional space and computing the posterior. I will argue that it is essential to examine the frequentist properties of these Bayesian methods since the prior cannot be specified with complete confidence. We look at three properties: consistency, rate optimality and correct coverage. Many priors are consistent, fewer are rate optimal and whether any yield correct coverage remains an open question. I will then discuss some recent frequentist methods for nonparametric inference that do yield correct coverage. The latter is joint work with Chris Genovese.
Extended abstract: [ps.gz]
Talk slides: [ps.gz]
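What "correct coverage" means can be checked by simulation. The toy below is deliberately parametric (a normal mean with a conjugate normal prior), an assumption chosen for brevity rather than an example from the talk: it repeats the experiment with the true parameter held fixed and reports how often the 95% posterior interval contains it. The function name and parameter values are illustrative.

```python
import numpy as np

def credible_interval_coverage(theta, n, tau, reps, rng, z=1.959964):
    """Frequentist coverage of the 95% posterior interval for a normal mean.

    Model: y_1..y_n ~ N(theta, 1) with prior theta ~ N(0, tau^2). The posterior
    is N(n * ybar / (n + 1/tau^2), 1 / (n + 1/tau^2)). The experiment is
    repeated `reps` times with theta fixed; returns the fraction of posterior
    intervals that contain theta.
    """
    prec = n + 1.0 / tau ** 2
    post_sd = np.sqrt(1.0 / prec)
    ybar = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)  # sampling dist. of ybar
    post_mean = n * ybar / prec
    hits = np.abs(post_mean - theta) <= z * post_sd
    return hits.mean()
```

With a diffuse prior (large tau) the coverage is close to the nominal 95%, whereas a prior tightly concentrated away from the true theta yields far lower coverage. In infinite-dimensional problems the prior is much harder to make harmless in this way, which is the concern the abstract raises.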