Seminar: "From here to infinity -- bridging finite and Bayesian nonparametric mixture models in model-based clustering"
AUEB STATISTICS SEMINAR SERIES OCTOBER 2021
Sylvia Frühwirth-Schnatter (Department of Finance, Accounting, and Statistics, WU Vienna University of Economics and Business, Vienna, Austria)
This talk reviews the concept of mixture models and their application for model-based clustering from a Bayesian perspective and discusses some recent developments.
Two broad classes of mixture models are available. On the one hand, finite mixture models use a finite number K of components in the mixture distribution. On the other hand, Bayesian nonparametric mixtures, in particular Dirichlet process and Pitman-Yor process mixtures, are very popular. These models admit infinitely many mixture components and imply a prior distribution on the partition of the data, with a random number of data clusters. This makes it possible to derive the posterior distribution of the number of clusters given the data, which contains useful information about unobserved heterogeneity in the data.
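As an illustration of the prior on partitions mentioned above, the following minimal sketch simulates the number of data clusters implied by a Dirichlet process prior through its Chinese restaurant process representation. The function names, the concentration value alpha and the sample size are illustrative assumptions, not material from the talk.

```python
import random


def crp_num_clusters(n, alpha, rng):
    """Seat n observations by the Chinese restaurant process.

    Returns the number of clusters in the resulting random partition.
    alpha > 0 is the Dirichlet process concentration parameter.
    """
    sizes = []  # sizes[j] = number of observations currently in cluster j
    for i in range(n):
        # With i observations already seated, a new cluster is opened
        # with probability alpha / (i + alpha) (equal to 1 when i == 0).
        if rng.random() < alpha / (i + alpha):
            sizes.append(1)
        else:
            # Otherwise join an existing cluster with probability
            # proportional to its current size.
            j = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[j] += 1
    return len(sizes)


def prior_cluster_counts(n, alpha, draws, seed=0):
    """Monte Carlo estimate of the prior on the number of clusters.

    Returns a dict mapping number of clusters -> frequency over `draws`
    simulated partitions of n observations.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(draws):
        k = crp_num_clusters(n, alpha, rng)
        counts[k] = counts.get(k, 0) + 1
    return counts


if __name__ == "__main__":
    counts = prior_cluster_counts(n=100, alpha=1.0, draws=2000)
    for k in sorted(counts):
        print(k, counts[k] / 2000)
```

The printed frequencies approximate the prior distribution of the number of clusters for 100 observations. Larger values of alpha shift this distribution towards more clusters.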
One reason for the popularity of Bayesian nonparametric mixtures is the common belief that finite mixture models are different in this regard and that selecting the number K of components in the mixture distribution automatically forces the number of data clusters to equal K. However, recent research on finite mixture models has revealed surprising similarities between finite and Bayesian nonparametric mixture models. It has been shown that for finite mixtures, too, there is a pronounced difference between the number of components in the mixture distribution and the number of clusters in the data, in particular if the mixture model is overfitting.
The concentration parameter in the Dirichlet prior on the mixture weights is instrumental in this respect and, for appropriate choices, finite mixture models also imply a prior distribution on the partition of the data with a random number of data clusters. In addition, a prior can be put on the number K of components in the mixture distribution. This makes it possible to infer K and the number of data clusters simultaneously from the data within the framework of generalized mixtures of finite mixtures, recently introduced by Frühwirth-Schnatter, Malsiner-Walli and Grün (arXiv preprint 2005.09918v2). This model class encompasses many well-known mixture modelling frameworks, including Dirichlet process and sparse finite mixtures. A new, generic MCMC sampler (called the telescoping sampler) is introduced that allows straightforward MCMC implementation and avoids the tedious design of moves in common trans-dimensional approaches such as reversible jump MCMC.
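The role of the Dirichlet concentration parameter described above can be illustrated with a small prior simulation. The sketch below is not the telescoping sampler and is not taken from the talk. It only draws allocations from a finite mixture prior with K components and a symmetric Dirichlet prior on the weights, then counts the non-empty components. The names `filled_components`, `mean_filled` and `gamma`, and all numerical settings, are assumptions chosen for illustration. The weights are integrated out, so allocations are drawn through the equivalent Pólya urn scheme of the Dirichlet-multinomial distribution.

```python
import random


def filled_components(n, K, gamma, rng):
    """Allocate n observations to K components a priori.

    The weights follow a symmetric Dirichlet(gamma, ..., gamma) prior and
    are integrated out: given the current counts, the next observation
    joins component k with probability proportional to counts[k] + gamma.
    Returns the number of non-empty components, i.e. data clusters.
    """
    counts = [0] * K
    for _ in range(n):
        weights = [c + gamma for c in counts]
        k = rng.choices(range(K), weights=weights)[0]
        counts[k] += 1
    return sum(1 for c in counts if c > 0)


def mean_filled(n, K, gamma, draws, seed=0):
    """Monte Carlo estimate of the prior mean number of data clusters."""
    rng = random.Random(seed)
    total = sum(filled_components(n, K, gamma, rng) for _ in range(draws))
    return total / draws


if __name__ == "__main__":
    # Same K in both cases; only the concentration parameter changes.
    print("gamma = 0.01:", mean_filled(n=100, K=10, gamma=0.01, draws=300))
    print("gamma = 4.0: ", mean_filled(n=100, K=10, gamma=4.0, draws=300))
```

Under these assumed settings, a small gamma leaves most of the K = 10 components empty a priori, whereas a large gamma fills nearly all of them. This is the sense in which the number of data clusters can differ from K in an overfitting finite mixture.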
(This talk is based on joint work with Jan Greve, Bettina Grün, and Gertraud Malsiner-Walli and is supported by the Austrian Science Fund (FWF), grant P28740.)