Reliable and scalable variational inference for nonparametric mixtures, topics, and sequences

Description

Abstract:
We develop new algorithms for training nonparametric clustering models based on the Dirichlet process (DP), including DP mixture models, hierarchical Dirichlet process (HDP) topic models, and HDP hidden Markov models. These Bayesian nonparametric models allow coherent comparisons of different clusterings of a given dataset. The nonparametric approach is particularly promising for large-scale applications, where other model selection techniques like cross-validation are too expensive. However, existing training algorithms fail to live up to this promise: both Monte Carlo samplers and variational optimization methods are vulnerable to local optima and sensitive to initialization, especially the initial number of clusters. Our new algorithms can reliably escape poor initializations to discover interpretable clusters from millions of training examples.

For the DP mixture model, we pose a variational optimization problem in which the number of instantiated clusters assigned to data can be adapted during training. The focus of this optimization is an objective function that tightly lower-bounds the marginal likelihood and thus can be used for Bayesian model selection. Our algorithm maximizes this objective via block coordinate ascent interleaved with proposal moves that can add useful clusters to escape local optima while removing redundant or irrelevant clusters.

We further introduce an incremental algorithm that can exactly optimize our objective function on large datasets while processing only small batches at each step. Our approach uses cached, or memoized, sufficient statistics to make exact decisions for proposal acceptance or rejection. This memoized approach has the same runtime cost as previous stochastic methods, but allows exact acceptance decisions for cluster proposals and avoids learning rates entirely.

We later extend these algorithms to HDP topic models and HDP hidden Markov models. Previous methods for the HDP have used zero-variance point estimates with problematic model selection properties. Instead, we find sophisticated solutions to the non-conjugacy inherent in the HDP that still yield an optimization objective usable for Bayesian model selection. We demonstrate promising proposal moves for adapting the number of clusters during memoized training on millions of news articles, hundreds of motion capture sequences, and the human genome.
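To make the memoized training idea concrete, the sketch below caches per-batch sufficient statistics for a toy mixture model so that every global update uses exact whole-dataset totals, with no stochastic learning rate. This is an illustrative reconstruction, not the thesis implementation: the fixed unit-variance Gaussian likelihood, the truncation at K = 5 clusters, and helper names like local_step and summary_step are assumptions made for the example.

```python
# Minimal sketch of memoized (cached sufficient statistics) training for a
# toy fixed-variance Gaussian mixture. Illustrative only, not the thesis code.
import numpy as np

def local_step(X, means):
    """Soft-assign each row of X to K clusters (unit-variance Gaussians)."""
    logR = -0.5 * ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    logR -= logR.max(axis=1, keepdims=True)   # stabilize before exponentiating
    R = np.exp(logR)
    return R / R.sum(axis=1, keepdims=True)

def summary_step(X, R):
    """Per-batch sufficient statistics: soft counts and weighted data sums."""
    return R.sum(axis=0), R.T @ X

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(500, 2)) for m in (-4.0, 0.0, 4.0)])
batches = np.array_split(rng.permutation(X), 10)

K = 5                                          # truncation level (assumption)
means = rng.normal(0.0, 3.0, size=(K, 2))

N_total = np.zeros(K)                          # whole-dataset soft counts
S_total = np.zeros((K, 2))                     # whole-dataset weighted sums
cache = {}                                     # per-batch statistics

for lap in range(5):                           # full passes over the data
    for b, Xb in enumerate(batches):
        Nb, Sb = summary_step(Xb, local_step(Xb, means))
        if b in cache:                         # replace this batch's old stats
            oldN, oldS = cache[b]
            N_total -= oldN
            S_total -= oldS
        N_total += Nb
        S_total += Sb
        cache[b] = (Nb, Sb)
        # The global step always sees exact whole-dataset statistics, so no
        # stochastic learning rate is needed.
        means = S_total / np.maximum(N_total, 1e-8)[:, None]

print(np.round(means, 2))   # means should concentrate near the true centers
```

In the thesis these cached statistics feed a full variational objective (including Dirichlet process prior terms omitted from this toy), but the update pattern is the same: subtract a batch's old summary, add its fresh one, and update global parameters from exact totals.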
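The same additivity is what enables the exact proposal decisions the abstract describes. Cluster k's whole-dataset statistics decompose over batches as S_k = Σ_b S_k^b, so a proposed merge of clusters j and k has summary S_j + S_k available directly from the cache, and the objective of the merged configuration can be scored, then accepted or rejected, without revisiting the data. (In the full model an additional cached entropy term is needed for this to be exact; that bookkeeping is omitted here.)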
Notes:
Thesis (Ph.D.) -- Brown University, 2016.

Access Conditions

Rights
In Copyright
Restrictions on Use
Collection is open for research.

Citation

Hughes, Michael C., "Reliable and scalable variational inference for nonparametric mixtures, topics, and sequences" (2016). Computer Science Theses and Dissertations. Brown Digital Repository. Brown University Library. https://doi.org/10.7301/Z05Q4TH1

Relations

Collection:
Computer Science Theses and Dissertations