We develop statistical methods for analyzing natural images, videos, motion capture (MoCap) sequences, and three-dimensional (3D) representations of articulated objects. Our goal is to discover and characterize regions, objects, actions, and the parts composing them. Such data vary widely in complexity: some instances contain only a few objects (parts), while others exhibit complex structure. Further, images and 3D object representations have strong spatial correlations, while MoCap and video sequences additionally exhibit temporal dependencies. Effective models for such data must automatically reason about the number of constituent objects and parts, while simultaneously modeling strong spatio-temporal interactions. Motivated by these challenges, we study and extend flexible Bayesian nonparametric priors. Focusing first on images, we explore a family of models that generalize the Pitman-Yor (PY) process to produce decompositions of images into depth-ordered segments (layers). Spatial correlations are captured through an ordered set of Gaussian processes that encourage piecewise smooth allocation of pixels to segments. We develop variational methods for effective learning and robust inference, and demonstrate competitive performance on standard image segmentation benchmarks. Next, we explore the distance-dependent Chinese restaurant process (ddCRP), a distribution over partitions that allows user-specified affinity functions to capture dependencies between data instances. We show that a statistical model endowed with a ddCRP prior, together with an expressive likelihood for modeling deformations, produces state-of-the-art segmentations of articulated 3D objects. We then develop a family of hierarchical ddCRP priors that capture dependencies both among data instances and among their latent clusters. Coupled with vector autoregressive likelihoods, this hierarchical ddCRP successfully discovers activities from related MoCap sequences.
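For readers unfamiliar with the Pitman-Yor process mentioned above, the following is a minimal sketch of its standard truncated stick-breaking construction. It is not the layered, Gaussian-process-based generalization developed in this work. The function name, parameter names (`alpha` for concentration, `d` for discount), and the truncation level are illustrative assumptions.

```python
import numpy as np

def pitman_yor_stick_breaking(alpha, d, num_sticks, rng):
    """Truncated stick-breaking weights for PY(d, alpha).

    Assumes 0 <= d < 1 and alpha > -d. The names and the truncation
    level are illustrative choices, not taken from the thesis.
    """
    k = np.arange(1, num_sticks + 1)
    # Stick proportions: V_k ~ Beta(1 - d, alpha + k * d)
    v = rng.beta(1.0 - d, alpha + k * d)
    # Length of stick left before break k: prod_{j<k} (1 - V_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    # Segment weights: pi_k = V_k * prod_{j<k} (1 - V_j)
    return v * remaining

rng = np.random.default_rng(0)
weights = pitman_yor_stick_breaking(alpha=1.0, d=0.5, num_sticks=50, rng=rng)
```

With a positive discount `d`, the weights decay with a heavier tail than in the Dirichlet process case (`d = 0`), which is commonly cited as a good match for the size statistics of segments in natural images.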
The performance of the distance-dependent models depends crucially on the choice of affinity functions. Designing functions that capture appropriate domain-specific dependencies can be challenging. We extend the distance-dependent models and borrow ideas from the approximate Bayesian computation (ABC) literature to develop algorithms for learning affinity functions from human-annotated data. Through extensive experiments on image and video segmentation corpora, we demonstrate that the learned models consistently outperform their hand-crafted counterparts.
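To make the role of the affinity function concrete, here is a minimal sketch of drawing a partition from a ddCRP prior. Each data instance links to another instance with probability proportional to an affinity of their distance, or to itself with weight `alpha`; clusters are the connected components of the resulting link graph. The exponential decay, the one-dimensional toy data, and all names below are assumptions made for illustration, not the affinities or models used in this work.

```python
import numpy as np

def sample_ddcrp_partition(distances, alpha, decay, rng):
    """Draw customer links and cluster labels from a ddCRP prior.

    distances: (n, n) array of pairwise distances.
    alpha: self-link weight (concentration parameter).
    decay: function mapping distances to non-negative affinities.
    """
    n = distances.shape[0]
    affinity = decay(distances).astype(float)
    np.fill_diagonal(affinity, alpha)  # a self-link has weight alpha
    probs = affinity / affinity.sum(axis=1, keepdims=True)
    links = np.array([rng.choice(n, p=probs[i]) for i in range(n)])

    # Clusters are connected components of the link graph (union-find).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(links):
        ri, rj = find(i), find(int(j))
        if ri != rj:
            parent[ri] = rj

    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return links, labels

# Toy data: 30 points on a line; distance is absolute difference.
rng = np.random.default_rng(0)
positions = np.sort(rng.uniform(0.0, 10.0, size=30))
distances = np.abs(positions[:, None] - positions[None, :])

# Hypothetical affinity: exponential decay with a length scale of 0.5.
links, labels = sample_ddcrp_partition(
    distances, alpha=1.0, decay=lambda dist: np.exp(-dist / 0.5), rng=rng
)
```

Changing the length scale of the decay changes how strongly nearby points are encouraged to share a cluster, which is one simple way to see why the choice of affinity matters.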
Ghosh, Soumya, "Bayesian Nonparametric Discovery of Layers and Parts from Scenes and Objects" (2015). Computer Science Theses and Dissertations. Brown Digital Repository. Brown University Library. https://doi.org/10.7301/Z0NZ8621