We propose a general-purpose probabilistic framework for scene understanding. We show that several classical scene understanding tasks can be modeled and addressed with a common representation, approximate inference scheme, and learning algorithm. We refer to this approach as the Probabilistic Scene Grammar (PSG) framework. The PSG framework models scenes using probabilistic grammars, which capture relationships between objects as compositional rules; these rules provide important contextual cues for inference with ambiguous data. We show how to represent the distribution defined by a probabilistic grammar using a factor graph. We also show how to estimate the parameters of a grammar using an approximate version of Expectation-Maximization, and we describe approximate inference based on Loopy Belief Propagation with an efficient message-passing scheme. Inference with Loopy Belief Propagation naturally combines bottom-up and top-down contextual information and leads to a robust algorithm for aggregating evidence. To demonstrate the generality of the approach, we evaluate the PSG framework on contour detection, face localization, and binary image segmentation. On these tasks, the results of the PSG framework are competitive with those of specialized algorithms.
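For readers unfamiliar with Loopy Belief Propagation, the following is a minimal sketch of generic sum-product message passing on a toy pairwise factor graph over binary variables. It is not the PSG factor graph or the efficient message-passing scheme developed in the thesis. The function names (`loopy_bp`, `exact_marginals`), the shared symmetric pairwise table, and the toy potentials are all assumptions made for illustration.

```python
import itertools


def loopy_bp(unary, edges, pairwise, iters=50):
    """Sum-product loopy BP on a pairwise model over binary variables.

    unary:    list of [phi_i(0), phi_i(1)], local evidence per variable
    edges:    list of undirected (i, j) pairs
    pairwise: symmetric 2x2 table psi[x_i][x_j], shared by all edges
              (an assumption of this sketch, not of the thesis)
    """
    n = len(unary)
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # msgs[(i, j)] is the message from variable i to variable j (uniform start).
    msgs = {(i, j): [0.5, 0.5] for i in neighbors for j in neighbors[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = [0.0, 0.0]
            for xj in (0, 1):
                for xi in (0, 1):
                    prod = unary[i][xi] * pairwise[xi][xj]
                    # Multiply in messages from all neighbors of i except j.
                    for k in neighbors[i]:
                        if k != j:
                            prod *= msgs[(k, i)][xi]
                    out[xj] += prod
            z = out[0] + out[1]
            new[(i, j)] = [out[0] / z, out[1] / z]
        msgs = new  # synchronous update of all messages
    beliefs = []
    for i in range(n):
        b = [unary[i][0], unary[i][1]]
        for k in neighbors[i]:
            b[0] *= msgs[(k, i)][0]
            b[1] *= msgs[(k, i)][1]
        z = b[0] + b[1]
        beliefs.append([b[0] / z, b[1] / z])
    return beliefs


def exact_marginals(unary, edges, pairwise):
    """Brute-force marginals by enumerating all joint states (reference only)."""
    n = len(unary)
    marg = [[0.0, 0.0] for _ in range(n)]
    for x in itertools.product((0, 1), repeat=n):
        w = 1.0
        for i in range(n):
            w *= unary[i][x[i]]
        for i, j in edges:
            w *= pairwise[x[i]][x[j]]
        for i in range(n):
            marg[i][x[i]] += w
    z = sum(marg[0])
    return [[m[0] / z, m[1] / z] for m in marg]


# Toy data: variable 0 has strong evidence for state 1; the others have none.
unary = [[0.1, 0.9], [0.5, 0.5], [0.5, 0.5]]
pairwise = [[2.0, 1.0], [1.0, 2.0]]  # neighboring variables prefer to agree
chain = [(0, 1), (1, 2)]             # tree-structured graph
cycle = [(0, 1), (1, 2), (0, 2)]     # graph with a loop

chain_beliefs = loopy_bp(unary, chain, pairwise)
cycle_beliefs = loopy_bp(unary, cycle, pairwise)
```

On the tree-structured `chain`, the beliefs match the brute-force marginals; on the `cycle`, they are only approximations, but the evidence at variable 0 still propagates to its neighbors. This is a small-scale analogue of how message passing lets contextual information from one part of a model influence ambiguous parts elsewhere.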
Chua, Jeroen, "Probabilistic Scene Grammars: A General-Purpose Framework For Scene Understanding" (2017). Computer Science Theses and Dissertations. Brown Digital Repository. Brown University Library. https://doi.org/10.26300/kqr4-7162