Image Compression and Data Clustering: New Takes on Some Old Problems

Wang, Wei-Ying (creator)
Geman, Stuart (Advisor)
Harrison, Matthew (Reader)
Bienenstock, Elie (Reader)
Brown University. Department of Applied Mathematics (sponsor)
We explore two topics in this thesis: (1) entropy rate on images, and (2) a robust generalized clustering method. On the first topic, we examine the idea behind lossless compression algorithms. Specifically, we look into the predictive context modeling method, which currently dominates this field. Without assuming ergodicity, we prove that the optimal compression rate can be achieved with large contexts under stationarity and some mild assumptions. The convergence can be specified by the ergodic components of the underlying stationary probability, which generalizes the Shannon-McMillan-Breiman theorem to 2D non-ergodic sources. To utilize the theorem and to examine the performance of existing compression algorithms, we built a provably optimal image compression scheme, CBIC, the comparison-based image compression algorithm. Experiments show that CBIC is slightly better than a state-of-the-art algorithm, CALIC. Also, the compression rate of CBIC improves only slowly as we scale up the context size, which indicates that existing image compression algorithms are already near optimal. For the second topic, we propose an unsupervised clustering algorithm that is able to cluster points to multiple linear structures (“instances”) within the data by minimizing a suitable loss function. The loss function is built out of “sparse distances” between data points and structures, meaning that it encourages structures that pass through one or more data points. At the same time, the loss function is highly resistant to “outliers.” Moreover, in some situations the minimizer of the loss function is interpolating, implying that every instance is defined by a small and known number of data points. In these cases, an otherwise NP-hard problem becomes polynomial. Specifically, n data points in d dimensions can be clustered to m ((d-1)-dimensional) hyperplanes in time that is exponential in m but polynomial in n.
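The counting argument behind the last claim can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes the 2D case (d = 2, so each hyperplane is a line fixed by two data points), and it uses a truncated absolute distance with a hypothetical threshold `tau` as a stand-in for the "sparse distance", which the abstract does not define. The function names are likewise illustrative. If the minimizer is interpolating, it suffices to search over lines through pairs of data points, as the brute-force search below does.

```python
from itertools import combinations
import math

def line_through(p, q):
    """Return (a, b, c) with a*x + b*y = c and a^2 + b^2 = 1, passing through p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2          # normal vector to the direction q - p
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, a * x1 + b * y1

def loss(points, lines, tau):
    """Sum over points of the distance to the nearest line, capped at tau.

    The cap is an assumed robust surrogate: a far-away outlier costs at
    most tau, so it cannot drag a line toward itself.
    """
    total = 0.0
    for x, y in points:
        d = min(abs(a * x + b * y - c) for a, b, c in lines)
        total += min(d, tau)
    return total

def best_lines(points, m, tau):
    """Exhaustively search all sets of m lines, each through two data points."""
    candidates = [line_through(p, q)
                  for p, q in combinations(points, 2) if p != q]
    return min(combinations(candidates, m),
               key=lambda lines: loss(points, lines, tau))

# Two lines (y = 0 and y = x + 1) plus one outlier.
points = [(0, 0), (1, 0), (2, 0), (3, 0),
          (0, 1), (1, 2), (2, 3), (3, 4),
          (10.0, -7.3)]
found = best_lines(points, m=2, tau=0.5)
```

With n points there are at most C(n, 2) candidate lines and C(C(n, 2), m) candidate sets, so the search is polynomial in n for fixed m and exponential in m, matching the scaling stated in the abstract. In d dimensions the same idea would enumerate hyperplanes through d-point subsets.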
Image compression
unsupervised clustering
Thesis (Ph. D.)--Brown University, 2017
xiii, 180 p.


Wang, Wei-Ying, "Image Compression and Data Clustering: New Takes on Some Old Problems" (2017). Applied Mathematics Theses and Dissertations. Brown Digital Repository. Brown University Library.