Brown University

Classic and Modern Challenges in Statistical Estimation

Description

Abstract:
The rise of the Internet has generated, and enabled the collection of, massive amounts of data. However, this modern ubiquity and abundance of data is worth little unless we can efficiently extract useful information and insights from it. In this thesis, we focus on the question of data efficiency, that is, minimizing the amount of data needed to accomplish a statistical task to given accuracy and confidence guarantees. Drawing on tools from across probability, statistics and theoretical computer science, we propose optimally data-efficient algorithms for two basic estimation problems. The problems and their solutions, defined in distinct data-access models, highlight important data-collection and data-utilization challenges faced by algorithm designers in the modern era, and give techniques to overcome them.
The first problem is a classic and fundamental problem in statistics: what is the best way to estimate the mean of a real-valued distribution from independent samples? Under the minimal and essentially necessary assumption that the distribution has finite (but unknown) variance, we settle the problem by presenting an estimator with convergence tight to within a 1+o(1) multiplicative factor. This contrasts with previous works, which either are tight only up to multiplicative constants or require strong additional assumptions, such as knowledge of the variance or a bounded 4th moment (kurtosis). Our estimator construction and analysis give a generalizable framework, tightly analyzing a sum of dependent random variables by viewing the sum implicitly as a 2-parameter psi-estimator, and constructing bounds using mathematical programming techniques.
The second problem is a coin-flipping problem motivated by crowdsourcing applications. Given a mixture of two populations of coins, "positive" coins, each with unknown and potentially different bias >= 1/2+Delta, and "negative" coins with bias <= 1/2-Delta, we consider the task of estimating the fraction of positive coins to within a given accuracy by drawing coins from the mixture and flipping them. We give an adaptive algorithm and a fully-adaptive lower bound with matching sample complexity, simultaneously tight in all relevant problem parameters, up to a multiplicative constant. The fine-grained adaptive flavor of both our algorithm and lower bound contrasts with much previous work in distributional testing and learning.
Notes:
Thesis (Ph. D.)--Brown University, 2021

Citation

Lee, Chun Hin Jasper, "Classic and Modern Challenges in Statistical Estimation" (2021). Computer Science Theses and Dissertations. Brown Digital Repository. Brown University Library. https://repository.library.brown.edu/studio/item/bdr:tngkk2fu/

Relations

Collection: