Although the human genome was sequenced ten years ago, there is still no database of cis-regulatory architecture that has been conclusively validated by rigorous …
This dissertation describes three primary contributions to the field of medical imaging: (1) a mathematical model ("Blockhead") of the macroscopic and microscopic structure of the …
The human genome exhibits a rich structure resulting from a long history of genomic changes, including single base-pair mutations and larger scale rearrangements such as …
Variation in genomes occurs in many forms, from single nucleotide changes to gains and losses of entire chromosomes. Large-scale rearrangements, called structural variants (SVs), are …
Online stochastic combinatorial optimization problems are those in which a decision maker tries to minimize or maximize an objective by making a sequence of …
Recently, some researchers have attempted to exploit state-aggregation techniques to compute stationary distributions of high-dimensional Markov matrices. While these researchers have devised an efficient, recursive …
The confluence of ubiquitous, high-performance networking and increased availability of online information has led to the emergence of a new class of large-scale stream processing …
Peer-to-peer systems have been proposed for a wide variety of applications, such as file-sharing, distributed storage, and distributed computation. These systems seek the benefits of …
There are two primary issues facing database systems designed for online transaction processing (OLTP): scalability and performance. Traditional disk-based OLTP architectures are the result of …
Current efforts in syntactic parsing are largely data-driven. These methods require labeled examples of syntactic structures to learn statistical patterns governing these structures. Labeled data …
Vehicle routing is a class of optimization problems where the objective is to find low-cost delivery routes from depots to customers using vehicles of …
In correlation clustering, given similarity or dissimilarity information for all pairs of data items, the goal is to find a clustering of the items into …
The performance of multithreaded programs is often difficult to understand and predict. Multiple threads use various locking operations, resulting in the parallel execution of some …
GPUs are increasingly utilized for scientific and general-purpose workloads as cheap and efficient hardware accelerators. Despite the GPU's popularity, many codes ported to the GPU …
We develop statistical methods for analyzing natural images, videos, motion capture (MoCap) sequences, and three-dimensional (3D) representations of articulated objects. Our goal is to discover …
Information retrieval (IR) has become a ubiquitous technology for quickly and easily finding information on a given topic amidst the wealth of digital content available …
The tools and language of combinatorial topology have previously been very successful in characterizing distributed task solvability when processes fail only by crashing. In this …
Data processing frameworks provide application programmers with an interface to manipulate and analyze data. This thesis studies a novel parallel stream processing model, designed for workflow-based …
Crowdsourced training data has become a mainstay in computer vision. Some of the most significant discoveries of the last few years were made possible by …
This dissertation shows that qualitative and quantitative characterization of patterned structures in brain connectivity data obtained using diffusion MRI not only improves the exploration of …