Brown University

Integrated Search and Exploration Over Large Multidimensional Data

Description

Abstract:
The need for rich, ad-hoc data analysis is key for pervasive discovery. However, generic and reusable systems tools for interactive search, exploration and mining over large data sets are lacking. Exploring large data sets interactively requires advanced data-driven search techniques that go well beyond the conventional database querying capabilities, whereas state-of-the-art search technologies are not designed and optimized to work for large out-of-core data sets. These requirements force users to roll their own custom solutions, typically by gluing together existing libraries, databases and custom scripts, only to end up with a solution that is difficult to develop, scale, optimize, maintain and reuse. To address these limitations, we propose a tight integration of data management and search technologies. This combination would not only allow users to perform search efficiently, but also offer a single, expressive framework that can support a wide variety of data-intensive search and exploration tasks. As the first step in this direction, we describe a custom search framework called Semantic Windows, which allows users to conveniently perform structured search via shape and content constraints over a large multidimensional data space. As the second step, we describe a general-purpose exploration framework called Searchlight, which allows Constraint Programming (CP) machinery to run efficiently inside a Database Management System (DBMS) without the need to extract, transform and move the data. This marriage concurrently offers the rich expressiveness and efficiency of constraint-based search and optimization provided by modern CP solvers, and the ability of DBMSs to store and query data at scale, resulting in an enriched functionality that can effectively support data- and search-intensive applications. 
As such, Searchlight is the first system to support generic search, exploration and mining over large multidimensional data collections, going beyond point algorithms designed for point search and mining tasks. Fast, interactive query evaluation is only one of the requirements of effective data-exploration support. Finding the right questions to ask is another notoriously challenging problem, given the users’ lack of familiarity with the structure and contents of the underlying data sets, as well as the inherently fuzzy goals in many exploration-oriented tasks. In the third part of this work, we study the modification of initial query parameters at run-time: we describe how Searchlight can dynamically relax or constrain the parameters of a query, based on its progress, to offer more or fewer results to the user. This feature allows users to iterate over the data sets faster and without having to make accurate guesses on what parameters to use.
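As a rough illustration of the kind of query the abstract describes, the sketch below searches a small two-dimensional array for rectangular windows that satisfy a shape constraint (bounds on height and width) and a content constraint (a minimum average value). It then loosens the content constraint when too few results are found. This is a brute-force toy written for this page, not the Semantic Windows or Searchlight implementation. The function names, parameters, and the fixed-step relaxation policy are all assumptions made for illustration.

```python
from itertools import product


def find_windows(grid, min_h, max_h, min_w, max_w, min_avg):
    """Return (row, col, height, width) for every window whose shape is
    within the given bounds and whose average cell value is >= min_avg.

    Hypothetical helper: exhaustive enumeration, with no pruning or indexing.
    """
    rows, cols = len(grid), len(grid[0])
    hits = []
    # Shape constraint: only consider heights and widths inside the bounds.
    for h, w in product(range(min_h, max_h + 1), range(min_w, max_w + 1)):
        for r in range(rows - h + 1):
            for c in range(cols - w + 1):
                total = sum(
                    grid[i][j]
                    for i in range(r, r + h)
                    for j in range(c, c + w)
                )
                # Content constraint: average value of the window.
                if total / (h * w) >= min_avg:
                    hits.append((r, c, h, w))
    return hits


def search_with_relaxation(grid, min_h, max_h, min_w, max_w, min_avg,
                           want, step=0.5, floor=0.0):
    """Lower the content threshold in fixed steps until at least `want`
    windows are found or the threshold reaches `floor`.

    Assumed policy for illustration only; returns (threshold_used, hits).
    """
    threshold = min_avg
    while True:
        hits = find_windows(grid, min_h, max_h, min_w, max_w, threshold)
        if len(hits) >= want or threshold <= floor:
            return threshold, hits
        threshold = max(floor, threshold - step)
```

Calling `search_with_relaxation(grid, 2, 2, 2, 2, 8.0, want=3)` on a small array would start with the strict threshold of 8.0 and keep lowering it by 0.5 until at least three 2x2 windows qualify. A real system would replace the exhaustive loops with pruning and would work against data stored in the DBMS; the sketch only shows the shape of the query and of the relaxation loop.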
Notes:
Thesis (Ph. D.)--Brown University, 2017

Access Conditions

Rights
In Copyright
Restrictions on Use
Collection is open for research.

Citation

Kalinin, Alexander, "Integrated Search and Exploration Over Large Multidimensional Data" (2017). Computer Science Theses and Dissertations. Brown Digital Repository. Brown University Library. https://doi.org/10.7301/Z0HT2MR2

Relations

Collection: