Brown University

Query Processing for Data Analytics on Modern Multicore Systems

Description

Abstract:
Within the last decade, database design has undergone a major shift, largely due to two important hardware trends. While increases in main-memory capacity made it possible to hold even large systems entirely in RAM, the advent of multi-core processors created opportunities for a multitude of parallel query processing techniques. As a result, we have seen the emergence of massively parallel, main-memory data management system designs that leverage these new hardware platforms. Currently, owing to physical limitations known as the dark silicon effect, we are witnessing the evolution of modern processors towards heterogeneous designs, in which vendors replace some general-purpose cores with high-performance, energy-efficient specialized compute units. Motivated by these trends, we propose novel query processing techniques that target these modern processing environments for data analytics workloads. We first focus on result-reuse techniques for main-memory DBMSs. While existing reuse approaches require heavyweight materialization operations, our novel reuse techniques cache internal data structures, in particular hash tables, created by query operators and make them directly reusable for downstream processing with little or no additional overhead. We implement a prototype called HashStash to confirm the feasibility of our approach, and demonstrate significant performance gains for typical analytical workloads. Then, we study novel query processing techniques for main-memory DBMSs operating on top of emerging heterogeneous compute platforms. These platforms host compute units with varying performance, functionality, and execution characteristics, and thus create new challenges for efficient query processing. To target these systems, we propose SiliconDB, a new query processing approach that uses a fine-grained, adaptive workload execution model. SiliconDB splits queries into small work units and uses queuing theory to dynamically assign these work units to available compute units to maximize overall resource utilization. As the final component of this dissertation, we extend SiliconDB's relational query engine to operate directly on raw input columns without stalling on an initial data-loading pipeline. We implement our approach on an FPGA-based heterogeneous multi-core platform and show how we leverage the salient properties of this environment to speed up processing.
Notes:
Thesis (Ph. D.)--Brown University, 2021

Citation

Dursun, Kayhan, "Query Processing for Data Analytics on Modern Multicore Systems" (2021). Computer Science Theses and Dissertations. Brown Digital Repository. Brown University Library. https://repository.library.brown.edu/studio/item/bdr:tff65psv/

Relations

Collection: