Hardware-Software Co-design of Resource-Efficient Deep Neural Networks

Description

Abstract:
The unprecedented success of deep learning technology has elevated state-of-the-art accuracy in many application domains, such as computer vision and voice recognition. At the same time, typical Deep Neural Network (DNN) models used in deep learning contain hundreds of millions of parameters and require billions of expensive floating-point operations to process each input. These large storage and computational overheads severely limit the applicability of DNNs on resource-constrained systems such as mobile and embedded platforms. Recently, a large number of resource-optimization techniques and dedicated hardware architectures have been proposed to alleviate these overheads. The principal observation enabling such optimization approaches is the inherent error resilience of DNNs: approximation-induced accuracy loss can often be recovered through retraining or fine-tuning. In addition, applications that deploy DNNs in their processing pipelines tend to be resilient to small inaccuracies in the DNN output. With the growing importance of machine learning and the increasing number of embedded systems, the success of DNN approximation techniques is critical to enabling resource-efficient operation.

This thesis makes several contributions toward advancing DNN inference on embedded platforms. First, we introduce design methodologies to reduce the hardware complexity of DNN models and propose lightweight approximate accelerators that can efficiently process these models. Our methodologies include analysis and novel training algorithms for a spectrum of data precisions, ranging from fixed-point, dynamic fixed-point, and power-of-two to binary representations for both the weights and activations of the models. We demonstrate custom hardware accelerator designs for the various data precisions that achieve low power and low latency while incurring insignificant accuracy degradation. To boost the accuracy of these lightweight accelerators, we describe ensemble processing techniques that use an ensemble of lightweight DNN accelerators to achieve the same or better accuracy than the original floating-point accelerator.

We also introduce two flexible runtime strategies that enable significant savings in DNN inference latency by allowing dynamic trade-offs between quality of results (QoR) and execution time. The first is a novel dynamic configuration technique that adjusts the number of channels in the network according to response-time, power, and accuracy targets. The second enables flexible inference for DNN ensembles, a popular and effective method for boosting inference accuracy.

Next, we showcase our DNN design methodologies in an end-to-end iris recognition application. We propose a resource-efficient iris recognition flow consisting of FCN-based segmentation and contour fitting, followed by Daugman normalization and encoding. To obtain accurate and efficient FCN architectures, we introduce a HW/SW co-design methodology in which we propose multiple novel FCN models. Incorporating each model into the end-to-end flow, we show that the recognition rates of our pipelines outperform the previous state-of-the-art on the two datasets evaluated.
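To make the spectrum of data precisions described above concrete, the following is a minimal NumPy sketch of two of the formats the abstract names, power-of-two and binary weight quantization. The function names, bit widths, and scaling choices here are illustrative assumptions, not the thesis's actual implementation.

    # Hypothetical sketches of two reduced-precision weight formats; names
    # and parameter choices are assumptions, not code from the thesis.
    import numpy as np

    def quantize_power_of_two(w, exp_bits=4):
        # Snap each nonzero weight to a signed power of two, sign(w) * 2**e,
        # by rounding the exponent; multiplies then become bit shifts.
        sign = np.sign(w)
        mag = np.maximum(np.abs(w), 1e-12)          # avoid log2(0)
        lo = -(2 ** exp_bits - 1)                   # smallest exponent kept
        e = np.clip(np.round(np.log2(mag)), lo, 0)  # weights assumed <= 1
        return np.where(w == 0, 0.0, sign * np.exp2(e))

    def quantize_binary(w):
        # Binarize to {-alpha, +alpha}; scaling alpha by the mean absolute
        # value is one common choice in the binary-network literature.
        alpha = np.mean(np.abs(w))
        return alpha * np.where(w >= 0, 1.0, -1.0)

    w = 0.5 * np.random.randn(4, 4).astype(np.float32)
    print(quantize_power_of_two(w))
    print(quantize_binary(w))

Because each quantized weight is a power of two or a single scaled sign bit, the multipliers in a hardware datapath can be replaced by shifts or sign flips, which is the source of the power and latency savings the abstract describes.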
To further simplify the models for efficient inference, we quantize the weights and activations of the models to dynamic fixed-point (DFP) format and propose a DFP accelerator. We realize our HW/SW co-design pipeline on an embedded FPGA platform. Finally, we extend our work to emerging computing paradigms for machine learning by introducing a novel methodology for a chemical-based single-layer neural network. We propose a parallel encoding scheme that simultaneously represents multiple bits in microliter-sized chemical mixtures. While the demonstration is still limited in scale, we consider it a first step toward building computing systems that can complement electronic systems for applications in ultra-low-power settings and extreme environments.
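As a companion to the sketch above, the dynamic fixed-point (DFP) format can be illustrated under its common formulation: all values in a tensor share one power-of-two scale 2**-f, with the fractional length f chosen per tensor from the observed range. The 8-bit width and function names are assumptions for illustration, not the accelerator's actual design.

    # Minimal dynamic fixed-point (DFP) sketch: one shared fractional length
    # per tensor, chosen from the tensor's range. Illustrative only.
    import numpy as np

    def dfp_quantize(x, total_bits=8):
        # Produce integer codes and a fractional length f, x ~= codes * 2**-f.
        qmax = 2 ** (total_bits - 1) - 1            # e.g. 127 for 8 bits
        max_abs = float(np.max(np.abs(x)))
        if max_abs == 0.0:
            return np.zeros(x.shape, dtype=np.int32), 0
        # Largest fractional length whose range still covers max_abs.
        f = int(np.floor(np.log2(qmax / max_abs)))
        codes = np.clip(np.round(x * 2.0 ** f), -qmax - 1, qmax)
        return codes.astype(np.int32), f

    def dfp_dequantize(codes, f):
        return codes.astype(np.float32) * 2.0 ** -f

    w = 0.5 * np.random.randn(3, 3).astype(np.float32)
    codes, f = dfp_quantize(w)
    print(f, np.max(np.abs(w - dfp_dequantize(codes, f))))  # error <= 2**-(f+1)

Because f adapts per tensor, DFP can track the very different dynamic ranges of weights and activations across layers while keeping the arithmetic in plain integer units, which is what makes the format attractive for an embedded FPGA accelerator.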
Notes:
Thesis (Ph.D.)--Brown University, 2019

Access Conditions

Rights
In Copyright
Restrictions on Use
Collection is open for research.

Citation

Tann, Hokchhay, "Hardware-Software Co-design of Resource-Efficient Deep Neural Networks" (2019). Engineering Theses and Dissertations. Brown Digital Repository. Brown University Library. https://doi.org/10.26300/85c0-xf09

Relations

Collection:
Engineering Theses and Dissertations