Power is one of the key challenges in improving the performance of modern CPU-GPU processors. Research efforts are needed at both design time and run time to improve processor power efficiency (performance per watt). To improve run-time power management, accurate measurement-based power models are needed. Further, the power efficiency of a CPU-GPU processor for a given workload depends on the type of device the workload runs on and on the run-time conditions of the system [e.g., thermal design power (TDP) and the presence of other workloads], so an online workload characterization and mapping method is needed. Furthermore, for future massively parallel processors, low-power techniques such as power gating (PG) should be evaluated for their potential benefits before incurring the cost of implementing them. This thesis makes the following contributions towards improving the performance and power efficiency of CPU-GPU processors. First, we propose new techniques for post-silicon power mapping and modeling of multi-core processors using infrared imaging and performance counter measurements. Using detailed thermal and power maps, we demonstrate that, in contrast to traditional multi-core CPUs, heterogeneous processors exhibit more tightly intertwined behavior between dynamic voltage and frequency scaling (DVFS) and workload scheduling in terms of their effect on performance, power, and temperature. Second, we propose a framework to map workloads to the appropriate device of a CPU-GPU processor under different static and time-varying workload and system conditions. We implement the scheduler on a real CPU-GPU processor and, using OpenCL benchmarks, demonstrate up to 24% runtime improvement and 10% energy savings compared to state-of-the-art scheduling techniques.
Third, to improve the performance and power efficiency of future massively parallel GPUs, we provide an integrated solution to manage leakage power by incorporating workload and run-time awareness into the PG design methodology. On a hypothetical future GPU with 192 compute units (CUs), our results show that a PG granularity of 16 CUs per cluster achieves 99% of peak run-time performance without the excessive 53% design-time area overhead of per-CU PG. Further, we demonstrate that incorporating design awareness into run-time power management can maximize the benefits of power gating and improve the overall power efficiency of future processors by an additional 5%.
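The granularity trade-off described above can be illustrated with simple arithmetic. The following is a minimal sketch, not the thesis's methodology: the function names, the 40-CU example workload, and the assumption that active CUs are packed into as few clusters as possible are all hypothetical. Only the 192-CU total and the 16-CU cluster size come from the abstract.

```python
import math

TOTAL_CUS = 192  # compute units on the hypothetical GPU (from the abstract)


def num_pg_domains(total_cus: int, cus_per_cluster: int) -> int:
    """Number of independently power-gated clusters at a given granularity."""
    if cus_per_cluster <= 0 or total_cus % cus_per_cluster != 0:
        raise ValueError("cluster size must evenly divide the CU count")
    return total_cus // cus_per_cluster


def powered_cus(active_cus: int, cus_per_cluster: int) -> int:
    """CUs that stay powered when a workload needs `active_cus` of them.

    Assumption: active CUs are packed into as few clusters as possible,
    and a cluster can only be gated as a whole, so a partially used
    cluster keeps all of its CUs powered (and leaking).
    """
    clusters_on = math.ceil(active_cus / cus_per_cluster)
    return clusters_on * cus_per_cluster


# Hypothetical workload that needs 40 CUs, at three PG granularities.
for size in (1, 16, 192):
    domains = num_pg_domains(TOTAL_CUS, size)
    on = powered_cus(40, size)
    print(f"{size:>3} CU/cluster: {domains:>3} PG domains, {on:>3} CUs powered")
```

Finer granularity leaves fewer idle CUs powered but multiplies the number of gated domains, each of which carries design-time overhead. The sketch captures only that counting argument and says nothing about the actual area or performance figures reported in the thesis.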
Dev, Kapil. "New Techniques for Power-Efficient CPU-GPU Processors" (2016). Electrical Sciences and Computer Engineering Theses and Dissertations. Brown Digital Repository. Brown University Library. https://doi.org/10.7301/Z0BV7F18