TY - JOUR
T1 - Application and thermal-reliability-aware reinforcement learning based multi-core power management
AU - Dinakarrao, Sai Manoj Pudukotai
AU - Joseph, Arun
AU - Haridass, Anand
AU - Shafique, Muhammad
AU - Henkel, Jörg
AU - Homayoun, Houman
N1 - Funding Information:
Coauthor Dr. Shafique’s contributions in this work are supported in part by the German Research Foundation (DFG) as part of the GetSURE project in the scope of SPP-1500 priority program “Dependable Embedded Systems.” Authors’ addresses: P. D. Sai Manoj, George Mason University, 4400 Patriot Circle, Fairfax, VA, 22030; A. Joseph and A. Haridass, IBM Systems, Bannerghatta Rd, Bangalore, Karnataka, India; emails: {arujosep, anharida}@in.ibm.com; M. Shafique, Vienna University of Technology, Institute of Computer Engineering, Embedded Computing Systems, Treitlstraße 3, 1040 Wien, Österreich; J. Henkel, Haid-und-Neu-Str. 7, Bldg. 07.21, 76131 Karlsruhe, Germany; H. Homayoun, University of California, Davis, 1 Shields Ave, Davis, CA, 95616.
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/10
Y1 - 2019/10
N2 - Power management through dynamic voltage and frequency scaling (DVFS) is one of the most widely adopted techniques. However, it impacts application reliability (due to soft errors, circuit aging, and deadline misses). At the same time, increased power density impacts the thermal reliability of the chip, sometimes leading to permanent failure. To balance application and thermal reliability while achieving power savings and maintaining performance, we propose application- and thermal-reliability-aware reinforcement learning-based multi-core power management. The proposed power management scheme employs a reinforcement learner that considers the power savings and the variations in application and thermal reliability caused by DVFS. To reduce the computational overhead, power management decisions are made at application-level granularity rather than at per-core or system-level granularity. The proposed multi-core power management was evaluated experimentally on a microprocessor with up to 32 cores running PARSEC applications to demonstrate its applicability and efficiency. Compared to existing state-of-the-art techniques, the proposed technique achieves average energy savings of up to ~20% and a temperature reduction of up to 4.926 °C, without degrading application or thermal reliability.
AB - Power management through dynamic voltage and frequency scaling (DVFS) is one of the most widely adopted techniques. However, it impacts application reliability (due to soft errors, circuit aging, and deadline misses). At the same time, increased power density impacts the thermal reliability of the chip, sometimes leading to permanent failure. To balance application and thermal reliability while achieving power savings and maintaining performance, we propose application- and thermal-reliability-aware reinforcement learning-based multi-core power management. The proposed power management scheme employs a reinforcement learner that considers the power savings and the variations in application and thermal reliability caused by DVFS. To reduce the computational overhead, power management decisions are made at application-level granularity rather than at per-core or system-level granularity. The proposed multi-core power management was evaluated experimentally on a microprocessor with up to 32 cores running PARSEC applications to demonstrate its applicability and efficiency. Compared to existing state-of-the-art techniques, the proposed technique achieves average energy savings of up to ~20% and a temperature reduction of up to 4.926 °C, without degrading application or thermal reliability.
KW - Application reliability
KW - DVFS
KW - Multi-core processor
KW - Power management
KW - Reinforcement learning
KW - Thermal reliability
UR - http://www.scopus.com/inward/record.url?scp=85075590930&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075590930&partnerID=8YFLogxK
U2 - 10.1145/3323055
DO - 10.1145/3323055
M3 - Article
AN - SCOPUS:85075590930
SN - 1550-4832
VL - 15
JO - ACM Journal on Emerging Technologies in Computing Systems
JF - ACM Journal on Emerging Technologies in Computing Systems
IS - 4
M1 - 33
ER -