Application and thermal-reliability-aware reinforcement learning based multi-core power management

Sai Manoj Pudukotai Dinakarrao, Arun Joseph, Anand Haridass, Muhammad Shafique, Jörg Henkel, Houman Homayoun

Research output: Contribution to journal › Article › peer-review

Abstract

Power management through dynamic voltage and frequency scaling (DVFS) is one of the most widely adopted techniques. However, DVFS affects application reliability (through soft errors, circuit aging, and deadline misses). Moreover, increased power density degrades the thermal reliability of the chip, sometimes leading to permanent failure. To balance application and thermal reliability while achieving power savings and maintaining performance, this work proposes an application- and thermal-reliability-aware, reinforcement learning-based multi-core power management scheme. The scheme employs a reinforcement learner that accounts for both the power savings and the variations in application and thermal reliability caused by DVFS. To reduce computational overhead, power management decisions are made at application-level granularity rather than at per-core or system-level granularity. The technique was evaluated on a microprocessor with up to 32 cores running PARSEC applications to demonstrate its applicability and efficiency. Compared to existing state-of-the-art techniques, the proposed technique achieves average energy savings of up to ~20% and a temperature reduction of up to 4.926 °C, without degrading application or thermal reliability.
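The abstract does not specify the learner's state, action, or reward definitions. As a rough illustration of the general idea only, the sketch below uses plain tabular Q-learning with one agent per application, so a single V/f level is chosen for all cores running that application rather than per core. It assumes a state made of a coarse temperature bin, actions that are hypothetical V/f operating points, and a reward that adds normalized power savings and subtracts penalties for deadline misses (a stand-in for application reliability) and for exceeding a temperature threshold (a stand-in for thermal reliability). The power and thermal model, `VF_LEVELS`, `T_LIMIT`, `ALPHA_T`, `ToyApp`, `AppLevelQAgent`, `train`, `greedy_rollout`, and every constant are hypothetical; this is not the authors' formulation.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical DVFS operating points: (frequency in GHz, supply voltage in V).
VF_LEVELS = [(1.0, 0.80), (1.5, 0.90), (2.0, 1.00), (2.5, 1.10)]
T_LIMIT = 90.0  # hypothetical thermal-reliability threshold, deg C
ALPHA_T = 0.5   # hypothetical first-order thermal smoothing factor


@dataclass
class ToyApp:
    """Illustrative application model; every constant here is an assumption."""
    work: float            # giga-cycles to execute per control interval
    deadline: float        # seconds allowed per control interval
    ambient: float = 45.0  # deg C
    r_th: float = 2.0      # lumped thermal resistance, deg C per W
    c_eff: float = 10.0    # switched-capacitance scale, W / (GHz * V^2)

    def power(self, level: int) -> float:
        f, v = VF_LEVELS[level]
        return self.c_eff * f * v * v  # dynamic power ~ C * f * V^2

    def state(self, temp: float) -> int:
        return max(0, int((temp - self.ambient) // 10))  # 10 deg C bins

    def step(self, temp: float, level: int) -> tuple[float, float]:
        """Run one interval at `level`; return (reward, new temperature)."""
        p = self.power(level)
        steady = self.ambient + self.r_th * p
        new_temp = temp + ALPHA_T * (steady - temp)  # first-order thermal lag
        p_max = self.power(len(VF_LEVELS) - 1)
        reward = (p_max - p) / p_max  # normalized power savings in [0, 1)
        if self.work / VF_LEVELS[level][0] > self.deadline:
            reward -= 1.0  # deadline miss: proxy for application reliability
        if new_temp > T_LIMIT:
            reward -= (new_temp - T_LIMIT) / 10.0  # thermal-reliability penalty
        return reward, new_temp


class AppLevelQAgent:
    """One tabular Q-learner per application (not per core, not per system)."""

    def __init__(self, lr=0.1, gamma=0.9, epsilon=0.2, seed=0):
        self.q = defaultdict(float)  # (state, action) -> Q-value, default 0
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)

    def greedy(self, s: int) -> int:
        return max(range(len(VF_LEVELS)), key=lambda a: self.q[(s, a)])

    def act(self, s: int) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(VF_LEVELS))  # explore
        return self.greedy(s)  # exploit

    def update(self, s: int, a: int, r: float, s_next: int) -> None:
        best_next = max(self.q[(s_next, b)] for b in range(len(VF_LEVELS)))
        target = r + self.gamma * best_next  # Q-learning (off-policy) target
        self.q[(s, a)] += self.lr * (target - self.q[(s, a)])


def train(app: ToyApp, episodes=500, steps=20, seed=0) -> AppLevelQAgent:
    agent = AppLevelQAgent(seed=seed)
    for _ in range(episodes):
        temp = app.ambient  # each episode starts from a cool chip
        for _ in range(steps):
            s = app.state(temp)
            a = agent.act(s)
            r, temp = app.step(temp, a)
            agent.update(s, a, r, app.state(temp))
    return agent


def greedy_rollout(app: ToyApp, agent: AppLevelQAgent, steps=20) -> list[int]:
    """Follow the learned greedy policy from ambient; return chosen levels."""
    temp, levels = app.ambient, []
    for _ in range(steps):
        a = agent.greedy(app.state(temp))
        levels.append(a)
        _, temp = app.step(temp, a)
    return levels


if __name__ == "__main__":
    apps = {"loose": ToyApp(work=1.0, deadline=1.2),
            "tight": ToyApp(work=1.8, deadline=1.0)}
    for name, app in apps.items():
        print(name, greedy_rollout(app, train(app), steps=8))
```

With these made-up constants, the agent for the loosely constrained application should settle on the lowest V/f level, while the tightly constrained one should settle on the lowest level that still meets its deadline without crossing `T_LIMIT`. Soft errors and circuit aging, which the abstract also lists under application reliability, are not modeled in this sketch.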

Original language: English (US)
Article number: 33
Journal: ACM Journal on Emerging Technologies in Computing Systems
Volume: 15
Issue number: 4
State: Published - Oct 2019

Keywords

  • Application reliability
  • DVFS
  • Multi-core processor
  • Power management
  • Reinforcement learning
  • Thermal reliability

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Electrical and Electronic Engineering
