Towards Scalable Lifetime Reliability Management for Dark Silicon Manycore Systems

Vijeta Rathore, Vivek Chaturvedi, Amit K. Singh, Thambipillai Srikanthan, Muhammad Shafique

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Aggressive technology scaling has enabled very high integration density. Unfortunately, it has also led to process variation and increased power density, and consequently to rising chip temperatures, which accelerate device aging and degrade the lifetime reliability of the components in a manycore system. Moreover, thermal and power limits allow only a fraction of the chip to operate at full speed; the rest is dark silicon. Most lifetime reliability enhancement solutions for multi-/manycore systems in the literature are heuristic-based, while some use standard compute-intensive methods to solve the optimization problem, so they do not scale well with the size of the manycore. The heuristic-based solutions search a fine-grained design space, which becomes huge and limits their scalability. These approaches also neither account for the impact of different applications' execution behavior on the aging of the underlying cores, nor exploit the distribution of the applications' performance requirements across the cores to their advantage. In this paper, we present our resource management strategies towards building scalable lifetime reliability enhancement solutions for dark silicon manycore systems. The first technique, the Hierarchical Mapping approach (HiMap), maps a periodic workload using a block-based hierarchical method that leverages dark cores for thermal mitigation. The second approach, LifeGuard, uses reinforcement learning to learn the applications' aging behavior and is aware of how their performance requirements map onto the core frequencies. It maps randomly arriving requests and scales with both the number of applications and the size of the manycore.
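To make the scalability argument for block-based mapping concrete, the following is a minimal, hypothetical Python sketch; it is not the HiMap algorithm from the paper. It assumes a rectangular mesh divided into square blocks, a single scalar aging value per core, and a checkerboard dark-core pattern as the thermal-mitigation rule. The names `Chip`, `map_threads`, `aging`, and `busy` are illustrative only. The first level ranks whole blocks, so the coarse search grows with the number of blocks rather than the number of cores; the second level places threads only on cores whose horizontal and vertical neighbours stay dark.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    """Hypothetical mesh model: rows x cols cores split into square blocks."""
    rows: int
    cols: int
    block: int   # assumed block edge length, in cores
    aging: list  # aging[r][c]: assumed scalar wear metric, higher = more aged

    def blocks(self):
        """Top-left corner of every block (the coarse level of the hierarchy)."""
        return [(br, bc)
                for br in range(0, self.rows, self.block)
                for bc in range(0, self.cols, self.block)]

    def cores_in(self, br, bc):
        """Cores of the block whose top-left corner is (br, bc)."""
        return [(r, c)
                for r in range(br, min(br + self.block, self.rows))
                for c in range(bc, min(bc + self.block, self.cols))]


def map_threads(chip, n_threads, busy=frozenset()):
    """Hypothetical two-level mapper (an illustration, not HiMap itself).

    Level 1 ranks blocks by mean aging, so the coarse search scales with the
    number of blocks. Level 2 fills the least-aged free cores of each block,
    but only cores with (r + c) even: this checkerboard keeps every active
    core's horizontal and vertical neighbours dark as a simple spacing rule.
    """
    if n_threads == 0:
        return []

    def mean_aging(b):
        cores = chip.cores_in(*b)
        return sum(chip.aging[r][c] for r, c in cores) / len(cores)

    placement = []
    for b in sorted(chip.blocks(), key=mean_aging):
        # Only checkerboard cores that are not already busy may be activated.
        candidates = [(r, c) for r, c in chip.cores_in(*b)
                      if (r + c) % 2 == 0 and (r, c) not in busy]
        # Prefer the least-aged cores inside the chosen block.
        candidates.sort(key=lambda rc: chip.aging[rc[0]][rc[1]])
        for core in candidates:
            placement.append(core)
            if len(placement) == n_threads:
                return placement
    raise ValueError("not enough cores available under the dark-core spacing rule")


if __name__ == "__main__":
    chip = Chip(rows=4, cols=4, block=2,
                aging=[[5, 5, 1, 1],
                       [5, 5, 1, 1],
                       [2, 2, 9, 9],
                       [2, 2, 9, 9]])
    print(map_threads(chip, 3))  # -> [(0, 2), (1, 3), (2, 0)]
```

In this example, the three threads first fill the checkerboard cores of the least-aged block (top right) and then spill into the next least-aged block (bottom left). A real manager would replace the checkerboard rule and the scalar aging value with proper thermal and reliability models.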

Original language: English (US)
Title of host publication: 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design, IOLTS 2019
Editors: Dimitris Gizopoulos, Dan Alexandrescu, Panagiota Papavramidou, Michail Maniatakos
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 204-207
Number of pages: 4
ISBN (Electronic): 9781728124902
State: Published - Jul 2019
Event: 25th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2019 - Rhodes, Greece
Duration: Jul 1 2019 to Jul 3 2019

Publication series

Name: 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design, IOLTS 2019

Conference

Conference: 25th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2019
Country: Greece
City: Rhodes
Period: 7/1/19 to 7/3/19

Keywords

  • aging
  • hierarchical
  • lifetime reliability
  • manycore systems
  • mapping
  • reinforcement learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Safety, Risk, Reliability and Quality
