Reliability-aware adaptations for shared last-level caches in multi-cores

Florian Kriebel, Semeen Rehman, Arun Subramaniyan, Segnon Jean Bruno Ahandagbe, Muhammad Shafique, Jörg Henkel

Research output: Contribution to journal, Article, peer-review

Abstract

On account of their large footprint, on-chip last-level caches in multi-core systems are among the components most vulnerable to soft errors. However, their vulnerability to soft errors depends strongly on the configuration and parameters of the last-level cache, especially when different applications execute concurrently. In this article, we propose a novel reliability-aware reconfigurable last-level cache architecture (R2Cache) and a cache vulnerability model for multi-cores. R2Cache supports various cache configurations that are efficient in terms of reliability (i.e., cache parameter selection and cache partitioning) for different concurrently executing applications. The proposed vulnerability model accounts for the vulnerability of both the data and tag arrays, as well as the active cache area used by applications in different execution phases. To enable runtime adaptations, we introduce a lightweight online vulnerability predictor that exploits performance metrics such as the number of L2 misses to accurately estimate the cache vulnerability to soft errors. Based on the predicted vulnerabilities of the concurrently executing applications in the current execution epoch, our runtime reliability manager reconfigures the cache such that, for the next execution epoch, the total vulnerability of all concurrently executing applications is minimized under user-provided tolerable performance/energy overheads. In scenarios where single-bit error correction for cache lines can be afforded, vulnerability-aware reconfigurations can be leveraged to increase the reliability of the last-level cache against multi-bit errors. Compared to state-of-the-art vulnerability-minimizing and reconfigurable caches, the proposed architecture provides 35.27% and 23.42% vulnerability savings, respectively, averaged across numerous experiments, while reducing the vulnerability by more than 65% and 60%, respectively, for selected applications and application phases.
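The per-epoch reconfiguration step summarized above can be illustrated with a minimal sketch. It is not the authors' R2Cache reliability manager: the candidate table type `Candidates`, the function `reconfigure`, the use of cache ways as the partitioning unit, the single performance-overhead budget (energy is ignored), and the exhaustive search are all hypothetical choices made for illustration. Each application contributes a table mapping a number of allocated ways to its predicted vulnerability and performance overhead for the next epoch, and the sketch picks the allocation that minimizes the summed vulnerability within the way and overhead budgets.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical per-application table: allocated ways -> (predicted vulnerability,
# performance overhead as a fraction). Values would come from an online predictor.
Candidates = Dict[int, Tuple[float, float]]


def reconfigure(apps: List[Candidates], total_ways: int,
                max_overhead: float) -> Tuple[List[int], float]:
    """Choose one way allocation per application minimizing total vulnerability.

    Assumed constraints (illustrative only): allocations may use at most
    total_ways in sum, and no application may exceed max_overhead.
    Returns ([], inf) if no feasible allocation exists.
    """
    best_alloc: List[int] = []
    best_vuln = float("inf")
    # Exhaustively enumerate every combination of per-application candidates.
    for alloc in product(*(sorted(c) for c in apps)):
        if sum(alloc) > total_ways:
            continue  # partition does not fit in the shared cache
        if any(apps[i][w][1] > max_overhead for i, w in enumerate(alloc)):
            continue  # violates the tolerable performance overhead
        vuln = sum(apps[i][w][0] for i, w in enumerate(alloc))
        if vuln < best_vuln:
            best_alloc, best_vuln = list(alloc), vuln
    return best_alloc, best_vuln


# Example with two applications and made-up predicted values.
apps = [
    {2: (0.30, 0.08), 4: (0.50, 0.02), 8: (0.90, 0.00)},
    {2: (0.20, 0.01), 4: (0.35, 0.00), 8: (0.60, 0.00)},
]
print(reconfigure(apps, total_ways=8, max_overhead=0.05))  # → ([4, 2], 0.7)
```

The exhaustive search grows exponentially with the number of concurrently executing applications, so it only serves to make the optimization objective concrete; a practical runtime manager would need a cheaper search.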

Original language: English (US)
Article number: 67
Journal: ACM Transactions on Embedded Computing Systems
Volume: 15
Issue number: 4
State: Published, Aug 2016

Keywords

  • Cache
  • Energy
  • Modeling
  • Multi-cores
  • Optimization
  • Performance
  • Reliability
  • Soft errors
  • Vulnerability

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
