Hardware and Software Techniques for Heterogeneous Fault-Tolerance

Semeen Rehman, Florian Kriebel, Bharath Srinivas Prabakaran, Faiq Khalid, Muhammad Shafique

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

With advances in process technology, fault-tolerance against transient errors has emerged as an important design requirement for computing systems fabricated using nano-scale devices. Traditionally, redundancy-based techniques have been employed to detect and correct errors and to achieve full system protection. However, because fault masking has been observed at different system levels, and because some applications have lower accuracy demands or are inherently error-tolerant, reliability-heterogeneous architectures have recently paved the way for power-efficient dependable systems. In this paper, we discuss the building blocks of such processors (both embedded and superscalar) with different fault-tolerant modes at the architecture level, covering memory components such as caches as well as in-order and out-of-order processor designs. We analyze the soft error vulnerability of different components and show how the variations in vulnerability can be exploited to improve the performance and power efficiency of such processors. We additionally show that a reliability-driven compiler can be leveraged to realize software-level heterogeneous fault tolerance by generating multiple reliable application versions with diverse reliability and performance properties.

Original language: English (US)
Title of host publication: 2018 IEEE 24th International Symposium on On-Line Testing and Robust System Design, IOLTS 2018
Editors: Mihalis Maniatakos, Dan Alexandrescu, Dimitris Gizopoulos, Panagiota Papavramidou
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 115-118
Number of pages: 4
ISBN (Electronic): 9781538659922
DOIs
State: Published - Sep 26 2018
Event: 24th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2018 - Platja D'Aro, Spain
Duration: Jul 2 2018 to Jul 4 2018

Publication series

Name: 2018 IEEE 24th International Symposium on On-Line Testing and Robust System Design, IOLTS 2018

Conference

Conference: 24th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2018
Country/Territory: Spain
City: Platja D'Aro
Period: 7/2/18 to 7/4/18

Keywords

  • caches
  • compilers
  • dark silicon
  • fault-tolerance
  • hardware hardening
  • heterogeneity
  • reliability
  • superscalar processors

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Safety, Risk, Reliability and Quality
