TY - GEN
T1 - Hardware and Software Techniques for Heterogeneous Fault-Tolerance
AU - Rehman, Semeen
AU - Kriebel, Florian
AU - Prabakaran, Bharath Srinivas
AU - Khalid, Faiq
AU - Shafique, Muhammad
N1 - Funding Information:
This work is supported in part by the German Research Foundation (DFG) as part of the priority program "Dependable Embedded Systems" (SPP 1500, http://spp1500.itec.kit.edu).
Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/26
Y1 - 2018/9/26
N2 - With advances in process technology, fault tolerance against transient errors has emerged as an important design requirement for computing systems fabricated using nano-scale devices. Traditionally, redundancy-based techniques have been employed to detect and correct errors and to achieve full system protection. However, because fault masking has been observed at different system levels, and because some applications have lower accuracy demands or are inherently error-tolerant, reliability-heterogeneous architectures have recently paved the way for power-efficient dependable systems. In this paper, we discuss the building blocks of such processors (both embedded and superscalar) with different fault-tolerant modes at the architecture level, covering memory components such as caches as well as in-order and out-of-order processor designs. We analyze the soft-error vulnerability of different components and show how variations in vulnerability can be exploited to improve the performance and power efficiency of such processors. We additionally show that a reliability-driven compiler can be leveraged to realize software-level heterogeneous fault tolerance by generating different reliable application versions with diverse reliability and performance properties.
KW - caches
KW - compilers
KW - dark silicon
KW - fault-tolerance
KW - hardware hardening
KW - heterogeneity
KW - reliability
KW - superscalar processors
UR - http://www.scopus.com/inward/record.url?scp=85055818879&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055818879&partnerID=8YFLogxK
U2 - 10.1109/IOLTS.2018.8474219
DO - 10.1109/IOLTS.2018.8474219
M3 - Conference contribution
AN - SCOPUS:85055818879
T3 - 2018 IEEE 24th International Symposium on On-Line Testing and Robust System Design, IOLTS 2018
SP - 115
EP - 118
BT - 2018 IEEE 24th International Symposium on On-Line Testing and Robust System Design, IOLTS 2018
A2 - Maniatakos, Mihalis
A2 - Alexandrescu, Dan
A2 - Gizopoulos, Dimitris
A2 - Papavramidou, Panagiota
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2018
Y2 - 2 July 2018 through 4 July 2018
ER -