TY - GEN
T1 - Use it or lose it
T2 - 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013
AU - Kim, Hyungjun
AU - Vitkovskiy, Arseniy
AU - Gratz, Paul V.
AU - Soteriou, Vassos
PY - 2013
Y1 - 2013
N2 - Moore's Law scaling is continuing to yield even higher transistor density with each succeeding process generation, leading to today's multi-core Chip Multi-Processors (CMPs) with tens or even hundreds of interconnected cores or tiles. Unfortunately, deep sub-micron CMOS process technology is marred by increasing susceptibility to wearout. Prolonged operational stress gives rise to accelerated wearout and failure, due to several physical failure mechanisms, including Hot Carrier Injection (HCI) and Negative Bias Temperature Instability (NBTI). Each failure mechanism correlates with different usage-based stresses, all of which can eventually generate permanent faults. While the wearout of an individual core in many-core CMPs may not necessarily be catastrophic for the system, a single fault in the inter-processor Network-on-Chip (NoC) fabric could render the entire chip useless, as it could lead to protocol-level deadlocks, or even partition away vital components such as the memory controller or other critical I/O. In this paper, we develop critical path models for HCI- and NBTI-induced wear due to the actual stresses caused by real workloads, applied onto the interconnect microarchitecture. A key finding from this modeling is that, counter to prevailing wisdom, wearout in the CMP on-chip interconnect is correlated with lack of load observed in the NoC routers, rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wearout-sensitive components exercised, without significantly impacting cycle time, pipeline depth, area or power consumption of the overall router. We subsequently show that the proposed design yields a 13.8x-65x increase in CMP lifetime.
AB - Moore's Law scaling is continuing to yield even higher transistor density with each succeeding process generation, leading to today's multi-core Chip Multi-Processors (CMPs) with tens or even hundreds of interconnected cores or tiles. Unfortunately, deep sub-micron CMOS process technology is marred by increasing susceptibility to wearout. Prolonged operational stress gives rise to accelerated wearout and failure, due to several physical failure mechanisms, including Hot Carrier Injection (HCI) and Negative Bias Temperature Instability (NBTI). Each failure mechanism correlates with different usage-based stresses, all of which can eventually generate permanent faults. While the wearout of an individual core in many-core CMPs may not necessarily be catastrophic for the system, a single fault in the inter-processor Network-on-Chip (NoC) fabric could render the entire chip useless, as it could lead to protocol-level deadlocks, or even partition away vital components such as the memory controller or other critical I/O. In this paper, we develop critical path models for HCI- and NBTI-induced wear due to the actual stresses caused by real workloads, applied onto the interconnect microarchitecture. A key finding from this modeling is that, counter to prevailing wisdom, wearout in the CMP on-chip interconnect is correlated with lack of load observed in the NoC routers, rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wearout-sensitive components exercised, without significantly impacting cycle time, pipeline depth, area or power consumption of the overall router. We subsequently show that the proposed design yields a 13.8x-65x increase in CMP lifetime.
KW - hot carrier injection (HCI)
KW - lifetime
KW - negative bias temperature instability (NBTI)
KW - network-on-chip
KW - reliability
KW - wearout
UR - http://www.scopus.com/inward/record.url?scp=84892513006&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84892513006&partnerID=8YFLogxK
U2 - 10.1145/2540708.2540721
DO - 10.1145/2540708.2540721
M3 - Conference contribution
AN - SCOPUS:84892513006
SN - 9781450326384
T3 - MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SP - 136
EP - 147
BT - MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Y2 - 7 December 2013 through 11 December 2013
ER -