TY - GEN
T1 - dsReliM
T2 - International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2015
AU - Salehi, Mohammad
AU - Shafique, Muhammad
AU - Kriebel, Florian
AU - Rehman, Semeen
AU - Tavana, Mohammad Khavari
AU - Ejlali, Alireza
AU - Henkel, Jorg
N1 - Publisher Copyright:
© 2015 IEEE.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2015/11/17
Y1 - 2015/11/17
N2 - Due to the tight power envelope, in the future technology nodes it is envisaged that not all cores in a many-core chip can be simultaneously powered-on (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes for reliability of on-chip systems while jointly accounting for soft errors, process variations and the thermal design power (TDP) constraint. Towards the TDP-constrained reliability optimization, dsReliM leverages multiple reliable application versions that can potentially execute on different cores with frequency variations and supporting different voltage-frequency levels, thus facilitating distinct power, reliability and performance tradeoffs at run time. Experiments show that our dsReliM system provides up to 20% reliability improvements under different TDP constraints when compared to a state-of-the-art technique. Also, compared to an ideal-case optimal solution, dsReliM deviates up to 2.5% in terms of reliability efficiency, but speeds up the reliability management decision time by a factor of up to 3100.
AB - Due to the tight power envelope, in the future technology nodes it is envisaged that not all cores in a many-core chip can be simultaneously powered-on (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes for reliability of on-chip systems while jointly accounting for soft errors, process variations and the thermal design power (TDP) constraint. Towards the TDP-constrained reliability optimization, dsReliM leverages multiple reliable application versions that can potentially execute on different cores with frequency variations and supporting different voltage-frequency levels, thus facilitating distinct power, reliability and performance tradeoffs at run time. Experiments show that our dsReliM system provides up to 20% reliability improvements under different TDP constraints when compared to a state-of-the-art technique. Also, compared to an ideal-case optimal solution, dsReliM deviates up to 2.5% in terms of reliability efficiency, but speeds up the reliability management decision time by a factor of up to 3100.
KW - constrained-optimization
KW - Dark silicon
KW - many-core
KW - power-efficiency
KW - process variation
KW - reliability
KW - soft errors
UR - http://www.scopus.com/inward/record.url?scp=84963758782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963758782&partnerID=8YFLogxK
U2 - 10.1109/CODESISSS.2015.7331370
DO - 10.1109/CODESISSS.2015.7331370
M3 - Conference contribution
AN - SCOPUS:84963758782
T3 - 2015 International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2015
SP - 75
EP - 82
BT - 2015 International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 October 2015 through 9 October 2015
ER -