TY - GEN
T1 - Analyzing and mitigating the impact of permanent faults on a systolic array based neural network accelerator
AU - Zhang, Jeff
AU - Gu, Tianyu
AU - Basu, Kanad
AU - Garg, Siddharth
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/5/29
Y1 - 2018/5/29
AB - Due to their growing popularity and computational cost, deep neural networks (DNNs) are being targeted for hardware acceleration. A popular architecture for DNN acceleration, adopted by the Google Tensor Processing Unit (TPU), utilizes a systolic array based matrix multiplication unit at its core. This paper deals with the design of fault-tolerant, systolic array based DNN accelerators for high defect rate technologies. To this end, we empirically show that the classification accuracy of a baseline TPU drops significantly even at extremely low fault rates (as low as 0.006%). We then propose two novel strategies, fault-aware pruning (FAP) and fault-aware pruning+retraining (FAP+T), that enable the TPU to operate at fault rates of up to 50%, with negligible drop in classification accuracy (as low as 0.1%) and no run-time performance overhead. The FAP+T does introduce a one-time retraining penalty per TPU chip before it is deployed, but we propose optimizations that reduce this one-time penalty to under 12 minutes. The penalty is then amortized over the entire lifetime of the TPU's operation.
UR - http://www.scopus.com/inward/record.url?scp=85048375978&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048375978&partnerID=8YFLogxK
U2 - 10.1109/VTS.2018.8368656
DO - 10.1109/VTS.2018.8368656
M3 - Conference contribution
AN - SCOPUS:85048375978
T3 - Proceedings of the IEEE VLSI Test Symposium
SP - 1
EP - 6
BT - Proceedings - 2018 IEEE 36th VLSI Test Symposium, VTS 2018
PB - IEEE Computer Society
T2 - 36th IEEE VLSI Test Symposium, VTS 2018
Y2 - 22 April 2018 through 25 April 2018
ER -