Dependable Deep Learning: Towards Cost-Efficient Resilience of Deep Neural Network Accelerators against Soft Errors and Permanent Faults

Muhammad Abdullah Hanif, Muhammad Shafique

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Deep Learning has enabled machines to learn computational models (i.e., Deep Neural Networks, or DNNs) that can perform certain complex tasks with claims to be close to human-level precision. This state-of-the-art performance offered by DNNs in many Artificial Intelligence (AI) applications has paved their way to being used in several safety-critical applications where even a single failure can lead to catastrophic results. Therefore, improving the robustness of these models to hardware-induced faults (such as soft errors, aging, and manufacturing defects) is of significant importance to avoid any disastrous event. Traditional redundancy-based fault mitigation techniques cannot be employed in a wide range of applications due to their high overheads, which, when coupled with the compute-intensive nature of DNNs, lead to undesirable resource consumption. In this article, we present an overview of different low-cost fault-mitigation techniques that exploit the intrinsic characteristics of DNNs to limit their overheads. We discuss how each technique can contribute to the overall resilience of a DNN-based system, and how they can be integrated to offer resilience against multiple diverse hardware-induced reliability threats. Towards the end, we highlight several key future directions that are envisioned to help in achieving highly dependable DL-based systems.

Original language: English (US)
Title of host publication: Proceedings - 2020 26th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728181875
State: Published - Jul 2020
Event: 26th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2020 - Virtual, Online, Italy
Duration: Jul 13 2020 to Jul 16 2020

Publication series

Name: Proceedings - 2020 26th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2020

Conference

Conference: 26th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2020
Country: Italy
City: Virtual, Online
Period: 7/13/20 to 7/16/20

Keywords

  • Accelerator
  • Aging
  • Architecture
  • Cost
  • Deep Learning
  • Deep Neural Networks
  • Dependability
  • DL
  • DNNs
  • Efficiency
  • Faults
  • Manufacturing Defects
  • Permanent Faults
  • Reliability
  • Resilience
  • Robustness
  • Soft Errors
  • Systems
  • Yield

ASJC Scopus subject areas

  • Hardware and Architecture
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Safety, Risk, Reliability and Quality
