SBanTEM: A Novel Methodology for Sparse Band Tensors as Soft-Error Mitigation in Sparse Convolutional Neural Networks

Alessio Colucci, Andreas Steininger, Muhammad Shafique

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Over the last two decades, Convolutional Neural Networks (CNNs) have become common in a wide variety of tasks, including safety-critical ones such as autonomous driving, leading to optimizations such as Sparse Convolutional Neural Networks (SparseCNNs). Technology node scaling has led to an exponential increase in transient faults affecting these systems, generating critical soft errors. We introduce SBanTEM, a novel methodology for employing sparse band tensors as soft-error mitigation in SparseCNNs. SBanTEM includes a novel mitigation technique that employs band tensors, as they do not require indices for storing data. We progressively reduce the bandwidth of the selected tensors, allowing the network to train between successive prunings and compensate for accuracy loss. Additionally, we implement a Genetic Algorithm (GA) to optimally select the tensors' bandwidths in the network. We analyze the resilience of many state-of-the-art CNNs on multiple datasets, showing that resilience is much lower for SparseCNNs and that using SBanTEM makes them as resilient as standard CNNs. SBanTEM's code and results are available at github.com/Alexei95/SBanTEM to boost reproducibility and reusability of the implementation.
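
As a rough illustration of the band-tensor idea in the abstract, the sketch below masks a 2-D weight matrix to a band around its main diagonal and shrinks the bandwidth in stages. It is not taken from the SBanTEM repository: the helper names (`band_mask`, `to_band`, `progressive_bandwidths`), the use of NumPy, and the fixed step schedule are all assumptions made for illustration. The genetic-algorithm search and the retraining between prunings are not shown.

```python
import numpy as np

def band_mask(rows, cols, bandwidth):
    """Boolean mask that keeps entries with |i - j| <= bandwidth."""
    i = np.arange(rows)[:, None]
    j = np.arange(cols)[None, :]
    return np.abs(i - j) <= bandwidth

def to_band(weights, bandwidth):
    """Zero every entry outside the band (hypothetical helper)."""
    return weights * band_mask(weights.shape[0], weights.shape[1], bandwidth)

def progressive_bandwidths(start, target, step):
    """Yield shrinking bandwidths from just below `start` down to `target`."""
    b = start
    while b > target:
        b = max(target, b - step)
        yield b

rng = np.random.default_rng(0)
w = rng.standard_normal((6, 6))
for b in progressive_bandwidths(start=5, target=1, step=2):
    w = to_band(w, b)
    # Assumption: a real pipeline would fine-tune the network here,
    # between successive prunings, to recover accuracy.
```

Because the surviving entries lie on a known set of diagonals, their positions follow from the bandwidth alone, which is why a band layout needs no per-element index array.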

Original language: English (US)
Title of host publication: Proceedings - 2024 IEEE 30th International Symposium on On-line Testing and Robust System Design, IOLTS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350370553
DOIs
State: Published - 2024
Event: 30th IEEE International Symposium on On-line Testing and Robust System Design, IOLTS 2024 - Rennes, France
Duration: Jul 3 2024 to Jul 5 2024

Publication series

Name: Proceedings - 2024 IEEE 30th International Symposium on On-line Testing and Robust System Design, IOLTS 2024

Conference

Conference: 30th IEEE International Symposium on On-line Testing and Robust System Design, IOLTS 2024
Country/Territory: France
City: Rennes
Period: 7/3/24 to 7/5/24

Keywords

  • band matrix
  • deep neural networks
  • fault injection
  • fault tolerance
  • pruning
  • resilience
  • sparse

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Safety, Risk, Reliability and Quality
  • Artificial Intelligence
  • Hardware and Architecture
  • Signal Processing
