Detecting All-to-One Backdoor Attacks in Black-Box DNNs via Differential Robustness to Noise

Hao Fu, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami

Research output: Contribution to journal › Article › peer-review

Abstract

The all-to-one (A2O) backdoor attack is a major adversarial threat to neural networks. Most existing A2O backdoor defenses are white-box: they require access to the backdoored model's architecture, hidden-layer outputs, or internal parameters. Black-box defenses are needed when only the network's inputs and outputs are accessible. However, prevalent black-box A2O backdoor defenses rely on hand-crafted features for detection and therefore make assumptions about where triggers are located. When triggers deviate from these assumptions, the hand-crafted features degrade in quality and the methods become ineffective. To address this issue, this work proposes a post-training black-box A2O backdoor defense whose efficacy does not depend on the triggers' locations. The method builds on the empirical observation that, under A2O backdoor attacks, the network output for poisoned samples is more resilient to uniform noise than the output for clean samples. Specifically, the approach uses a metric to quantify the resiliency of a given input to uniform noise. A novelty detector, trained on the quantified resiliency of the available clean samples, then determines whether a given input is poisoned. The novelty detector is evaluated across various triggers and is effective on all of them. Lastly, an explanation is provided for the observation.
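
The detection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the resiliency metric or the novelty detector, so everything below is an assumption made for illustration: the toy classifier `toy_backdoored_model`, its trigger rule, the label-agreement score in `noise_resilience`, and the quantile threshold in `fit_threshold` are hypothetical stand-ins, not the authors' method. The sketch only shows the general shape of the pipeline: query the black-box model on noisy copies of an input, score how stable the output is, and flag inputs whose score is unusually high compared with clean samples.

```python
import random

TARGET_LABEL = 0  # hypothetical attacker-chosen target class


def toy_backdoored_model(x):
    """Hypothetical black-box classifier: exposes only a label for input x."""
    if x[0] > 0.9:  # assumed trigger rule: a large first feature forces the target label
        return TARGET_LABEL
    return 1 if x[1] > 0.5 else 0


def noise_resilience(model, x, rng, eps=0.6, n_trials=200):
    """Fraction of uniformly perturbed copies of x that keep x's predicted label."""
    base = model(x)
    kept = 0
    for _ in range(n_trials):
        noisy = [v + rng.uniform(-eps, eps) for v in x]
        kept += model(noisy) == base
    return kept / n_trials


def fit_threshold(clean_scores, quantile=0.95):
    """Simple stand-in novelty detector: an upper quantile of the clean scores."""
    ordered = sorted(clean_scores)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]


rng = random.Random(0)

# Clean inputs never carry the trigger (first feature stays small).
clean_inputs = [[rng.uniform(0.0, 0.2), rng.uniform(0.0, 1.0)] for _ in range(50)]
clean_scores = [noise_resilience(toy_backdoored_model, x, rng) for x in clean_inputs]
threshold = fit_threshold(clean_scores)


def is_flagged(x):
    """Flag x as suspicious when its output is more noise-resilient than clean data."""
    return noise_resilience(toy_backdoored_model, x, rng) > threshold


poisoned_input = [2.0, 0.9]  # trigger feature set, so the output ignores the noise
print("threshold:", threshold)
print("poisoned flagged:", is_flagged(poisoned_input))
```

In this toy setting the trigger dominates the decision, so noise never changes the label of the poisoned input, while clean inputs flip at least occasionally. A quantile threshold trades off false positives against missed detections; any one-class or novelty-detection model trained on the clean scores could take its place.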

Original language: English (US)
Pages (from-to): 36099-36111
Number of pages: 13
Journal: IEEE Access
Volume: 13
State: Published - 2025

Keywords

  • Neural network backdoors
  • novelty detection
  • output resiliency

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
