Abstract
This chapter considers backdoor attacks on deep neural networks and discusses two defense approaches against such attacks. The first approach aims to remove backdoors. Specifically, an attacker imitator function, which converts clean samples into samples that are functionally similar to poisoned samples, is found by solving an optimization problem. The backdoor is then removed by retraining the neural network to be insensitive to the samples generated by the attacker imitator function. The second approach aims to identify poisoned inputs and reject the corresponding outputs of the backdoored network. In this method, two off-line novelty detection models are first trained to collect samples that are potentially poisoned. A binary classifier is then trained on the collected samples together with clean validation samples, and this classifier detects poisoned samples on-line with high accuracy. A wide range of illustrative examples covering various types of triggers is considered, including invisible triggers, triggers with real-world meaning, and dynamic triggers. The chapter ends with a discussion of potential benign applications of the backdoor phenomenon and of potential future directions for research on backdoor attacks and defenses.
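The abstract does not spell out the optimization behind the first approach, so the following is only a minimal sketch, assuming a common mask-and-pattern parameterization of the attacker imitator, g(x) = (1 - m) * x + m * p. The names `fit_attacker_imitator`, `unlearn_backdoor`, `target_label`, and the regularization weight `lam` are hypothetical; the chapter may parameterize the imitator and the retraining objective differently.

```python
import torch
import torch.nn.functional as F

def fit_attacker_imitator(model, clean_loader, target_label,
                          steps=500, lam=1e-2, lr=0.1, device="cpu"):
    # Parameterize the imitator g(x) = (1 - m) * x + m * p with
    # unconstrained tensors squashed by a sigmoid into [0, 1].
    x_sample, _ = next(iter(clean_loader))
    raw_mask = torch.zeros_like(x_sample[:1], device=device, requires_grad=True)
    raw_pattern = torch.zeros_like(x_sample[:1], device=device, requires_grad=True)
    opt = torch.optim.Adam([raw_mask, raw_pattern], lr=lr)

    model.eval()
    data_iter = iter(clean_loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(clean_loader)
            x, _ = next(data_iter)
        x = x.to(device)
        m, p = torch.sigmoid(raw_mask), torch.sigmoid(raw_pattern)
        gx = (1 - m) * x + m * p  # imitated poisoned batch
        target = torch.full((x.size(0),), target_label,
                            dtype=torch.long, device=device)
        # Drive g(x) toward the suspected target label; the L1 term
        # keeps the synthetic trigger small.
        loss = F.cross_entropy(model(gx), target) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(raw_mask).detach(), torch.sigmoid(raw_pattern).detach()

def unlearn_backdoor(model, clean_loader, mask, pattern,
                     epochs=1, lr=1e-4, device="cpu"):
    # Fine-tune so the network assigns g(x) its clean label, i.e. it
    # becomes insensitive to samples produced by the imitator.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            gx = (1 - mask) * x + mask * pattern
            loss = F.cross_entropy(model(gx), y) + F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
```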
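Likewise, the abstract leaves the two novelty detection models and the binary classifier of the second approach unspecified. The sketch below substitutes scikit-learn's IsolationForest and OneClassSVM for the two off-line novelty detectors and logistic regression for the binary classifier, and assumes inputs are represented as feature vectors (e.g., penultimate-layer activations); `build_poison_detector`, `clean_feats`, and `stream_feats` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.linear_model import LogisticRegression

def build_poison_detector(clean_feats, stream_feats):
    # Two off-line novelty detectors fit on clean validation features;
    # stream samples that both detectors call novel (predict == -1)
    # are collected as potentially poisoned.
    det_a = IsolationForest(random_state=0).fit(clean_feats)
    det_b = OneClassSVM(nu=0.1, gamma="scale").fit(clean_feats)
    flagged = (det_a.predict(stream_feats) == -1) & \
              (det_b.predict(stream_feats) == -1)
    suspect_feats = stream_feats[flagged]
    if len(suspect_feats) == 0:
        raise ValueError("no stream samples were flagged as novel")

    # Binary classifier: clean validation samples (label 0) vs. the
    # collected suspects (label 1); used on-line to reject outputs.
    X = np.vstack([clean_feats, suspect_feats])
    y = np.concatenate([np.zeros(len(clean_feats)),
                        np.ones(len(suspect_feats))])
    return LogisticRegression(max_iter=1000).fit(X, y)

# Usage (hypothetical): compute feats for each incoming input, then
# reject the network's output whenever detector.predict(feats) == 1.
```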
| Original language | English (US) |
|---|---|
| Title of host publication | Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing |
| Subtitle of host publication | Use Cases and Emerging Challenges |
| Publisher | Springer Nature |
| Pages | 395-431 |
| Number of pages | 37 |
| ISBN (Electronic) | 9783031406775 |
| ISBN (Print) | 9783031406768 |
| DOIs | |
| State | Published - Jan 1 2023 |
Keywords
- Backdoor attacks
- Backdoor detection
- Deep neural networks
- Novelty detection
- On-line retraining
- Reverse-engineering defense
ASJC Scopus subject areas
- General Computer Science
- General Engineering
- General Social Sciences