TY - GEN
T1 - Fine-pruning: Defending against backdooring attacks on deep neural networks
T2 - 21st International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2018
AU - Liu, Kang
AU - Dolan-Gavitt, Brendan
AU - Garg, Siddharth
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
N2 - Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.
KW - Backdoor
KW - Deep learning
KW - Fine-tuning
KW - Pruning
KW - Trojan
UR - http://www.scopus.com/inward/record.url?scp=85053888479&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053888479&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-00470-5_13
DO - 10.1007/978-3-030-00470-5_13
M3 - Conference contribution
AN - SCOPUS:85053888479
SN - 9783030004699
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 273
EP - 294
BT - Research in Attacks, Intrusions, and Defenses - 21st International Symposium, RAID 2018, Proceedings
A2 - Bailey, Michael
A2 - Ioannidis, Sotiris
A2 - Stamatogiannakis, Manolis
A2 - Holz, Thorsten
PB - Springer Verlag
Y2 - 10 September 2018 through 12 September 2018
ER -