TY - JOUR
T1 - A Subspace Projective Clustering Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks
AU - Wang, Yue
AU - Li, Wenqing
AU - Sarkar, Esha
AU - Shafique, Muhammad
AU - Maniatakos, Michail
AU - Jabari, Saif Eddin
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2024
Y1 - 2024
AB - Backdoor attacks in deep neural networks (DNNs) involve an attacker inserting a backdoor into the network by manipulating the training dataset, which causes misclassification of inputs that contain a specific trigger. Detecting and mitigating such attacks are challenging, as only the attacker knows the trigger and target class. Our study demonstrates that the representations, i.e., the neuron activations for a given DNN, of poisoned and genuine data lie in different subspaces, which implies that there exists a certain subspace where the difference of projections from different data can be manifested. To this end, we propose a method based on subspace projective clustering (SPC), which learns a subspace as well as a projection-based weight vector by solving a projection maximization program, and the optimized weight vector can be utilized in a clustering framework to infer the group of data. Based on our theoretical analysis and experimental results, we demonstrate the effectiveness of our method in defending against backdoor attacks that use different settings of poisoned samples on GTSRB, ImageNet, VGGFace2, and PubFig datasets in comparison with the state-of-the-art methods. Our algorithm can detect more than 90% of the infected classes and identify 95% of the poisoned samples.
KW - Backdoor attacks
KW - backdoor defense
KW - deep neural networks (DNNs)
KW - machine learning security
KW - optimization
UR - http://www.scopus.com/inward/record.url?scp=85187336134&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85187336134&partnerID=8YFLogxK
U2 - 10.1109/TAI.2024.3373720
DO - 10.1109/TAI.2024.3373720
M3 - Article
AN - SCOPUS:85187336134
SN - 2691-4581
VL - 5
SP - 3497
EP - 3509
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 7
ER -