TY - JOUR
T1 - Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection
AU - Fu, Hao
AU - Krishnamurthy, Prashanth
AU - Garg, Siddharth
AU - Khorrami, Farshad
N1 - Funding Information:
This work was supported in part by the Army Research Office under Grant W911NF-21-1-0155; and in part by the Center for Artificial Intelligence and Robotics, New York University Abu Dhabi (NYUAD), funded by Tamkeen through the NYUAD Research Institute under Award CG010.
Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - This paper proposes a data-efficient method for detecting backdoor attacks on deep neural networks in a black-box setting. The approach is motivated by the intuition that features corresponding to triggers have a greater influence on the output of a backdoored network than any benign features. To quantify the effects of triggers and benign features on the backdoored network's output, we introduce five metrics. To calculate the five metric values for a given input, we first generate several synthetic samples by injecting parts of the input's content into clean validation samples. The five metrics are then computed from the output labels of these synthetic samples. One contribution of this work is the use of a tiny clean validation dataset. With the five metrics computed, five novelty detectors are trained on the validation dataset. A meta novelty detector fuses the outputs of the five trained novelty detectors to generate a meta confidence score. During online testing, our method determines whether online samples are poisoned by assessing the meta confidence scores output by the meta novelty detector. We show the efficacy of our methodology on a broad range of backdoor attacks, and include ablation studies and comparisons with existing approaches. Our methodology is promising because the proposed five metrics quantify the inherent differences between clean and poisoned samples. Additionally, our detection method can be incrementally improved by appending further metrics that may be proposed to address future advanced attacks.
KW - Neural network backdoors
KW - black-box detection
KW - hand-crafted features
KW - small validation dataset
UR - http://www.scopus.com/inward/record.url?scp=85165260154&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85165260154&partnerID=8YFLogxK
U2 - 10.1109/TIFS.2023.3297056
DO - 10.1109/TIFS.2023.3297056
M3 - Article
AN - SCOPUS:85165260154
SN - 1556-6013
VL - 18
SP - 4668
EP - 4680
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
ER -