TY - GEN
T1 - Detecting Backdoor Attacks in Black-Box Neural Networks through Hardware Performance Counters
AU - Alam, Manaar
AU - Wang, Yue
AU - Maniatakos, Michail
N1 - Publisher Copyright:
© 2024 EDAA.
PY - 2024
Y1 - 2024
AB - Deep Neural Networks (DNNs) have made significant strides, but their susceptibility to backdoor attacks remains a concern. Most defenses assume access to white-box models or poisoned data, requirements that are often infeasible in practice, especially for proprietary DNNs. Existing defenses in a black-box setting usually rely on the confidence scores of a DNN's predictions; however, exposing these scores puts DNNs at risk of model stealing attacks, a significant concern for proprietary models. In this paper, we introduce a novel strategy for detecting backdoors, focusing on a more realistic black-box scenario where only hard-label (i.e., without any prediction confidence) query access is available. Our strategy utilizes the data-flow dynamics of the computational environment during DNN inference to identify potential backdoor inputs and is agnostic to trigger types and their locations in the input. We observe that a clean image and its backdoor counterpart with a trigger induce distinct patterns across various microarchitectural activities during the inference phase. We capture these variations with Hardware Performance Counters (HPCs) and apply a Gaussian Mixture Model to detect backdoor inputs. To the best of our knowledge, this is the first work that utilizes HPCs for detecting backdoors in DNNs. Extensive evaluation across a range of benchmark datasets, DNN architectures, and trigger patterns shows the efficacy of the proposed method in distinguishing between clean and backdoor inputs using HPCs.
KW - Backdoor Attacks
KW - Hardware Performance Counters
KW - Neural Networks
KW - Side-Channel Information
UR - http://www.scopus.com/inward/record.url?scp=85196484989&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85196484989&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85196484989
T3 - Proceedings - Design, Automation and Test in Europe, DATE
BT - 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024
Y2 - 25 March 2024 through 27 March 2024
ER -