TY - JOUR
T1 - Training Data Poisoning in ML-CAD
T2 - Backdooring DL-Based Lithographic Hotspot Detectors
AU - Liu, Kang
AU - Tan, Benjamin
AU - Karri, Ramesh
AU - Garg, Siddharth
N1 - Funding Information:
The authors thank G. Reddy, C. Xanthopoulos, and Y. Makris for generously giving them access to the dataset used in their experiments. The authors were supported in part by the Semiconductor Research Corporation (SRC) through task 2709.001.
Funding Information:
Manuscript received March 15, 2020; revised June 21, 2020; accepted August 24, 2020. Date of publication September 18, 2020; date of current version May 20, 2021. The work of Benjamin Tan was supported in part by the Office of Naval Research under Award N00014-18-1-2058. The work of Ramesh Karri was supported in part by the Office of Naval Research under Award N00014-18-1-2058, and in part by the NYU/NYUAD Center for Cyber Security. The work of Siddharth Garg was supported in part by the National Science Foundation CAREER Award under Grant 1553419, and in part by the National Science Foundation under Grant 1801495. (Kang Liu and Benjamin Tan contributed equally to this work.) (Corresponding author: Kang Liu.)
Publisher Copyright:
© 1982-2012 IEEE.
PY - 2021/6
Y1 - 2021/6
N2 - Recent efforts to enhance computer-aided design (CAD) flows have seen the proliferation of machine learning (ML)-based techniques. However, despite achieving state-of-the-art performance in many domains, techniques such as deep learning (DL) are susceptible to various adversarial attacks. In this work, we explore the threat posed by training data poisoning attacks, where a malicious insider can try to insert backdoors into a deep neural network (DNN) used as part of the CAD flow. Using a case study on lithographic hotspot detection, we explore how an adversary can contaminate training data with specially crafted, yet meaningful, genuinely labeled, and design rule compliant poisoned clips. Our experiments show that a very low ratio of poisoned to clean training data is sufficient to backdoor the DNN; at inference time, an adversary can 'hide' specific hotspot clips with 100% success by including a backdoor trigger shape in the input. This attack provides a novel way for adversaries to sabotage and disrupt the distributed design process. After finding that training data poisoning attacks are feasible and stealthy, we explore a potential ensemble defense against possible data contamination, showing a promising reduction in attack success. Our results raise fundamental questions about the robustness of DL-based systems in CAD, and we provide insights into the implications of these findings.
AB - Recent efforts to enhance computer-aided design (CAD) flows have seen the proliferation of machine learning (ML)-based techniques. However, despite achieving state-of-the-art performance in many domains, techniques such as deep learning (DL) are susceptible to various adversarial attacks. In this work, we explore the threat posed by training data poisoning attacks, where a malicious insider can try to insert backdoors into a deep neural network (DNN) used as part of the CAD flow. Using a case study on lithographic hotspot detection, we explore how an adversary can contaminate training data with specially crafted, yet meaningful, genuinely labeled, and design rule compliant poisoned clips. Our experiments show that a very low ratio of poisoned to clean training data is sufficient to backdoor the DNN; at inference time, an adversary can 'hide' specific hotspot clips with 100% success by including a backdoor trigger shape in the input. This attack provides a novel way for adversaries to sabotage and disrupt the distributed design process. After finding that training data poisoning attacks are feasible and stealthy, we explore a potential ensemble defense against possible data contamination, showing a promising reduction in attack success. Our results raise fundamental questions about the robustness of DL-based systems in CAD, and we provide insights into the implications of these findings.
KW - Computer-aided design
KW - design for manufacture
KW - machine learning (ML)
KW - robustness
KW - security
UR - http://www.scopus.com/inward/record.url?scp=85091687114&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091687114&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2020.3024780
DO - 10.1109/TCAD.2020.3024780
M3 - Article
AN - SCOPUS:85091687114
SN - 0278-0070
VL - 40
SP - 1244
EP - 1257
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 6
M1 - 9200729
ER -