TY - CPAPER
T1 - Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions
AU - Pearce, Hammond
AU - Ahmad, Baleegh
AU - Tan, Benjamin
AU - Dolan-Gavitt, Brendan
AU - Karri, Ramesh
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described 'AI pair programmer', GitHub Copilot, which is a language model trained over open-source GitHub code. However, code often contains bugs - and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns about the security of Copilot's code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk cybersecurity weaknesses, e.g., those from MITRE's 'Top 25' Common Weakness Enumeration (CWE) list. We explore Copilot's performance on three distinct code generation axes - examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,689 programs. Of these, we found approximately 40% to be vulnerable.
KW - Artificial Intelligence (AI)
KW - Common Weakness Enumerations (CWEs)
KW - Cybersecurity
KW - code generation
UR - http://www.scopus.com/inward/record.url?scp=85129071886&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129071886&partnerID=8YFLogxK
U2 - 10.1109/SP46214.2022.9833571
DO - 10.1109/SP46214.2022.9833571
M3 - Conference contribution
AN - SCOPUS:85129071886
T3 - Proceedings - IEEE Symposium on Security and Privacy
SP - 754
EP - 768
BT - Proceedings - 43rd IEEE Symposium on Security and Privacy, SP 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 43rd IEEE Symposium on Security and Privacy, SP 2022
Y2 - 23 May 2022 through 26 May 2022
ER -