TY - GEN
T1 - Benchmarking Large Language Models for Automated Verilog RTL Code Generation
AU - Thakur, Shailja
AU - Ahmad, Baleegh
AU - Fan, Zhenxing
AU - Pearce, Hammond
AU - Tan, Benjamin
AU - Karri, Ramesh
AU - Dolan-Gavitt, Brendan
AU - Garg, Siddharth
N1 - Funding Information:
This research work was supported in part by NSF Award 1553419, NSF Award 1646671, NSF Award 2039607, and ARO Award 77191NC. The opinions, findings, and conclusions or recommendations expressed are those of the author(s) and do not necessarily reflect the views of any sponsors.
Publisher Copyright:
© 2023 EDAA.
PY - 2023
Y1 - 2023
AB - Automating hardware design could obviate a significant amount of human error from the engineering process and lead to fewer errors. Verilog is a popular hardware description language to model and design digital systems, thus generating Verilog code is a critical first step. Emerging large language models (LLMs) are able to write high-quality code in other programming languages. In this paper, we characterize the ability of LLMs to generate useful Verilog. For this, we fine-tune pre-trained LLMs on Verilog datasets collected from GitHub and Verilog textbooks. We construct an evaluation framework comprising test-benches for functional analysis and a flow to test the syntax of Verilog code generated in response to problems of varying difficulty. Our findings show that across our problem scenarios, the fine-tuning results in LLMs more capable of producing syntactically correct code (25.9% overall). Further, when analyzing functional correctness, a fine-tuned open-source CodeGen LLM can outperform the state-of-the-art commercial Codex LLM (6.5% overall). We release our training/evaluation scripts and LLM checkpoints as open source contributions.
KW - GPT
KW - LLM
KW - Transformers
KW - Verilog
UR - http://www.scopus.com/inward/record.url?scp=85162707713&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85162707713&partnerID=8YFLogxK
U2 - 10.23919/DATE56975.2023.10137086
DO - 10.23919/DATE56975.2023.10137086
M3 - Conference contribution
AN - SCOPUS:85162707713
T3 - Proceedings - Design, Automation and Test in Europe, DATE
BT - 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023
Y2 - 17 April 2023 through 19 April 2023
ER -