TY - GEN
T1 - Benchmarking Large Language Models for Automated Verilog RTL Code Generation
AU - Thakur, Shailja
AU - Ahmad, Baleegh
AU - Fan, Zhenxing
AU - Pearce, Hammond
AU - Tan, Benjamin
AU - Karri, Ramesh
AU - Dolan-Gavitt, Brendan
AU - Garg, Siddharth
N1 - Publisher Copyright:
© 2023 EDAA.
PY - 2023
Y1 - 2023
N2 - Automating hardware design could obviate a significant amount of human error from the engineering process and lead to fewer errors. Verilog is a popular hardware description language to model and design digital systems; thus, generating Verilog code is a critical first step. Emerging large language models (LLMs) are able to write high-quality code in other programming languages. In this paper, we characterize the ability of LLMs to generate useful Verilog. For this, we fine-tune pre-trained LLMs on Verilog datasets collected from GitHub and Verilog textbooks. We construct an evaluation framework comprising test-benches for functional analysis and a flow to test the syntax of Verilog code generated in response to problems of varying difficulty. Our findings show that across our problem scenarios, fine-tuning results in LLMs that are more capable of producing syntactically correct code (25.9% overall). Further, when analyzing functional correctness, a fine-tuned open-source CodeGen LLM can outperform the state-of-the-art commercial Codex LLM (6.5% overall). We release our training/evaluation scripts and LLM checkpoints as open-source contributions.
AB - Automating hardware design could obviate a significant amount of human error from the engineering process and lead to fewer errors. Verilog is a popular hardware description language to model and design digital systems; thus, generating Verilog code is a critical first step. Emerging large language models (LLMs) are able to write high-quality code in other programming languages. In this paper, we characterize the ability of LLMs to generate useful Verilog. For this, we fine-tune pre-trained LLMs on Verilog datasets collected from GitHub and Verilog textbooks. We construct an evaluation framework comprising test-benches for functional analysis and a flow to test the syntax of Verilog code generated in response to problems of varying difficulty. Our findings show that across our problem scenarios, fine-tuning results in LLMs that are more capable of producing syntactically correct code (25.9% overall). Further, when analyzing functional correctness, a fine-tuned open-source CodeGen LLM can outperform the state-of-the-art commercial Codex LLM (6.5% overall). We release our training/evaluation scripts and LLM checkpoints as open-source contributions.
KW - GPT
KW - LLM
KW - Transformers
KW - Verilog
UR - http://www.scopus.com/inward/record.url?scp=85162707713&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85162707713&partnerID=8YFLogxK
U2 - 10.23919/DATE56975.2023.10137086
DO - 10.23919/DATE56975.2023.10137086
M3 - Conference contribution
AN - SCOPUS:85162707713
T3 - Proceedings - Design, Automation and Test in Europe, DATE
BT - 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023
Y2 - 17 April 2023 through 19 April 2023
ER -