TY - GEN
T1 - Invited Paper
T2 - 42nd IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2023
AU - Chowdhury, Animesh B.
AU - Thakur, Shailja
AU - Pearce, Hammond
AU - Karri, Ramesh
AU - Garg, Siddharth
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Despite the growing interest in ML-guided EDA tools from RTL to GDSII, there are no standard datasets or prototypical learning tasks defined for the EDA problem domain. Experience from the computer vision community suggests that such datasets are crucial to spur further progress in ML for EDA. Here we describe our experience curating two large-scale, high-quality datasets for Verilog code generation and logic synthesis. The first, VeriGen, is a dataset of Verilog code collected from GitHub and Verilog textbooks. The second, OpenABC-D, is a large-scale, labeled dataset designed to aid ML for logic synthesis tasks. The dataset consists of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs on a large number of open-source hardware projects. In this paper we will discuss challenges in curating, maintaining and growing the size and scale of these datasets. We will also touch upon questions of dataset quality and security, and the use of novel data augmentation tools that are tailored for the hardware domain.
UR - http://www.scopus.com/inward/record.url?scp=85181400241&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85181400241&partnerID=8YFLogxK
U2 - 10.1109/ICCAD57390.2023.10323663
DO - 10.1109/ICCAD57390.2023.10323663
M3 - Conference contribution
AN - SCOPUS:85181400241
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
BT - 2023 42nd IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2023 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 October 2023 through 2 November 2023
ER -