DOI: 10.1145/3670474.3685948
Research article

PyHDL-Eval: An LLM Evaluation Framework for Hardware Design Using Python-Embedded DSLs

Published: 09 September 2024

Abstract

Embedding hardware design frameworks within Python is a promising technique to improve the productivity of hardware engineers. At the same time, there is significant interest in using large language models (LLMs) to improve key chip design tasks. This paper describes PyHDL-Eval, a new framework for evaluating LLMs on specification-to-RTL tasks in the context of Python-embedded domain-specific languages (DSLs). The framework includes 168 problems, Verilog reference solutions, Verilog test benches, Python test scripts, and workflow orchestration scripts. We use the framework to conduct a detailed case study comparing five LLMs (CodeGemma 7B, Llama3 8B/70B, GPT4, and GPT4 Turbo) targeting Verilog and five Python-embedded DSLs (PyMTL3, PyRTL, MyHDL, Migen, and Amaranth). Our results demonstrate the promise of in-context learning when applied to smaller models (e.g., the pass rate for CodeGemma 7B improves from 14.9% to 32.7% on Verilog) and to Python-embedded DSLs (e.g., the pass rate for Llama3 70B improves from 0.6% to 33.0% on PyMTL3). We find that LLMs perform better when targeting Verilog than when targeting Python-embedded DSLs (e.g., the pass rate for GPT4 Turbo is 72.2% on Verilog and 29.8% to 62.0% on the Python-embedded DSLs), even though these DSLs are embedded in a popular general-purpose host language. PyHDL-Eval will serve as a useful framework for future research at the intersection of Python-embedded DSLs and LLMs.

Cited By

  • (2024) Hardware Design and Verification with Large Language Models: A Scoping Review, Challenges, and Open Issues. Electronics 14:1 (120). DOI: 10.3390/electronics14010120. Online publication date: 30-Dec-2024.

      Published In

      MLCAD '24: Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD
      September 2024
      321 pages
      ISBN: 9798400706998
      DOI: 10.1145/3670474

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Python-embedded domain-specific languages
      2. hardware description languages
      3. large language models

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      MLCAD '24

      Acceptance Rates

      MLCAD '24 Paper Acceptance Rate: 35 of 83 submissions (42%)
      Overall Acceptance Rate: 35 of 83 submissions (42%)

      Article Metrics

      • Downloads (last 12 months): 232
      • Downloads (last 6 weeks): 37
      Reflects downloads up to 05 Jan 2025
