Large Language Models (LLMs) have demonstrated potential in assisting with Register Transfer Level (RTL) design tasks. Nevertheless, there remains to be a significant gap in benchmarks that accurately reflect the comp...
详细信息
ISBN:
(纸本)9798350376098;9798350376081
Large Language Models (LLMs) have demonstrated potential in assisting with Register Transfer Level (RTL) design tasks. Nevertheless, there remains to be a significant gap in benchmarks that accurately reflect the complexity of real-world RTL projects. To address this, this paper presents RTL-Repo, a benchmark specifically designed to evaluate LLMs on large-scale RTL design projects. RTL-Repo includes a comprehensive dataset of more than 4000 verilogcode samples extracted from public GitHub repositories, with each sample providing the full context of the corresponding repository. We evaluate several state-of-the-art models on the RTL-Repo benchmark, including GPT-4, GPT-3.5, Starcoder2, alongside verilog-specific models like VeriGen and RTLcoder, and compare their performance in generating verilogcode for complex projects. The RTL-Repo benchmark provides a valuable resource for the hardware design community to assess and compare LLMs' performance in real-world RTL design scenarios and train LLMs specifically for verilog code generation in complex, multi-file RTL projects. RTL-Repo is open-source and publicly available on Github(1).
Hardware description language (HDL) code designing is a critical component of the chip design process, requiring substantial engineering and time resources. Recent advancements in large language models (LLMs), such as...
详细信息
Hardware description language (HDL) code designing is a critical component of the chip design process, requiring substantial engineering and time resources. Recent advancements in large language models (LLMs), such as GPT series, have shown promise in automating HDL codegeneration. However, current LLM-based approaches face significant challenges in meeting real-world hardware design requirements, particularly in handling complex designs and ensuring code correctness. Our evaluations reveal that the functional correctness rate of LLM-generated HDL code significantly decreases as design complexity increases. In this paper, we propose the AutoSilicon framework, which aims to scale up the hardware design capability of LLMs. AutoSilicon incorporates an agent system, which 1) allows for the decomposition of large-scale, complex code design tasks into smaller, simpler tasks; 2) provides a compilation and simulation environment that enables LLMs to compile and test each piece of code it generates; and 3) introduces a series of optimization strategies. Experimental results demonstrate that AutoSilicon can scale hardware designs to projects with code equivalent to over 10,000 tokens. In terms of design quality, it further improves the syntax correctness rate and functional correctness rate compared with approaches that do not employ any extensions. For example, compared to directly generating HDL code using GPT-4-turbo, AutoSilicon enhances the syntax correctness rate by an average of 35.8% and improves functional correctness by an average of 35.6%.
暂无评论