arXiv

CodeRepoQA: A Large-scale Benchmark for Software Engineering Question Answering

Authors: Hu, Ruida; Jiang, Bo; Gao, Pengfei; Peng, Chao; Meng, Xiangxin; Wang, Xinchen; Ren, Jingyi; Wu, Qinyun; Gao, Cuiyun

Affiliations: Harbin Institute of Technology, Shenzhen, China; ByteDance, Shenzhen, China; ByteDance, Beijing, China

Published in: arXiv

Year: 2024


Keywords: Program documentation

Abstract: In this work, we introduce CodeRepoQA, a large-scale benchmark specifically designed for evaluating repository-level question-answering capabilities in the field of software engineering. CodeRepoQA encompasses five programming languages and covers a wide range of scenarios, enabling comprehensive evaluation of language models. To construct this dataset, we crawl data from 30 well-known repositories on GitHub, the largest platform for hosting and collaborating on code, and carefully filter the raw data. In total, CodeRepoQA is a multi-turn question-answering benchmark with 585,687 entries, covering a diverse array of software engineering scenarios, with an average of 6.62 dialogue turns per entry. We evaluate ten popular large language models on our dataset and provide in-depth analysis. We find that LLMs still have limitations in question-answering capabilities in the field of software engineering, and that medium-length contexts are more conducive to LLMs' performance. The entire benchmark is publicly available at https://***/kinesiatricssxilm14/CodeRepoQA. © 2024, CC BY-NC-ND.
