As large language models (LLMs) have become more advanced, generating code to solve exercises in programming courses has become significantly easier. However, this convenience raises the concern of over-reliance on th...
As Large Language Models (LLMs) continue to advance, their capabilities in code clone detection have garnered significant attention. While much research has assessed LLM performance on human-generated code, the proliferation of LLM-generated code raises critical questions about their ability to detect clones across both human- and LLM-created codebases, a capability that remains largely unexplored. This paper addresses this gap by evaluating two versions of LLaMA3 on these distinct types of datasets. Additionally, we perform a deeper analysis beyond simple prompting, examining the nuanced relationship between code cloning and the code similarity that LLMs infer. We further explore how fine-tuning affects LLM performance in clone detection, offering new insights into the interplay between code clones and similarity in human- versus AI-generated code. Our findings reveal that the LLaMA models excel at detecting syntactic clones but struggle with semantic clones. Notably, the models perform better on LLM-generated datasets for semantic clones, suggesting a potential bias. Fine-tuning enhances the models' ability to comprehend code semantics, improving their performance in both code clone detection and code similarity assessment. Our results offer valuable insights into the effectiveness and characteristics of LLMs in clone detection and code similarity assessment, providing a foundation for future applications and guiding further research in this area.
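To make the syntactic/semantic distinction concrete, the following is a minimal, hypothetical Java sketch, not drawn from the paper's datasets: accumulate is a syntactic (roughly Type-2) clone of sum with only identifiers renamed, while sumViaStream is a semantic (roughly Type-4) clone that computes the same result with different syntax. The abstract reports that models of this kind handle the first case well and the second poorly.

// Hypothetical illustration of the clone types the paper distinguishes.
public class CloneExample {
    // Original method: sums the elements of an array.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Syntactic clone: identical structure, only identifiers renamed.
    static int accumulate(int[] numbers) {
        int result = 0;
        for (int n : numbers) {
            result += n;
        }
        return result;
    }

    // Semantic clone: same behavior, different syntax (stream vs. explicit loop).
    static int sumViaStream(int[] values) {
        return java.util.Arrays.stream(values).sum();
    }
}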
ISBN (print): 9798400704826
The increasing trend of using Large Language Models (LLMs) for code generation raises the question of whether they can generate trustworthy code. While many researchers are exploring the utility of code generation for uncovering software vulnerabilities, one crucial but often overlooked aspect is the use of security Application Programming Interfaces (APIs). Security APIs play an integral role in upholding software security, yet integrating them effectively presents substantial challenges. This leads to inadvertent misuse by developers, thereby exposing software to vulnerabilities. To overcome these challenges, developers may seek assistance from LLMs. In this paper, we systematically assess ChatGPT's trustworthiness in code generation for security API use cases in Java. To conduct a thorough evaluation, we compile an extensive collection of 48 programming tasks covering 5 widely used security APIs. We employ both automated and manual approaches to detect security API misuse in the code generated by ChatGPT for these tasks. Our findings are concerning: around 70% of the code instances across 30 attempts per task contain security API misuse, with 20 distinct misuse types identified. Moreover, for roughly half of the tasks, this rate reaches 100%, indicating that there is a long way to go before developers can rely on ChatGPT to securely implement security API code.
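For illustration, here is one well-known Java security API misuse pattern of the kind such studies flag, alongside a commonly recommended alternative. The specific misuse shown (AES in ECB mode via the JCA Cipher API) is a standard textbook example chosen as an assumption here, not necessarily one of the paper's 20 identified misuse types.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

public class CipherMisuseExample {
    // Misuse: ECB mode encrypts identical plaintext blocks to identical
    // ciphertext blocks, leaking data patterns. Misuse detectors commonly
    // flag this transformation string.
    static byte[] encryptInsecure(SecretKey key, byte[] plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding"); // misuse: ECB
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(plaintext);
    }

    // Commonly recommended alternative: AES-GCM with a fresh random IV.
    static byte[] encryptSecure(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        // Callers must store the IV alongside the ciphertext for decryption.
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();
        encryptSecure(key, "hello".getBytes());
    }
}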