版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:Port Said Univ Fac Engn Elect Engn Dept Port Said 42526 Egypt
出 版 物:《JOURNAL OF SYSTEMS AND SOFTWARE》 (J Syst Software)
年 卷 期:2025年第222卷
核心收录:
学科分类:08[工学] 0835[工学-软件工程] 0812[工学-计算机科学与技术(可授工学、理学学位)]
基 金:Information Technology Academia Collaboration (ITAC) program in Egypt
主 题:Software development Code clone detection Software productivity Large language models Transformer-based models
摘 要:In software development, the replication of specific source code segments is known as code cloning. This practice allows reusing source code instead of developing these segments from scratch, enhancing software productivity. However, code cloning can introduce bugs, complicate code refactoring, and increase maintenance costs. Consequently, code clone detection (CCD) is an essential concern for the software industry. While various techniques have been proposed for detecting code clones, many existing tools generate a high ratio of false positives/negatives and a need for more contextual awareness. Therefore, this paper introduces CloneXformer, an innovative framework for code clone detection. The framework adopts a collaborative approach that harnesses multiple large language models for code understanding. The framework employs a preliminary phase to preprocess the input code, which helps the models understand and represent the code efficiently. Then, it captures the semantic level of the code and the syntactic level as it relies on a set of transformer-based models. Afterward, these models are finetuned to detect code clones with interpretable results that explain the detected clone types. Finally, the output of these models is combined to provide a unified final prediction. The empirical evaluation indicates that the framework improves detection performance, achieving an approximately 16.88 % higher F1 score than the state-of-the-art techniques.