Manually recovering traceability links between requirements and code artifacts often consumes substantial human resources. To address this, researchers have proposed automated methods based on textual similarity betwe...
详细信息
Manually recovering traceability links between requirements and code artifacts often consumes substantial human resources. To address this, researchers have proposed automated methods based on textual similarity between requirements and code artifacts, such as information retrieval (IR) and pre-trained models, to determine whether traceability links exist between requirements and code artifacts. However, in the same system, developers often follow similar naming conventions and repeatedly use the same frameworks and template code, resulting in high textual similarity between code artifacts that are functionally unrelated. This makes it difficult to accurately identify the corresponding code artifacts for requirements artifacts solely based on textual similarity. Therefore, it is necessary to leverage the dependency relationships between code artifacts to assist in the requirements-code traceability link recovery process. Existing methods often treat dependency relationships as a post-processing step to refine textual similarity, overlooking the importance of textual similarity and dependency relationships in generating requirements-code traceability links. To address these limitations, we proposed Heterogeneous Graph Neural Network Link (HGNNLink), a requirements traceability approach that uses vectors generated by pre-trained models as node features and considers IR similarity and dependency relationships as edge features. By employing a heterogeneous graph neural network, HGNNLink aggregates and dynamically evaluates the impact of textual similarity and code dependencies on link generation. The experimental results show that HGNNLink improves the average F1 score by 13.36% compared to the current state-of-the-art (SOTA) method GA-XWcode in a dataset collected from ten open source software (OSS) projects. HGNNLink can extend IR methods by using high similarity candidate links as edges, and the extended HGNNLink achieves a 2.48% improvement in F1 compared to the ori
Background: Since Google introduced Kotlin as an official programming language for developing Android apps in 2017, Kotlin has gained widespread adoption in Android development. The interoperability of Java and Kotlin...
详细信息
This study addresses the challenge of requirements-to-code traceability by proposing a novel model, Genetic Algorithm-XGBoost With code dependency (GA-XWcode), which integrates eXtreme Gradient Boosting (XGBoost) with...
详细信息
This study addresses the challenge of requirements-to-code traceability by proposing a novel model, Genetic Algorithm-XGBoost With code dependency (GA-XWcode), which integrates eXtreme Gradient Boosting (XGBoost) with a Node2Vec model-weighted code dependency strategy and genetic algorithms for parameter optimisation. XGBoost mitigates overfitting and enhances model stability, while Node2Vec improves prediction accuracy for low-confidence links. Genetic algorithms are employed to optimise model parameters efficiently, reducing the resource intensity of traditional methods. Experimental results show that GA-XWcode outperforms the state-of-the-art method TRAceability lInk cLassifier (TRAIL) by 17.44% and Deep Forest for Requirement traceability (DF4RT) by 33.36% in terms of average F1 performance across four datasets. It is significantly superior to all baseline methods at a confidence level of <0.01 and demonstrates exceptional performance and stability across various training data scales.
Context Software modularization is extremely important to streamline the inner structure of the program modules without influencing its core functionality. As the framework advances during the upkeep stage, the pristi...
详细信息
Context Software modularization is extremely important to streamline the inner structure of the program modules without influencing its core functionality. As the framework advances during the upkeep stage, the pristine design of the software package gets disintegrated and hence it is arduous to understand and maintain. There are many existing approaches being carried out to automatically remodularize using optimization techniques to ease the maintenance and improve the quality of the system. The outcomes are rather insufficiently optimal and depend on problem-specific operators, which in turn expands the time multifaceted nature to land at an answer. Apart from these limitations, the issues, such as time complexity, scalability and performance need to be addressed. Objective: In this paper, an efficient automatic software remodularization using extended Ant Colony Optimization (ACO) has been proposed to remodularize the software systems. Method: The proposed approach mainly includes two phases: optimised traversal of software system using ACO for finding the order of software files to be processed and remodularization of software system using the proposed approach of extended ACO. Results: We experimented our proposed approach on seven software systems. The performance is evaluated by using Turbo modularization quality (MQ) which supports Module dependency graph (MDG) that have edge weights. The time complexity of remodularized software system is evaluated based on number of Turbo MQ. Conclusion It can be concluded that when the performance has been compared with the subsisting methodologies, for example, Genetic algorithm (GA), Hill climbing (HC) and Interactive genetic algorithms (I-GM), the proposed approach has higher Turbo MQ value with lesser time complexity in the evaluated software systems.
Algorithm classification is to automatically identify the classes of a program based on the algorithm(s) and/or data structure(s) implemented in the program. It can be useful for various tasks, such as code reuse, cod...
详细信息
ISBN:
(纸本)9781728105918
Algorithm classification is to automatically identify the classes of a program based on the algorithm(s) and/or data structure(s) implemented in the program. It can be useful for various tasks, such as code reuse, code theft detection, and malware detection. code similarity metrics, on the basis of features extracted from syntax and semantics, have been used to classify programs. Such features, however, often need manual selection effort and are specific to individual programming languages, limiting the classifiers to programs in the same language. To recognize the similarities and differences among algorithms implemented in different languages, this paper describes a framework of Bilateral Neural Networks (Bi-NN) that builds a neural network on top of two underlying sub-networks, each of which encodes syntax and semantics of code in one language. A whole Bi-NN can be trained with bilateral programs that implement the same algorithms and/or data structures in different languages and then be applied to recognize algorithm classes across languages. We have instantiated the framework with several kinds of token-, tree-and graph-based neural networks that encode and learn various kinds of information in code. We have applied the instances of the framework to a code corpus collected from GitHub containing thousands of Java and C++ programs implementing 50 different algorithms and data structures. Our evaluation results show that the use of Bi-NN indeed produces promising algorithm classification results both within one language and across languages, and the encoding of dependencies from code into the underlying neural networks helps improve algorithm classification accuracy further. In particular, our custom-built dependency trees with tree-based convolutional neural networks achieve the highest classification accuracy among the different instances of the framework that we have evaluated. Our study points to a possible future research direction to tailor bilateral and mu
暂无评论