Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. T...
详细信息
Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntax and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses the function clone as a bridge to gradually deduce the known knowledge into the problem-solving knowledge that can reveal the unknowns. This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and control flowgraph annotation. This provides a systematic and coherent approach to knowledge deduction. We then structure all of the knowledge into a semantic-enriched code knowledge graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on *** and confirm high accuracy of our KG construction steps. In our experiments, our code KG effectively improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG improved time efficiency by 30 to 380 seconds, vulnerability determination accuracy by 20% to 33%, and vulnerability fixing accuracy by 24% to 40% for novice developers who identified and fixed vulnerable smart contract functions.
Links between issue reports and corresponding code commits to fix them can greatly reduce the maintenance costs of a software project. More often than not, however, these links are missing and thus cannot be fully uti...
详细信息
ISBN:
(纸本)9781728105918
Links between issue reports and corresponding code commits to fix them can greatly reduce the maintenance costs of a software project. More often than not, however, these links are missing and thus cannot be fully utilized by developers. Current practices in issue-commit link recovery extract text features and code features in terms of textual similarity from issue reports and commit logs to train their models. These approaches are limited since semantic information could be lost. Furthermore, few of them consider the effect of source code files related to a commit on issue-commit link recovery, let alone the semantics of code context. To tackle these problems, we propose to construct code knowledge graph of a code repository and generate embeddings of source code files to capture the semantics of code context. We also use embeddings to capture the semantics of issue-or commit-related text. Then we use these embeddings to calculate semantic similarity and code similarity using a deep learning approach before training a SVM binary classification model with additional features. Evaluations on real-world projects show that our approach DeepLink can outperform the state-of-the-art method.
暂无评论