This paper presents a novel approach for comparing software systems by calculating the robust Hausdorff distance between semantic source code embeddings of individual software components, i.e., methods. The proposed approach represents each software system as a set of vectors, where every vector is a semantic source code embedding of a particular method. The code embeddings are constructed from the abstract syntax trees of the methods using attention-based neural network models that capture the semantics of the methods. Previous research has shown that comparing semantic source code embeddings can reveal semantic relationships between two methods. We utilize this characteristic to estimate the semantic similarity between two software systems by computing the robust Hausdorff distance between their sets of embeddings. In the experiment, a pre-trained code2vec neural network model is used to create the source code vector representations of several open-source Java-based libraries. Several variations of the robust Hausdorff distance are evaluated. The results show that the proposed approach can effectively estimate semantic similarity, reflecting the libraries' scopes, their evolution, and their individual parts (e.g., packages).
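To make the set-to-set comparison concrete, the following is a minimal sketch of one common robust Hausdorff formulation, the ranked (partial) variant, in which the maximum over nearest-neighbour distances is replaced by the value at a chosen rank. The abstract does not say which variations were evaluated, so this particular variant, the Euclidean metric, the `fraction` parameter, and the function names are all assumptions made for illustration. The toy 2-D points stand in for real method embeddings.

```python
import math


def directed_distance(source, target, fraction=0.9):
    """Ranked directed Hausdorff distance from `source` to `target`.

    For every vector in `source`, find the distance to its nearest vector
    in `target`, then return the value at the given rank fraction instead
    of the maximum. With fraction=1.0 this is the classic directed
    Hausdorff distance. (Euclidean metric: an assumption.)
    """
    nearest = sorted(min(math.dist(s, t) for t in target) for s in source)
    rank = max(1, math.ceil(fraction * len(nearest)))
    return nearest[rank - 1]


def robust_hausdorff(a, b, fraction=0.9):
    """Symmetric distance: the larger of the two directed distances."""
    return max(directed_distance(a, b, fraction),
               directed_distance(b, a, fraction))


# Hypothetical toy "systems": each tuple plays the role of one method embedding.
system_a = [(0.0, 0.0), (1.0, 0.0)]
system_b = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]  # one outlying method

classic = robust_hausdorff(system_a, system_b, fraction=1.0)
ranked = robust_hausdorff(system_a, system_b, fraction=0.5)
print(classic, ranked)
```

In this toy case the single outlying vector in `system_b` fully determines the classic distance, while the ranked variant ignores it. That insensitivity to a few unmatched methods is the reason for preferring a robust variant when comparing whole systems.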