ISBN: 9781509065950 (print)
Software code clone detection techniques and tools play a major role in improving software quality and in saving maintenance cost and effort. Program dependency graph (PDG) based clone detection techniques have a key advantage over other techniques: they can detect non-contiguous code clones in addition to contiguous ones. We propose a further enhancement to current state-of-the-art PDG-based detection that identifies all possible (exact and approximate) clone relations from the obtained PDG-based clone pair results using Approximate Subgraph Matching (ASM). We obtain clone results for our proposed technique on three subject software systems and validate the results on eclipse-ant from Bellon's benchmark. We also present a new ASM-based distance measure to represent the similarity between software code clones.
ISBN: 9781665480925 (digital); 9781665480925 (print)
Understanding and debugging data structures and algorithms (DSA) is one of the most common tasks in computer science. DSA tests have also become a standard threshold that software developers have to cross to "get the job". One major challenge in comprehending and debugging DSA implementations lies in establishing and maintaining mental models of the quintessentially complex and twisted networks of events that make up their dynamic runtime behavior. Despite the high level of difficulty of this crucial task, general-purpose tools that help users understand or reason about DSA implementations still have very limited capabilities. In this work we present Dbux-PDG, a dynamic program dependency graph extension for the Dbux omniscient debugger. It captures data and control flow, as well as data dependencies, of a program's execution for visualization and user interaction. To deal with the immense complexity of non-trivial programs, it offers multiple layers of summarization that allow the user to explore the graph either as a whole or in parts, one step at a time, as they see fit. We present our findings from applying Dbux-PDG to 94 diverse algorithms and explore its utility in several case studies. All visual results are made available in an online gallery. Dbux-PDG is open source and one-click installable, making it a powerful, easy-to-use tool prototype for DSA comprehension. Video URL: https://***/dgXj3VoQJZQ
Fault localization is sensitive to program runs, and the pattern of fault propagation and manifestation in real software is extremely complex and uncertain. To accommodate this complexity and uncertainty, this paper presents a novel probabilistic graph model, the probabilistic cause-effect graph (PCEG). The PCEG is built upon dynamic dependencies generated from running the faulty program against failed test cases, and it performs probabilistic inference with coverage information from the whole test suite. The PCEG extends the traditional probabilistic graph in both structural and inferential terms and differs from earlier probabilistic approaches to software diagnosis by introducing two forms of evidence (*** faults and real faults). The proposed probabilistic reasoning algorithm works on the PCEG converted from a dynamic program dependency graph and diagnoses the causes with both top-down and bottom-up inference. The experimental results show improvements in diagnostic effectiveness and accuracy in both single-fault and multiple-fault contexts, even when a program yields similar program runs through loop statements. Copyright (c) 2015 John Wiley & Sons, Ltd.
Code obfuscation is a staple tool in malware creation, where code fragments are altered substantially to make them appear different from the original while keeping the semantics unaffected. The majority of obfuscated code detection methods use program structure as a signature for detecting unknown code. They usually ignore the most important feature, the semantics of the code, when matching two code fragments or programs for obfuscation. Obfuscated code detection is a special case of the semantic code clone detection task. We propose a machine learning framework for detecting both code obfuscation and code clones. We use features extracted from Java bytecode dependency graphs (BDG), program dependency graphs (PDG), and abstract syntax trees (AST). BDGs and PDGs are two representations of the semantics or meaning of a Java program. ASTs capture the structural aspects of a program. We use several publicly available code clone and obfuscated code datasets to validate the effectiveness of our framework, and we use different assessment parameters to evaluate the detection quality of our proposed model. Experimental results are excellent when compared with contemporary obfuscated code and code clone detectors. Interestingly, we achieve 100% success in detecting obfuscated code in terms of recall, precision, and F1-score. When we compare our method with other methods across all obfuscation types, namely contraction, expansion, loop transformation, and renaming, our model performs best. In clone detection, our model achieves very high detection accuracy in comparison with other similar detectors. (C) 2017 Elsevier Ltd. All rights reserved.
Software quality is key to the success of software systems. Modern software systems are growing in value with industry needs and are becoming more complex, which inevitably increases the likelihood of defects. Software repair is time-consuming, especially locating the source files related to specific software defect reports. To locate defective source files more quickly and accurately, automated software defect location technology has been developed and has great practical value. Existing deep learning-based software defect location methods focus on extracting the semantic correlation between a source file and the corresponding defect reports. However, they ignore the extensive code structure information contained in the source files. To this end, we propose a software defect location method, namely multi-graph learning-based software defect location (MGSDL). By extracting the program dependency graphs of functions, each source file is converted into a graph bag containing multiple graphs (i.e., a multi-graph). Further, a multi-graph learning method is proposed, which learns code structure information from the multi-graph to establish the semantic association between source files and software defect reports. Experimental results on four publicly available datasets (AspectJ, Tomcat, Eclipse UI, and SWT) show that MGSDL improves on the competing methods by an average of 3.88%, 5.66%, 13.23%, 9.47%, and 3.26% on five evaluation metrics (rank@10, rank@5, MRR, MAP, and AUC), respectively.
This paper proposes an algorithm for the automatic assessment of programming exercises. The algorithm assigns assessment scores based on the program dependency graph structure and the program semantic similarity, but does not actually need to run the student's program. By calculating the node similarity between the student's program and the teacher's reference programs in terms of structure and program semantics, a similarity matrix is generated and the optimal similarity node path of this matrix is identified. The proposed algorithm achieves improved computational efficiency, with a time complexity of $O(n^2)$ for a graph with n nodes. The experimental results show that the assessment algorithm proposed in this paper is more reliable and accurate than several comparison algorithms, and can be used for scoring programming exercises in C/C++, Java, Python, and other languages.
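To make the similarity-matrix idea in the abstract above more concrete, here is a minimal Python sketch. It is not the paper's algorithm: the node representation (a `tokens` list and a `degree` count), the equal weighting in `node_similarity`, and the greedy row-by-row matching in `greedy_score` are all assumptions made for illustration, and the greedy pass is a simplification swapped in for the paper's optimal path search. For two graphs of n nodes each, building the matrix and scanning it both take O(n^2) similarity evaluations and comparisons.

```python
def node_similarity(a, b):
    """Blend token overlap (semantics) with degree closeness (structure).

    The 50/50 weighting is an arbitrary assumption for this sketch.
    """
    tokens_a, tokens_b = set(a["tokens"]), set(b["tokens"])
    union = tokens_a | tokens_b
    semantic = len(tokens_a & tokens_b) / len(union) if union else 1.0
    structural = 1.0 / (1.0 + abs(a["degree"] - b["degree"]))
    return 0.5 * semantic + 0.5 * structural


def greedy_score(student, reference):
    """Score a student graph against a reference graph in the range [0, 1].

    Builds the full similarity matrix, then walks it row by row and takes
    the best still-unused column for each row (greedy, not optimal).
    """
    matrix = [[node_similarity(s, r) for r in reference] for s in student]
    used = set()
    total = 0.0
    for row in matrix:
        best_col, best_val = None, 0.0
        for col, val in enumerate(row):
            if col not in used and val > best_val:
                best_col, best_val = col, val
        if best_col is not None:
            used.add(best_col)
            total += best_val
    # Normalise by the larger graph so missing or extra nodes lower the score.
    return total / max(len(student), len(reference), 1)


# Hypothetical two-node reference solution and two student submissions.
reference_nodes = [
    {"tokens": ["int", "x"], "degree": 1},
    {"tokens": ["return", "x"], "degree": 2},
]
full_submission = list(reference_nodes)
partial_submission = reference_nodes[:1]

full_score = greedy_score(full_submission, reference_nodes)
partial_score = greedy_score(partial_submission, reference_nodes)
```

In this toy setup a submission identical to the reference scores 1.0, and a submission that contains only the first of the two reference nodes scores 0.5.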
ISBN: 9781509008056 (print)
This paper presents an extended differential control flow graph (EDCFG) that improves the accuracy of the update complexity numbers (UCNs) used to select existing test cases in regression testing.
ISBN: 9781665458139 (print)
Dynamic taint analysis (DTA) has been widely used in various security-relevant scenarios that need to track the runtime information flow of programs. Dynamic binary instrumentation (DBI) is a prevalent technique for achieving effective dynamic taint tracking on commodity hardware and systems. However, the significant performance overhead incurred by dynamic taint analysis restricts its usage in production systems. Previous efforts to mitigate the performance penalty fall into two categories: decoupling taint tracking from program execution so that it can run in parallel, and abstracting the tainting logic to a higher granularity. Both approaches have met with only limited success. In this work, we propose Sdft, an efficient approach that combines the precision of DBI-based instruction-level taint tracking with the efficiency of function-level abstract taint propagation. First, we build the library function summaries automatically with reachability analysis on the program dependency graph (PDG) to specify the control and data dependencies between the input parameters, output parameters, and global variables of the target library. Then we derive the taint rules for the target library functions and develop taint tracking for library functions that is tightly integrated into the state-of-the-art DTA framework Libdft. By applying our approach to the core C library functions of glibc, we report an average 1.58x speedup in tracking performance compared with Libdft64. We also validate the effectiveness of the hybrid taint tracking and its ability to detect real-world vulnerabilities.
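The summary-building step described in the abstract above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Sdft's implementation: the PDG is a plain adjacency dictionary whose edges point from a value to whatever depends on it, control and data dependencies are not distinguished, and the names `reachable`, `summarize`, and `toy_pdg` are hypothetical. The idea shown is only that a function-level taint rule can be read off from which outputs are reachable from which inputs.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from start by following dependency edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def summarize(graph, inputs, outputs):
    """Map each output to the sorted list of inputs that can influence it."""
    summary = {out: [] for out in outputs}
    for src in inputs:
        hits = reachable(graph, src)
        for out in outputs:
            if out in hits:
                summary[out].append(src)
    return {out: sorted(srcs) for out, srcs in summary.items()}


# Hypothetical PDG of a memcpy-like routine (node names are illustrative).
toy_pdg = {
    "src": ["load"],
    "n": ["loop_cond"],
    "loop_cond": ["store"],  # control dependence
    "load": ["store"],       # data dependence
    "store": ["dest_buf"],
    "dest": ["ret"],
}

summary = summarize(toy_pdg, inputs=["src", "n", "dest"], outputs=["dest_buf", "ret"])
```

Here the resulting summary says that taint on `src` or `n` propagates to `dest_buf`, while the return value depends only on `dest`. A tracker could apply such a rule once per call instead of instrumenting every instruction inside the function.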
ISBN: 9781479976782 (print)
Software clone detection is attracting growing attention as industry demand for such tools has risen. Code replication, that is, copy-and-paste activity, recurs because it lets developers save the effort and time of rewriting similar code fragments from scratch. In industrial software systems, code replication is a serious problem because it may affect quality, consistency, maintainability, and comprehensibility. An efficient approach is therefore needed to detect such replication in a distributed environment. The challenge lies in the variety of syntax, compiler-dependent languages, and the various coding styles that can solve a single problem. According to the related survey, researchers find it difficult to deal with evolving code copies, even under rigorous benchmarking. Existing software tools are limited in their ability to detect code clones perfectly. Each software developer may approach the implementation of the same problem in a different way. The methodology described here specifies an efficient way to detect code clones: a hybrid model that covers the widest range of coding behavior and clone classes. Along with similarity checking, the paper describes the importance of dissimilarity detection. Dissimilarity needs to be detected because of operator or function overloading, an essential feature of a good object-oriented language. The paper also discusses key techniques that save time in retrieving and comparing data by extracting and arranging code mined from the code document. The proposed system eliminates the effort of comparing code line by line between two files, as traditional algorithms do. It defines a reduction technique and a code-complexity-based analysis that increase the probability of success. The conclusion is that no single scheme defines a procedure for detecting all types of clones. In this paper, we introduce a multi-model learning technique to detect various types of code clones, which is the problem statement of this research work.
ISBN: 9789811034336; 9789811034329 (print)
In this paper, we discuss several code replication detection methods and tools along different dimensions. This review provides an extensive survey of code clone detection techniques and tools. Starting from clone perceptions, it discusses the classification of clones and an overall assortment of selected techniques and tools. The paper covers the whole paradigm of clone detection and presents open research avenues in code clone detection.