Programming has become a fundamental skill in the current digital era. A formal programming course relies on an autograder to score student works. However, the usual black-box method only compares the output instead o...
详细信息
ISBN:
(纸本)9781665494533
Programming has become a fundamental skill in the current digital era. A formal programming course relies on an autograder to score student works. However, the usual black-box method only compares the output instead of adequately examining the code structure. As such, another method is required to measure the structure of the student submission code to give fairer scores. In this paper, an experiment is conducted using a control-flow graph (CFG) comparison algorithm to measure the similarity between student submission code and reference code from the instructor, followed by an analysis of the results obtained from the experiment. The comparison was made using the Hu Algorithm [1], based on previous CFG similarity measurements presented by Chan and Collberg [2]. Through the experiment and analysis process, it is concluded that the CFG comparison method implemented in this research is better applied to boost students who got low scores due to minor mistakes rather than be applied to the entire student submissions as the primary scoring algorithm.
Source code similarity aims at recognizing common characteristics between two different codes by means of their components. It plays a significant role in many activities regarding software development and analysis wh...
详细信息
ISBN:
(数字)9783031539695
ISBN:
(纸本)9783031539688;9783031539695
Source code similarity aims at recognizing common characteristics between two different codes by means of their components. It plays a significant role in many activities regarding software development and analysis which have the potential of assisting software teams working on large codebases. Existing approaches aim at computing similarity between two codes by suitable representation of them which captures syntactic and semantic properties. However, they lack explainability and generalization for multiple languages comparison. Here, we present a preliminary result that attempts at providing a graph-focused representation of code by means of which clustering and classification of programs is possible while exposing explainability and generalizability characteristics.
Satisfiability modulo theories (SMT) solvers have been widely applied as the reasoning engine for diverse software analysis and verification technologies. The efficiency of the SMT solver has significant effects on th...
详细信息
Satisfiability modulo theories (SMT) solvers have been widely applied as the reasoning engine for diverse software analysis and verification technologies. The efficiency of the SMT solver has significant effects on the performance of these technologies. However, current SMT solvers are designed for the general purpose of constraint solving. Lots of useful knowledge of programs cannot be utilized during SMT solving. As a result, the SMT solver may spend much effort to explore redundant search space. In this article, we propose a novel approach to utilizing control-flow knowledge in SMT solving. With this technique, the search space can be considerably reduced, and the efficiency of SMT solving is observably improved. We conducted extensive experiments on credible benchmarks. The results show significant improvements of our approach.
Tapir (pronounced TAY-per) is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream com...
详细信息
Tapir (pronounced TAY-per) is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler's IR. Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations on and across parallel control constructs. Remedying this situation has generally been thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir leverages the "serial-projection property," which is commonly satisfied by task-parallel programs, to handle the semantics of these programs without an extensive rework of the compiler. For recursive fork-join programs that satisfy the serial-projection property, Tapir enables effective compiler optimization of parallel programs with only minor changes to existing compiler analyses and code transformations. Tapir uses the serial-projection property to order logically parallel fine-grained tasks in the program's control-flow graph. This ordered representation of parallel tasks allows the compiler to optimize parallel codes effectively with only minor modifications. For example, to implement Tapir/LLVM, a prototype of Tapir in the LLVM compiler, we added or modified less than 3,000 lines of LLVM's half-millionline core middle-end functionality. These changes sufficed to enable LLVM's existing compiler optimizations for serial code including loop-invariant-code motion, common-subexpression elimination, and tail-recursion elimination-to work with parallel control constructs such as parallel loops and Cilk's cilk_spawn keyword. Tapir also supports parallel optimizations, such as loop scheduling, which restructure the parallel controlflow of the program. By making use of existing LLVM optimizations and new parallel optimizations, Tapir/LLVM can opti
Software deobfuscation is a key challenge in malware analysis to understand the internal logic of the code and establish adequate countermeasures. In order to defeat recent obfuscation techniques, state-of-the-art gen...
详细信息
ISBN:
(纸本)9781450360968
Software deobfuscation is a key challenge in malware analysis to understand the internal logic of the code and establish adequate countermeasures. In order to defeat recent obfuscation techniques, state-of-the-art generic deobfuscation methodologies are based on dynamic symbolic execution (DSE). However, DSE suffers from limitations such as code coverage and scalability. In the race to counter and remove the most advanced obfuscation techniques, there is a need to reduce the amount of code to cover. To that extend, we propose a novel deobfuscation approach based on semantic equivalence, called DoSE. With DoSE, we aim to improve and complement DSE-based deobfuscation techniques by statically eliminating obfuscation transformations (built on code-reuse). This improves the code coverage. Our method's novelty comes from the transposition of existing binary diffing techniques, namely semantic equivalence checking, to the purpose of the deobfuscation of untreated techniques, such as two-way opaque constructs, that we encounter in surreptitious software. In order to challenge DoSE, we used both known malwares such as Cryptowall, WannaCry, Flame and BitCoinMiner and obfuscated code samples. Our experimental results show that DoSE is an efficient strategy of detecting obfuscation transformations based on code-reuse with low rates of false positive and/or false negative results in practice, and up to 63% of code reduction on certain types of malwares.
作者:
Chen, JianhuiHe, FeiTsinghua Univ
Sch Software Key Lab Informat Syst Secur MoEBeijing Natl Res Ctr Informat Sci & Technol Beijing Peoples R China
Satisfiability modulo theories (SMT) solvers have been widely applied as the reasoning engine for diverse software analysis and verification technologies. The efficiency of the SMT solver has significant effects on th...
详细信息
ISBN:
(数字)9781450359375
ISBN:
(纸本)9781450359375
Satisfiability modulo theories (SMT) solvers have been widely applied as the reasoning engine for diverse software analysis and verification technologies. The efficiency of the SMT solver has significant effects on the performance of these technologies. However, the current SMT solvers are designed for the general purpose of constraint solving. Many useful knowledge of programs cannot be utilized during the SMT solving. As a result, the SMT solver may spend a lot of effort to explore redundant search space. In this paper, we propose a novel approach for utilizing control-flow knowledge in SMT solving. With this technique, the search space can be considerably reduced and the efficiency of SMT solving is observably improved. We conducted extensive experiments on credible benchmarks, the results show orders of magnitude improvements of our approach.
This paper explores how fork-join parallelism, as supported by concurrency platforms such as Cilk and OpenMP, can be embedded into a compiler's intermediate representation (IR). Mainstream compilers typically trea...
详细信息
ISBN:
(纸本)9781450344937
This paper explores how fork-join parallelism, as supported by concurrency platforms such as Cilk and OpenMP, can be embedded into a compiler's intermediate representation (IR). Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations across parallel control constructs. Remedying this situation is generally thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir is a compiler IR that represents logically parallel tasks asymmetrically in the program's controlflowgraph. Tapir allows the compiler to optimize across parallel control constructs with only minor changes to its existing analyses and code transformations. To prototype Tapir in the LLVM compiler, for example, we added or modified about 6000 lines of LLVM's 4-million-line codebase. Tapir enables LLVM's existing compiler optimizations for serial code - including loop-invariant-code motion, common-subexpression elimination, and tail-recursion elimination - to work with parallel control constructs such as spawning and parallel loops. Tapir also supports parallel optimizations such as loop scheduling.
control-flow graph (CFG) similarity is a core technique in many areas, including malware detection and software plagiarism detection. While many algorithms have been proposed in the literature, their relative strength...
详细信息
ISBN:
(纸本)9781479971978
control-flow graph (CFG) similarity is a core technique in many areas, including malware detection and software plagiarism detection. While many algorithms have been proposed in the literature, their relative strengths and weaknesses have not been previously studied. Moreover, it is not even clear how to perform such an evaluation. In this paper we therefore propose the first methodology for evaluating CFG similarity algorithms with respect to accuracy and efficiency. At the heart of our methodology is a technique to automatically generate benchmark graphs, CFGs of known edit distances. We show the result of applying our methodology to four popular algorithms. Our results show that an algorithm proposed by Hu et al. is most efficient both in terms of running time and accuracy.
The static single information (SSI) form is an extension of the static single assignment (SSA) form, a well-established compiler intermediate representation that has been successfully used for numerous compiler analys...
详细信息
The static single information (SSI) form is an extension of the static single assignment (SSA) form, a well-established compiler intermediate representation that has been successfully used for numerous compiler analysis and optimizations. Several interesting results have also been shown for SSI form concerning liveness analysis and the representation of live-ranges of variables, which could make SSI form appealing for just-in-time compilation. Unfortunately, we have uncovered several mistakes in the previous literature on SSI form, which, admittedly, is already quite sparse. This article corrects the mistakes that are most germane to SSI form. We first explain why the two definitions of SSI form proposed in past literature, first by C. S. Ananian, then by J. Singer, are not equivalent. Our main result is then to prove that basic blocks, and thus program points, can be totally ordered so that live-ranges of variables correspond to intervals on a line, a result that holds for both variants of SSI form. In other words, in SSI form, the intersection graph defined by live-ranges is an interval graph, a stronger structural property than for SSA form for which the intersection graph of live-ranges is chordal. Finally, we show how this structure of live-ranges can be used to simplify liveness analysis.
In this paper, we present new pointcuts and primitives to Aspect-Oriented Programming (AOP) languages that are needed for systematic hardening of security concerns. The two proposed pointcuts allow to identify particu...
详细信息
In this paper, we present new pointcuts and primitives to Aspect-Oriented Programming (AOP) languages that are needed for systematic hardening of security concerns. The two proposed pointcuts allow to identify particular join points in a program's control-flow graph (CFG). The first one is the GAflow, Closest Guaranteed Ancestor, which returns the closest ancestor join point to the pointcuts of interest that is on all their runtime paths. The second one is the GDflow, Closest Guaranteed Descendant, which returns the closest child join point that can be reached by all paths starting from the pointcut of interest. The two proposed primitives are called ExportParameter and ImportParameter and are used to pass parameters between two pointcuts. They allow to analyze a program's call graph in order to determine how to change function signatures for passing the parameters associated with a given security hardening. We find these pointcuts and primitives to be necessary because they are needed to perform many security hardening practices and, to the best of our knowledge, none of the existing ones can provide their functionalities. Moreover, we show the viability and correctness of the proposed pointcuts and primitives by elaborating and implementing their algorithms and presenting the result of explanatory case studies. (C) 2009 Elsevier Ltd. All rights reserved.
暂无评论