binary code clone detection (or similarity comparison) is a fundamental technique for many important applications, such as plagiarism detection, malware analysis, software vulnerability assessment and program comprehe...
详细信息
ISBN:
(纸本)9781538605356
binary code clone detection (or similarity comparison) is a fundamental technique for many important applications, such as plagiarism detection, malware analysis, software vulnerability assessment and program comprehension. With the prevailing of smart and IoT (Internet of Things) devices, more and more programs are ported from traditional desktop platforms (e.g., IA-32) to ARM and MIPS architectures. It becomes imperative to detect cloned binary code across architectures. However, because of incomparable instruction sets of different architectures as well as alternative compiling configurations, it is difficult to conduct a binary code clone detection with traditional syntax-or structure-based methods. To address, we propose a semantics-based approach to fulfill the target. We recognize arguments and indirect jump targets of each binary function, and emulate executions of those functions, extracting semantic signatures to measure the similarity of functions. The approach has been implemented in a prototype system named CACOMPARE to detect cloned binary functions across architectures and compiling configurations. It supports comparisons between mainstream architectures (IA-32, ARM and MIPS) and is able to analyse binaries on the Linux platform. The experimental results show that CACOMPARE not only is effective in dealing with binaries of different architectures and variant compiling configurations, but also improves the accuracy of binary code clone detection comparing to state-of-the-art solutions.
In this paper, we present an approach to comparing control flow graphs of binaryprograms by matching their basic blocks. We first set up an initial match and propagate it to reach a stable state. We consider the matc...
详细信息
ISBN:
(纸本)9781479935741
In this paper, we present an approach to comparing control flow graphs of binaryprograms by matching their basic blocks. We first set up an initial match and propagate it to reach a stable state. We consider the matched pairs to identify overall similarities. To evaluate the proposed method, we perform experiments on real-world Java applications, and compare their performance with previous structural matching method. In the experimental results, the proposed method shows more reliable results than previous method at distinguishing similar control flow graphs.
Compiler provenance encompasses numerous pieces of information, such as the compiler family, compiler version, optimization level, and compiler-related functions. The extraction of such information is imperative for v...
详细信息
Compiler provenance encompasses numerous pieces of information, such as the compiler family, compiler version, optimization level, and compiler-related functions. The extraction of such information is imperative for various binaryanalysis applications, such as function fingerprinting, clone detection, and authorship attribution. It is thus important to develop an efficient and automated approach for extracting compiler provenance. In this study, we present BinComp, a practical approach which, analyzes the syntax, structure, and semantics of disassembled functions to extract compiler provenance. BinComp has a stratified architecture with three layers. The first layer applies a supervised compilation process to a set of known programs to model the default code transformation of compilers. The second layer employs an intersection process that disassembles functions across compiled binaries to extract statistical features (e.g., numerical values) from common compiler/linker-inserted functions. This layer labels the compiler-related functions. The third layer extracts semantic features from the labeled compiler-related functions to identify the compiler version and the optimization level. Our experimental results demonstrate that BinComp is efficient in terms of-both computational resources and time. (C) 2015 The Authors. Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/4.0/).
The capability of efficiently recognizing reused functions for binary code is critical to many digital forensics tasks, especially considering the fact that many modern malware typically contain a significant amount o...
详细信息
The capability of efficiently recognizing reused functions for binary code is critical to many digital forensics tasks, especially considering the fact that many modern malware typically contain a significant amount of functions borrowed from open source software packages. Such a capability will not only improve the efficiency of reverse engineering, but also reduce the odds of common libraries leading to false correlations between unrelated code bases. In this paper, we propose SIGMA, a technique for identifying reused functions in binary code by matching traces of a novel representation of binary code, namely, the Semantic Integrated Graph (SIG). The SIG s enhance and merge several existing concepts from classic programanalysis, including control flow graph, register flow graph, and function call graph into a joint data structure. Such a comprehensive representation allows us to capture different semantic descriptors of common functionalities in a unified manner as graph traces, which can be extracted from binaries and matched to identify reused functions, actions, or open source software packages. Experimental results show that our approach yields promising results. Furthermore, we demonstrate the effectiveness of our approach through a case study using two malware known to share common functionalities, namely, Zeus and Citadel. (C) 2015 The Authors. Published by Elsevier Ltd.
In recent years, malware has grown faster than ever in volume, form and harmfulness. While existing static or dynamic analysis techniques can meet the common user needs for malware detection, ana -lysts desire a more ...
详细信息
In recent years, malware has grown faster than ever in volume, form and harmfulness. While existing static or dynamic analysis techniques can meet the common user needs for malware detection, ana -lysts desire a more detailed overview to uncover the program architecture. Malware often force research into difficulties due to their complex anti-analysis techniques, which call for a quick analysis of program structure and components to clarify malware functional semantics. In this paper, we use community dis-covery methods to automate the malware program components analysis from the intuition of modular programing principles. Specifically, we design and implement DeMal, a solution to the malware module decomposition problem. It achieves remodularization by recovering program call relationships, extract-ing structure-related attributes, and applying an ensemble model of multiple community discovery algo-rithms. DeMal takes a malicious executable as input and predicts its code composition structure. In an evaluation with 155 malware samples, DeMal performs well on achieving an average F1-score of 71.3%, and 14.5% of the samples reach an average precision of 90%. The analysis time on each sample is about 19.79s. On extended experiments with 1,621 benign programs and over 10,0 0 0 stripped malware sam-ples, we also verify DeMal's scalability on common programs as well as the large-scale performance, respectively. The visualization of the results also strongly demonstrates DeMal's module decomposition capabilities.(c) 2022 Published by Elsevier Ltd.
The capability of efficiently recognizing reused functions for binary code is critical to many digital forensics tasks, especially considering the fact that many modern malware typically contain a significant amount o...
详细信息
The capability of efficiently recognizing reused functions for binary code is critical to many digital forensics tasks, especially considering the fact that many modern malware typically contain a significant amount of functions borrowed from open source software packages. Such a capability will not only improve the efficiency of reverse engineering, but also reduce the odds of common libraries leading to false correlations between unrelated code bases. In this paper, we propose SIGMA, a technique for identifying reused functions in binary code by matching traces of a novel representation of binary code, namely, the Semantic Integrated Graph (SIG). The SIG s enhance and merge several existing concepts from classic programanalysis, including control flow graph, register flow graph, and function call graph into a joint data structure. Such a comprehensive representation allows us to capture different semantic descriptors of common functionalities in a unified manner as graph traces, which can be extracted from binaries and matched to identify reused functions, actions, or open source software packages. Experimental results show that our approach yields promising results. Furthermore, we demonstrate the effectiveness of our approach through a case study using two malware known to share common functionalities, namely, Zeus and Citadel. (C) 2015 The Authors. Published by Elsevier Ltd.
The Internet of Things (IoT) enables many new and exciting applications, but it also creates a number of new risks related to information security. Several recent attacks on IoT devices and systems illustrate that the...
详细信息
The Internet of Things (IoT) enables many new and exciting applications, but it also creates a number of new risks related to information security. Several recent attacks on IoT devices and systems illustrate that they are notoriously insecure. It has also been shown that a major part of the attacks resulted in full adversarial control over IoT devices, and the reason for this is that IoT devices themselves are weakly protected and they often cannot resist even the most basic attacks. Penetration testing or ethical hacking of IoT devices can help discovering and fixing their vulnerabilities that, if exploited, can result in highly undesirable conditions, including damage of expensive physical equipment or even loss of human life. In this paper, we give a basic introduction into hacking IoT devices. We give an overview on the methods and tools for hardware hacking, firmware extraction and unpacking, and performing basic firmware analysis. We also provide a survey on recent research on more advanced firmware analysis methods, including static and dynamic analysis of binaries, taint analysis, fuzzing, and symbolic execution techniques. By giving an overview on both practical methods and readily available tools as well as current scientific research efforts, our work can be useful for both practitioners and academic researchers.
暂无评论