Different from source code clone detection, clone detection (similar codesearch) in binary executables faces big challenges due to the gigantic differences in the syntax and the structure of binarycode that result f...
详细信息
Different from source code clone detection, clone detection (similar codesearch) in binary executables faces big challenges due to the gigantic differences in the syntax and the structure of binarycode that result from different configurations of compilers, architectures and OSs. Existing studies have proposed different categories of features for detecting binarycode clones, including CFG structures, n-gram in CFG, input/output values, etc. In our previous study and the tool BinGo, to mitigate the huge gaps in CFG structures due to different compilation scenarios, we propose a selective inlining technique to capture the complete function semantics by inlining relevant library and user-defined functions. However, only features of input/output values are considered in BinGo. In this study, we propose to incorporate features from different categories (e.g., structural features and high-level semantic features) for accuracy improvement and emulation for efficiency improvement. We empirically compare our tool, BinGo-E, with the pervious tool BinGo and the available state-of-the-art tools of binary code search in terms of search accuracy and performance. Results show that BinGo-E achieves significantly better accuracies than BinGo for cross-architecture matching, cross-OS matching, cross-compiler matching and intra-compiler matching. Additionally, in the new task of matching binaries of forked projects, BinGo-E also exhibits a better accuracy than the existing benchmark tool. Meanwhile, BinGo-E takes less time than BinGo during the process of matching.
binary function embedding models are applicable to various downstream tasks within IoT device software systems and have demonstrated advantages in numerous binary analysis tasks, such as vulnerability (homologous) fun...
详细信息
binary similarity has been widely used in function recognition and vulnerability detection. How to define a proper similarity is the key element in implementing a fast detection method. We proposed a scalable method t...
详细信息
ISBN:
(纸本)9781728173030
binary similarity has been widely used in function recognition and vulnerability detection. How to define a proper similarity is the key element in implementing a fast detection method. We proposed a scalable method to detect binary vulnerabilities based on similarity. Procedures lifted from binaries are divided into several comparable strands by data dependency, and those strands are transformed into a normalized form by our tool named Vulnera Bin, so that similarity can be determined between two procedures through a hash value comparison. The low computational complexity allows semantically equivalent code to be identified in binaries compiled from million lines of source code in a fast and accurate way.
binary code search has received much attention recently due to its impactful applications, e.g., plagiarism detection, malware detection and software vulnerability auditing. However, developing an effective binary cod...
详细信息
ISBN:
(纸本)9781665403924
binary code search has received much attention recently due to its impactful applications, e.g., plagiarism detection, malware detection and software vulnerability auditing. However, developing an effective binary code search tool is challenging due to the gigantic syntax and structural differences in binaries resulted from different compilers, compiler options and malware family. In this paper, we propose a scalable and accurate binarysearch engine which performs syntactic matching by combining a set of key techniques to address the challenges above. The key contribution is binary code searching technique which combined function filtering and partial trace method to match the function code relatively quick and accurate. In addition, a simhash and basic information based function filtering is proposed to dramatically reduce the irrelevant target functions. Besides, we introduce a partial trace method for matching the shortlisted function accurately. The experimental results show that our method can find similar functions, even with the presence of program structure distortion, in a scalable manner.
We present a scalable approach for establishing similarity between stripped binaries (with no debug information). The main challenge in binary similarity, is to establish similarity even when the code has been compile...
详细信息
ISBN:
(纸本)9781450349888
We present a scalable approach for establishing similarity between stripped binaries (with no debug information). The main challenge in binary similarity, is to establish similarity even when the code has been compiled using different compilers, with different optimization levels, or targeting different architectures. Overcoming this challenge, while avoiding false positives, is invaluable to the process of reverse engineering and the process of locating vulnerable code. We present a technique that is scalable and precise, as it alleviates the need for heavyweight semantic comparison by performing out-of-context re-optimization of procedure fragments. It works by decomposing binary procedures to comparable fragments and transforming them to a canonical, normalized form using the compiler optimizer, which enables finding equivalent fragments through simple syntactic comparison. We use a statistical framework built by analyzing samples collected "in the wild" to generate a global context that quantifies the significance of each pair of fragments, and uses it to lift pairwise fragment equivalence to whole procedure similarity. We have implemented our technique in a tool called GitZ and performed an extensive evaluation. We show that GitZ is able to perform millions of comparisons efficiently, and find similarity with high accuracy.
暂无评论