Cryptographic techniques are commonly used to protect software against malicious re-engineering. Efficiently detecting the encryption algorithms used in software, in order to determine whether they meet protection requirements, is an interesting and significant task. However, existing encryption algorithm detection methods suffer from a high alarm rate or low efficiency because they fail to extract the complete program structure and semantic features of the encryption algorithms. In this article, we propose GENDA, a graph embedding network-based detection method for encrypted binary code. We first analyze the characteristics of various encryption algorithms and construct a program graph for each encryption algorithm. The program graph is then recursively embedded into the graph neural network as a basic unit, yielding a vector representation of the encryption algorithm graph. Finally, the type of encryption algorithm is determined by comparing the distances between these vectors. To evaluate GENDA, we collected a number of cryptographic libraries and real application programs from open-source software. The experimental results show that GENDA achieves a detection success rate of over 92%. We also compared GENDA with existing state-of-the-art detection methods. The comparison results show that GENDA outperforms most of the existing methods.
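The final step described in this abstract, deciding the algorithm type by comparing distances between embedding vectors, can be sketched as a nearest-reference lookup. This is only an illustration under stated assumptions: the abstract does not name the distance metric, so cosine distance is assumed here, and `classify_by_distance`, `REFERENCES`, and the three-dimensional vectors are hypothetical names and made-up data, not GENDA's implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_a * norm_b)


def classify_by_distance(query, references):
    """Return the label whose reference embedding is closest to `query`."""
    return min(references, key=lambda label: cosine_distance(query, references[label]))


# Hypothetical reference embeddings (assumption): a real system would obtain
# these from the trained graph embedding network, one per known algorithm.
REFERENCES = {
    "AES": [0.9, 0.1, 0.0],
    "RC4": [0.1, 0.9, 0.1],
    "DES": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    # Embedding of an unknown function (made-up values).
    print(classify_by_distance([0.8, 0.2, 0.1], REFERENCES))
```

In practice a distance threshold would probably also be needed so that functions far from every reference vector are reported as containing no known algorithm; the abstract does not say how GENDA handles that case.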
Binary code traceability aims to use the relevant characteristics of anonymous binary code to identify concealed authors or teams and to replace error-prone and time-consuming manual reverse engineering tasks with automated systems. Although significant progress has been made in source code traceability technology, research on tracing binary files is still limited. Hence, we propose a feature extraction method and a deep learning model that exploit the sequence and structure information of binary code to identify the authors of anonymous and malicious binaries and their relations with other known binary code families. We further propose a new multigranularity information fusion feature, based on biological genes, oriented to the traceability of binary code. Evaluations conducted on the Google Code Jam (GCJ) dataset indicate that our method can trace binary code to the target author among 1000 people with an accuracy of 71%. Further experimental results verify the robustness of the proposed model. For malicious code datasets in particular, the proposed method achieved a stable traceability accuracy for malicious samples using only a small number of training samples. For the problem of malicious code tracking across 300 team organizations, the proposed method achieved a code-tracing accuracy of 82%. (C) 2022 Elsevier Ltd. All rights reserved.
During software development, numerous third-party library functions are often reused. Accurately recognizing library functions reused in software is of great significance for some security scenarios, such as the detection of known vulnerabilities and the reverse analysis of malware. One way to recognize library functions is to match the functions in the library against those in the target software. However, due to the diversity of library versions, compilers, build options, etc., two corresponding functions can differ. Precisely recognizing the library functions used in target software therefore remains a challenging task. In this paper, we propose a novel method named SELF (SEarch for Library Functions) to recognize library functions used in target software. In SELF, a function is represented with a co-occurrence matrix and encoded by a convolutional auto-encoder (CAE). The similarity between two functions is then measured using the generated bottleneck features. This scheme focuses on discriminative semantic features; thus, the method can not only distinguish different functions but also tolerate the subtle differences between two paired functions, which is specifically required for library function recognition. We collected 451 software projects, including approximately 3 million functions, to train and evaluate SELF. The experimental results show that SELF performs well in both Recall@1 and Recall@5. Especially when the library version gap is large, SELF significantly outperforms the classic BINDIFF. In addition, SELF shows good computational efficiency. (c) 2021 Elsevier Ltd. All rights reserved.
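The co-occurrence representation mentioned in this abstract can be illustrated with a small sketch. It assumes that a function is available as a sequence of opcode mnemonics and that "co-occurrence" means one opcode following another within a short window; the abstract does not say which tokens or window SELF actually uses, so `cooccurrence_matrix`, `VOCAB`, and `SAMPLE` are hypothetical names and data chosen only for illustration.

```python
def cooccurrence_matrix(opcodes, vocabulary, window=1):
    """Count how often vocabulary[j] follows vocabulary[i] within `window` positions."""
    index = {op: i for i, op in enumerate(vocabulary)}
    size = len(vocabulary)
    matrix = [[0] * size for _ in range(size)]
    for pos, op in enumerate(opcodes):
        if op not in index:
            continue  # ignore opcodes outside the chosen vocabulary
        # Look at the next `window` opcodes after the current position.
        for nxt in opcodes[pos + 1 : pos + 1 + window]:
            if nxt in index:
                matrix[index[op]][index[nxt]] += 1
    return matrix


# Hypothetical vocabulary and opcode sequence (assumptions, not from the paper).
VOCAB = ["push", "mov", "call", "ret"]
SAMPLE = ["push", "mov", "mov", "call", "mov", "ret"]

if __name__ == "__main__":
    for name, row in zip(VOCAB, cooccurrence_matrix(SAMPLE, VOCAB)):
        print(f"{name:>4}: {row}")
```

Because the matrix has a fixed size regardless of function length, it is a convenient image-like input for an encoder such as the convolutional auto-encoder the abstract describes.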
Binary code fingerprinting is crucial in many security applications. Examples include malware detection, software infringement, vulnerability analysis, and digital forensics. It is also useful for security researchers and reverse engineers, since it enables high-fidelity reasoning about binary code, such as revealing its functionality, authorship, libraries used, and vulnerabilities. Numerous studies have investigated binary code with the goal of extracting fingerprints that can illuminate the semantics of a target application. However, extracting fingerprints is a challenging task, since a substantial amount of significant information is lost during compilation, notably variable and function names, the original data and control flow structures, comments, semantic information, and the code layout. This article provides the first systematic review of existing binary code fingerprinting approaches and the contexts in which they are used. In addition, it discusses the applications that rely on binary code fingerprints, the information that can be captured during the fingerprinting process, and the approaches used and their implementations. It also addresses limitations and open questions related to the fingerprinting process and proposes future directions.
Authors: Elie Mengin, Fabrice Rossi
Affiliations: SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, 75013 Paris; CEREMADE, CNRS UMR 7534, Université Paris-Dauphine, PSL University, 75016 Paris
In this paper, we present a novel algorithm to address the Network Alignment problem. It is inspired by a previous message passing framework of Bayati et al. [2] and includes several modifications designed to significantly speed up the message updates as well as to enforce their convergence. Experiments show that our proposed model outperforms other state-of-the-art solvers. Finally, we propose an application of our method to the binary diffing problem. We show that our solution provides better assignments than the reference diffing tools in almost all submitted instances, and we outline the importance of leveraging the graphical structure of binary programs.
This paper presents a method for detecting exploitable vulnerabilities in binary code with almost no false positives. It is based on the concolic (a mix of concrete and symbolic) execution of software binary code and the annotation of sensitive memory zones in the corresponding program traces (represented in a formal manner). Three large families of vulnerabilities are considered: taint-related, stack overflow, and heap overflow. Using the angr framework as supporting software, a tool named VulnerabilitY detection based on dynamic behavioral PattErn Recognition was developed to demonstrate the viability of the method. Several test cases using custom code, the Juliet test base, and widely used public libraries were performed, showing a high detection potential for exploitable vulnerabilities with a very low rate of false positives.
Today’s Internet of Things (IoT) environments are heterogeneous, as they typically comprise devices equipped with various CPU architectures and software platforms. Therefore, in defending IoT environments against security threats, the capability of cross-architecture vulnerability detection is of paramount importance. In this paper, we propose BinX, a deep learning-based approach for code similarity detection in binaries that are obtained through different compilers and optimization levels for various architectures. Our research is guided by a key idea: leveraging the Ghidra decompiler to generate the decompiled C code and the high p-code intermediate representation, and using pre-trained transformer-based models, specifically BERT and CodeBERT, to accurately generate semantic embeddings. These embeddings are then used as inputs to an RNN Siamese neural network, enhancing the learning process for code similarity detection. The effectiveness of our approach is demonstrated through several experiments and comparisons with existing methods. Our results showcase the potential of BinX for vulnerability detection in cross-architecture, cross-compiled binaries, contributing to the advancement of security in IoT environments.