Search Results - Inner Mongolia University Library

20th International Conference on Information Security and Cryptology, Inscrypt 2024

Authors: Yang, Hongyu; Wang, Yunlong; Hu, Ze; Cheng, Xiang. Affiliations: School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China; School of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China; School of Information Engineering, Yangzhou University, Yangzhou 225127, China; Information Security Evaluation Center of Civil Aviation, Civil Aviation University of China, Tianjin 300300, China

ISBN: (print) 9789819647309

Existing binary code similarity detection (BCSD) methods often overlook the actual execution information and local semantic details of programs, leading to suboptimal performance in assembly code semantic representation learning, high training resource consumption, and poor similarity detection performance. To address these issues, this paper proposes a Multi-Source Coordinated Representation Learning (MSRL) method for binary code similarity detection. First, we extract the semantic correspondence between assembly instruction sequences and programming language fragments to construct a contrastive learning dataset. We then propose an assembly code and source code semantic alignment (ACSA) method, which uses the high-level semantics of source code as supervisory information. Through contrastive learning tasks, we align the feature representations of the ACSA-Asm encoder and the programming language encoder in the semantic space, thereby enhancing the semantic representation learning capability of ACSA-Asm for assembly instructions. Next, we design a graph-based binary function embedding (GBFE) method, which uses a semantic structure-aware network to fuse the semantic information extracted by ACSA-Asm with the actual execution information of the program, generating function embedding vectors for similarity detection. Experimental results show that, compared to existing methods, MSRL improves the Recall@1 metric for binary code similarity detection by 8%–33%. Additionally, in the context of code obfuscation, MSRL exhibits stronger resilience, with less degradation in the Recall@1 metric. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
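The abstract does not state MSRL's exact contrastive objective. A common way to align two encoders in a shared semantic space is a symmetric InfoNCE loss with in-batch negatives, as popularized by CLIP-style training. The pure-Python sketch below assumes that choice; the function names and the temperature of 0.07 are illustrative, not taken from the paper.

```python
import math


def _normalize(v):
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def _row_cross_entropy(logits, target):
    # Negative log-softmax of one row at the target index, computed stably.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def symmetric_info_nce(asm_emb, src_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of (assembly, source) embedding pairs.

    Row i of asm_emb and row i of src_emb form a positive pair; every other
    row in the batch serves as an in-batch negative. The temperature is an
    illustrative assumption, not a value reported for MSRL.
    """
    a = [_normalize(v) for v in asm_emb]
    s = [_normalize(v) for v in src_emb]
    n = len(a)
    sim = [[sum(x * y for x, y in zip(a[i], s[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Assembly -> source: row i should select column i.
    loss_a2s = sum(_row_cross_entropy(sim[i], i) for i in range(n)) / n
    # Source -> assembly: the same check on the transposed similarity matrix.
    sim_t = [list(col) for col in zip(*sim)]
    loss_s2a = sum(_row_cross_entropy(sim_t[j], j) for j in range(n)) / n
    return (loss_a2s + loss_s2a) / 2
```

Correctly paired embeddings give a loss near zero, while mismatched pairs give a large loss; minimizing it pulls each assembly embedding toward the embedding of its own source fragment and away from the others in the batch.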

Keywords: binary code similarity detection; contrastive learning; multi-source; semantic structure-aware networks; deep neural network


Fast Cross-Platform Binary Code Similarity Detection Framework Based on CFGs Taking Advantage of NLP and Inductive GNN

Chinese Journal of Electronics, 2024, Vol. 33, No. 1, pp. 128-138

Authors: Jinxue PENG; Yong WANG; Jingfeng XUE; Zhenyan LIU. Affiliation: School of Computer Science and Technology, Beijing Institute of Technology

Cross-platform binary code similarity detection aims to determine whether two or more pieces of binary code are similar. Existing approaches that combine control flow graph (CFG)-based function representation with graph convolutional network (GCN)-based similarity analysis perform best. However, because of the large amount of convolutional computation and the loss of structural information, convolutional networks inevitably incur high overhead and are sometimes inaccurate. To address these issues, we propose a fast cross-platform binary code similarity detection framework that uses natural language processing (NLP) for basic block embedding and an inductive graph neural network (GNN) for function representation, extracting structural and temporal features. The GNN's node-centric, mini-batch training is well suited to large CFGs and greatly reduces computational overhead. We evaluate various NLP basic block embedding models and GNNs. Experimental results show that the scheme using long short-term memory (LSTM) for basic block embedding and inductive-learning-based GraphSAGE (GAE) for function representation outperforms state-of-the-art work. Our framework incurs only 45% of the overhead, improving efficiency significantly with a small performance trade-off.
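GraphSAGE's inductive, node-centric training computes each node's new embedding from its own features and an aggregate over a (possibly sampled) set of neighbors, which is why it can be trained on mini-batches of nodes from large CFGs. The following is a minimal sketch of one mean-aggregator GraphSAGE layer over a CFG whose nodes carry basic block embeddings (for example, produced by an LSTM). The mean readout to a function vector and all names are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(w, v):
    # Multiply matrix w (list of rows) by vector v.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def sage_mean_layer(features, adjacency, weight, num_samples=None, rng=None):
    """One GraphSAGE layer with the mean aggregator.

    features:    dict node -> list[float] (basic block embeddings)
    adjacency:   dict node -> list of neighbor nodes (CFG edges)
    weight:      matrix of shape (out_dim, 2 * in_dim)
    num_samples: if set, sample at most this many neighbors per node
    """
    rng = rng or random.Random(0)
    out = {}
    for v, h_v in features.items():
        neigh = adjacency.get(v, [])
        if num_samples is not None and len(neigh) > num_samples:
            neigh = rng.sample(neigh, num_samples)
        if neigh:
            mean = [sum(features[u][d] for u in neigh) / len(neigh)
                    for d in range(len(h_v))]
        else:
            mean = [0.0] * len(h_v)
        # Concatenate self and neighborhood vectors, transform, activate, normalize.
        out[v] = l2_normalize(relu(matvec(weight, h_v + mean)))
    return out


def readout_mean(node_embs):
    """Function-level embedding as the mean of node embeddings (assumed readout)."""
    vals = list(node_embs.values())
    return [sum(col) / len(vals) for col in zip(*vals)]
```

Because each node only needs its sampled neighbors, the per-node computation can be batched over any subset of nodes, which is the property the abstract credits for the reduced overhead.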

Keywords: Control flow graph; Natural language processing; Inductive graph neural network; binary code similarity detection


Cross-platform binary code similarity detection based on NMT and graph embedding

MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2021, Vol. 18, No. 4, pp. 4528-4551

Authors: Zhu, Xiaodong; Jiang, Liehui; Chen, Zeng. Affiliations: State Key Lab Math Engn & Adv Comp, Zhengzhou 450001, Peoples R China; Natl Key Lab Sci & Technol Blind Signal Proc, Chengdu 610000, Peoples R China

Cross-platform binary code similarity detection determines whether a pair of binary functions from different platforms are similar, and it plays an important role in many areas. Traditional methods compute similarity by intersecting platform-independent characteristic strands or by matching control flow graphs (CFGs), and they fall short in efficiency and scalability. Existing deep-learning-based methods improve efficiency but have low accuracy and still rely on manually constructed features. To address these problems, this manuscript proposes a cross-platform binary code similarity detection method based on neural machine translation (NMT) and graph embedding. We train an NMT model and a graph embedding model to automatically extract two parts of the semantics of binary code and represent them as a high-dimensional vector, called an embedding. The similarity of two binary functions can then be measured by the distance between their embeddings. We implement a prototype named SimInspector. Our comparative experiments show that SimInspector outperforms the state-of-the-art approach, Gemini, by about 6% in similarity detection accuracy while maintaining good efficiency.
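The final comparison step, measuring two functions by the distance between their embeddings, can be sketched as follows. The abstract does not say how the NMT and graph-embedding parts are combined; simple concatenation and the 0.9 cosine threshold are hypothetical choices made only for this illustration.

```python
import math


def concat_embedding(nmt_part, graph_part):
    # Hypothetical combination: the two semantic parts are concatenated.
    return list(nmt_part) + list(graph_part)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_similar(f1, f2, threshold=0.9):
    """Declare two functions similar when cosine similarity reaches an
    illustrative threshold."""
    return cosine_similarity(f1, f2) >= threshold
```

Cosine similarity ignores vector length and Euclidean distance does not; which one suits a given embedding model depends on how that model was trained.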

Keywords: binary code similarity detection; deep learning; graph embedding; neural machine translation


A Lightweight Cross-Version Binary Code Similarity Detection Based on Similarity and Correlation Coefficient Features

IEEE ACCESS, 2020, Vol. 8, pp. 120501-120512

Authors: Guo, Hui; Huang, Shuguang; Huang, Cheng; Zhang, Min; Pan, Zulie; Shi, Fan; Huang, Hui; Hu, Donghui; Wang, Xiaoping. Affiliations: Natl Univ Def Technol, Coll Elect Engn, Hefei 230011, Peoples R China; Sichuan Univ, Coll Cybersecur, Chengdu 610065, Peoples R China; Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China

The technique of binary code similarity detection (BCSD) has been applied in many fields, such as malware detection, plagiarism detection, and vulnerability search. Existing solutions to the BCSD problem usually either compare specific features derived from the control flow graphs (CFGs) of binary functions, or compute embedding vectors of binary functions with deep learning algorithms. In this paper, from a different research perspective, we propose a new, lightweight method for the cross-version BCSD problem based on multiple features. It transforms binary functions into vectors and signals and computes similarity coefficient and correlation coefficient values. Without relying on function CFGs, deep learning algorithms, or other related attributes, our method works directly on the raw bytes of each binary and can serve as an alternative for coping with the various complex situations found in real-world environments. We implement the method and evaluate it on a custom dataset of about 423,282 samples. The results show that the method performs well on cross-version BCSD: its recall reaches 96.63%, almost the same as that of the state-of-the-art static solution.
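The abstract does not define the exact vectors, signals, or coefficients computed from raw bytes. As a hypothetical illustration of the general idea, the sketch below turns each function's raw bytes into a 256-bin byte-frequency vector and scores a pair with a cosine similarity coefficient and a Pearson correlation coefficient; the feature choice and the equal weighting are assumptions, not the paper's method.

```python
import math
from collections import Counter


def byte_histogram(raw: bytes):
    # 256-bin byte-frequency vector normalized to sum to 1 (hypothetical feature).
    counts = Counter(raw)
    total = len(raw) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def cosine_coefficient(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pearson_coefficient(a, b):
    # Pearson correlation: covariance divided by the product of standard deviations.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def byte_similarity(func_a: bytes, func_b: bytes, alpha=0.5):
    # Blend the two coefficients; alpha is an illustrative weight.
    ha, hb = byte_histogram(func_a), byte_histogram(func_b)
    return alpha * cosine_coefficient(ha, hb) + (1 - alpha) * pearson_coefficient(ha, hb)
```

Nothing here needs a disassembler or a trained model, which mirrors the lightweight, raw-byte property the abstract emphasizes.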

Keywords: binary code similarity detection; cross-version binary; malware detection; similarity coefficient; correlation coefficient


Asteria-Pro: Enhancing Deep Learning-based Binary Code Similarity Detection by Incorporating Domain Knowledge

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, Vol. 33, No. 1, pp. 1-40

Authors: Yang, Shouguo; Dong, Chaopeng; Xiao, Yang; Cheng, Yiran; Shi, Zhiqiang; Li, Zhi; Sun, Limin. Affiliations: Chinese Acad Sci, Inst Informat Engn, 19 Shucun Rd, Beijing 100085, Peoples R China; Univ Chinese Acad Sci, Sch Cyber Secur, 1 Yanqihu East Rd, Beijing 101408, Peoples R China

Widespread code reuse allows vulnerabilities to proliferate across a vast variety of firmware, so there is an urgent need to detect vulnerable code effectively and efficiently. By measuring code similarity, AI-based binary code similarity detection can detect vulnerable code at scale. Existing studies have proposed various function features that capture commonality for similarity detection. Nevertheless, the significant syntactic variability of code induced by the diversity of IoT hardware architectures reduces the accuracy of binary code similarity detection. In our earlier study and the tool Asteria, we adopted a Tree-LSTM network to summarize function semantics as function commonality, and the evaluation showed advanced performance. However, it still has utility concerns due to excessive time costs and inadequate precision when searching for bugs in large-scale firmware. To this end, we propose a novel deep learning-enhancement architecture that incorporates domain knowledge-based pre-filtration and re-ranking modules, and we develop a prototype named ASTERIA-PRO based on Asteria. The pre-filtration module eliminates dissimilar functions, reducing subsequent deep learning model calculations. The re-ranking module boosts the rankings of vulnerable functions among the candidates generated by the deep learning model. Our evaluation indicates that the pre-filtration module cuts calculation time by 96.9%, and the re-ranking module improves MRR and Recall by 23.71% and 36.4%, respectively. With these modules, ASTERIA-PRO outperforms existing state-of-the-art approaches in the bug search task by a significant margin. Furthermore, our evaluation shows that equipping baseline methods with the pre-filtration and re-ranking modules significantly improves their precision. In a large-scale, real-world firmware bug search, ASTERIA-PRO detects 1,482 vulnerable functions with a high precision of 91.65%.
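The two-stage idea, a cheap filter before the expensive model and a domain-knowledge re-ranking after it, can be sketched generically. The size-ratio filter, the callee-set Jaccard bonus, and every name below are hypothetical stand-ins, not ASTERIA-PRO's actual rules; `model_score` represents any deep-learning similarity function.

```python
def jaccard(a, b):
    # Overlap of two callee-name sets; 1.0 when both are empty.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prefilter(query, candidates, max_size_ratio=2.0):
    """Hypothetical cheap filter: drop candidates whose size differs from
    the query's by more than max_size_ratio."""
    kept = []
    for cand in candidates:
        small, large = sorted((query["size"], cand["size"]))
        if small > 0 and large / small <= max_size_ratio:
            kept.append(cand)
    return kept


def search(query, candidates, model_score, beta=0.5, top_k=5):
    """Run the expensive model only on survivors of the filter, then re-rank
    with a domain-knowledge bonus (callee-set overlap, an assumed signal)."""
    scored = []
    for cand in prefilter(query, candidates):
        score = model_score(query, cand)
        score += beta * jaccard(query["callees"], cand["callees"])
        scored.append((score, cand["name"]))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [name for _, name in scored[:top_k]]
```

The time saving comes from the filter, since the model is never invoked on discarded candidates; the ranking gain comes from the bonus term separating candidates the model scores alike.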

Keywords: binary code similarity detection; pre-filtering; re-ranking; abstract syntactic tree; graph neural network


Feature Extraction Methods for Binary Code Similarity Detection Using Neural Machine Translation Models

IEEE ACCESS, 2023, Vol. 11, pp. 102796-102805

Authors: Ito, Norimitsu; Hashimoto, Masaki; Otsuka, Akira. Affiliations: Natl Police Acad, Police Info Commun Res Ctr, Fuchu, Tokyo 1838558, Japan; Inst Informat Secur, Yokohama, Kanagawa 2210835, Japan

Binary code similarity detection is an effective analysis technique for detecting vulnerabilities, bugs, and plagiarism in software whose source code is unavailable. The recent proliferation of IoT devices has also increased the demand for similarity detection across different architectures. However, few studies have applied feature extraction methods based on neural machine translation (NMT) models to cross-architecture similarity detection at the basic block level. In this research, we propose new NMT-based methods for basic block feature extraction that extract features faster and detect cross-architecture similarities more accurately than existing methods. We assume that the intermediate representation of an NMT model trained to translate basic blocks between architectures captures the semantics of the instructions in each basic block, so we adopt this intermediate representation as the basic block features. We then apply the linear transformation used in bilingual word embedding to align the basic block embedding spaces of different architectures. This enables cross-architecture similarity detection of basic blocks with higher accuracy than the distance learning method that existing research uses to align the embedding spaces. In the evaluation experiment, we compared Precision at k (P@k) with existing research methods on the same dataset, and our method achieved the highest accuracy, 92%. We also compared the time required for feature extraction on GPUs and found that our method was up to 16 times faster.
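The linear transformation borrowed from bilingual word embedding is, in its basic form, a matrix W fitted by least squares so that source-space vectors multiplied by W land near their counterparts in the target space. The pure-Python sketch below fits W from paired basic block embeddings via the normal equations. It assumes paired training blocks are available and is not the authors' code; orthogonally constrained variants are also common but are not shown.

```python
def transpose(m):
    return [list(r) for r in zip(*m)]


def matmul(a, b):
    # Rows of a times columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve A X = B for X by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_linear_map(x, y, ridge=0.0):
    """Least-squares W minimizing ||X W - Y||^2 (+ ridge * ||W||^2).

    Rows of x: block embeddings from the source architecture.
    Rows of y: embeddings of the matching blocks from the target architecture.
    """
    xt = transpose(x)
    xtx = matmul(xt, x)
    for i in range(len(xtx)):
        xtx[i][i] += ridge
    return solve(xtx, matmul(xt, y))


def apply_map(v, w):
    # Map one source-space vector into the target space: v W.
    return [sum(vi * w[i][j] for i, vi in enumerate(v)) for j in range(len(w[0]))]
```

Once W is fitted, a block from one architecture is mapped with `apply_map` and compared with target-architecture blocks using an ordinary similarity measure.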

Keywords: Feature extraction; Binary codes; Semantics; Software engineering; Computer architecture; Training; Source coding; Machine learning; Neural networks; Machine translation; binary code similarity detection; machine learning; neural machine translation


Optir-SBERT: Cross-Architecture Binary Code Similarity Detection Based on Optimized LLVM IR

14th EAI International Conference on Digital Forensics and Cyber Crime (ICDF2C)

Authors: Yan, Yintong; Yu, Lu; Wang, Taiyan; Li, Yuwei; Pan, Zulie. Affiliation: Natl Univ Def Technol, Coll Elect Engn, Hefei 230037, Peoples R China

ISBN: (print) 9783031565823; 9783031565830

Cross-architecture binary code similarity detection plays an important role in many security domains. Given the low accuracy and poor scalability of existing cross-architecture detection techniques, we propose Optir-SBERT, the first technique to detect cross-architecture binary code similarity based on optimized LLVM IR. We also build a new, more diverse dataset, binaryIR, which provides a benchmark for subsequent research based on LLVM IR. For cross-architecture binary code similarity detection, Optir-SBERT reaches an accuracy of 94.38%, with optimization contributing 3.99%. For vulnerability detection, its average accuracy reaches 93.9%, with optimization contributing 7%. These results are better than those of existing state-of-the-art (SOTA) cross-architecture detection techniques. To improve the efficiency of vulnerability detection in realistic scenarios, we introduce a file-level vulnerability identification mechanism on top of Optir-SBERT. The new model, Optir-SBERT-F, saves 45.36% of the detection time at the cost of a slight decrease in the detection F-score, which greatly improves the efficiency of vulnerability detection.

Keywords: binary code similarity detection; Cross-architecture; Optimized LLVM IR; SBERT; File-level vulnerability identification mechanism


SimCGE: Simple Contrastive Learning of Graph Embeddings for Cross-Version Binary Code Similarity Detection

24th International Conference on Information and Communications Security (ICICS)

Authors: Xia, Fengliang; Wu, Guixing; Zhao, Guochao; Li, Xiangyu. Affiliations: Univ Sci & Technol China, Hefei, Peoples R China; Univ Sci & Technol China, Suzhou Inst Adv Res, Suzhou, Peoples R China

ISBN: (print) 9783031157776; 9783031157769

Binary code similarity detection (BCSD), the task of detecting the similarity of two binary functions without access to their source code, has many applications in computer security. Recently, deep learning methods have shown better efficiency, accuracy, and potential in BCSD. Most of them are trained with Siamese networks and overlook some of the Siamese network's shortcomings. In this paper, we introduce the idea of contrastive learning into graph neural networks and show experimentally that training graph models with contrastive learning is significantly better than Siamese training. In addition, we find that, among various graph neural networks, Principal Neighbourhood Aggregation for Graph Nets (PNA) is best at extracting structural information from control flow graphs (CFGs).
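PNA replaces a single neighborhood aggregator with several (mean, max, min, standard deviation) and multiplies their outputs by degree-based scalers: identity, amplification log(d + 1) / δ, and attenuation δ / log(d + 1), where δ is the average of log(d + 1) over node degrees. The sketch below covers only this aggregation step for one node, computing δ from a single graph for simplicity; a full PNA layer would follow it with learned linear transformations, and all names here are illustrative rather than SimCGE's code.

```python
import math


def _aggregate(msgs, eps=1e-5):
    # Concatenate mean, max, min, and std aggregators, feature by feature.
    dim, n = len(msgs[0]), len(msgs)
    mean = [sum(m[d] for m in msgs) / n for d in range(dim)]
    mx = [max(m[d] for m in msgs) for d in range(dim)]
    mn = [min(m[d] for m in msgs) for d in range(dim)]
    std = [math.sqrt(sum((m[d] - mean[d]) ** 2 for m in msgs) / n + eps)
           for d in range(dim)]
    return mean + mx + mn + std


def compute_delta(adjacency):
    # Average log(degree + 1); in PNA this is taken over the training graphs.
    degrees = [len(v) for v in adjacency.values()]
    return sum(math.log(d + 1) for d in degrees) / len(degrees)


def pna_aggregate(node, features, adjacency, delta):
    """Aggregated message for one node: 4 aggregators x 3 scalers."""
    neigh = adjacency.get(node, [])
    if not neigh:
        return [0.0] * (12 * len(features[node]))
    agg = _aggregate([features[u] for u in neigh])
    log_d = math.log(len(neigh) + 1)
    amplification = log_d / delta   # > 1 for above-average degree
    attenuation = delta / log_d     # < 1 for above-average degree
    return agg + [amplification * x for x in agg] + [attenuation * x for x in agg]
```

Combining several aggregators lets the layer distinguish neighborhoods that a single mean would conflate, and the degree scalers restore information about how many neighbors contributed.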

Keywords: binary code similarity detection; contrastive learning; graph neural network


IoTSim: Internet of Things-Oriented Binary Code Similarity Detection with Multiple Block Relations

SENSORS, 2023, Vol. 23, No. 18, p. 7789

Authors: Luo, Zhenhao; Wang, Pengfei; Xie, Wei; Zhou, Xu; Wang, Baosheng. Affiliation: Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China

Binary code similarity detection (BCSD) plays a crucial role in various computer security applications, including vulnerability detection, malware detection, and software component analysis. With the development of the Internet of Things (IoT), binaries come from many different instruction set architectures, which requires BCSD approaches that are robust across architectures. In this study, we propose a novel IoT-oriented binary code similarity detection approach. It leverages a customized transformer-based language model with disentangled attention to capture relative position information. To mitigate out-of-vocabulary (OOV) problems in the language model, we introduce a base-token prediction pre-training task that captures basic semantics for unseen tokens. When generating function embeddings, we integrate directed jumps, data dependencies, and address adjacency to capture multiple block relations. We then assign different weights to the different relations and use multi-layer Graph Convolutional Networks (GCNs) to generate function embeddings. We implemented a prototype of IoTSim. Our experimental results show that the proposed block relation matrix improves IoTSim by large margins. With a pool size of 10^3, IoTSim achieves a recall@1 of 0.903 across architectures, outperforming the state-of-the-art approaches Trex, SAFE, and PalmTree.
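Weighting several block relations and feeding them to a GCN can be sketched as a weighted sum of per-relation adjacency matrices followed by standard symmetric-normalized GCN layers. The relation weights, identity feature matrices, and mean-pooling readout used here are hypothetical. The sketch also treats every relation as undirected (symmetric matrices, non-negative weights) to keep the normalization simple; IoTSim may handle directed jumps differently.

```python
import math


def combine_relations(relations, weights):
    """Weighted sum of n x n relation matrices (e.g. jump, data dependency,
    address adjacency). Assumes symmetric matrices and non-negative weights."""
    n = len(relations[0])
    return [[sum(w * r[i][j] for w, r in zip(weights, relations))
             for j in range(n)] for i in range(n)]


def gcn_layer(adj, h, weight):
    """H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); weight has shape (in_dim, out_dim)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    agg = [[sum(norm[i][k] * h[k][d] for k in range(n)) for d in range(len(h[0]))]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]


def function_embedding(relations, weights, h, layer_weights):
    adj = combine_relations(relations, weights)
    for w in layer_weights:
        h = gcn_layer(adj, h, w)
    # Mean-pool block embeddings into one function embedding (assumed readout).
    return [sum(col) / len(h) for col in zip(*h)]
```

Learning or tuning the per-relation weights is what lets the model decide how much each relation type should influence a block's neighborhood.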

Keywords: IoT security; binary code similarity detection; vulnerability detection


UniBin: Assembly semantic-enhanced binary vulnerability detection without disassembly

INFORMATION SCIENCES, 2025, Vol. 691

Authors: Liu, Li; Wang, Shen; Jiang, Xunzhi. Affiliation: Harbin Inst Technol, Sch Cyberspace Sci, Harbin 150000, Heilongjiang, Peoples R China

The widespread reuse of open-source code amplifies the impact of vulnerabilities. Current vulnerability detection methods rely predominantly on binary code similarity comparisons, which require disassembly to obtain assembly code or control flow graphs. These methods depend on specific disassembly tools and complex preprocessing, which limits their applicability and detection speed. This paper proposes UniBin, a vulnerability detection method based on a multilayer Transformer encoder. By performing bidirectional LM, unidirectional LM, and sequence-to-sequence LM tasks on both binary and assembly code during pre-training, UniBin learns richer semantic information from binary machine code, enabling efficient similarity comparison without disassembly and mitigating the limitations of disassembly. We cross-compile 55 widely used open-source C projects to build the dataset. After 52 hours of pre-training and 8 hours of fine-tuning, UniBin reaches an average accuracy of 98.3% in similarity detection across compilation conditions, outperforming the state-of-the-art method. For search tasks across optimization options with a pool size of 1000, the Recall@1 metric improves by 28.2% (from 67.9% to 87.1%). UniBin eliminates the dependency on specific disassembly tools and improves end-to-end binary analysis speed by over 36%. In real-world vulnerability detection tasks, UniBin detects all vulnerable functions with the lowest false positive rate, 0.16%.
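Recall@1 in a pool-based search counts how often the true match ranks first among all candidates in the pool (1000 candidates in the experiment above). A minimal sketch, assuming queries[i] and pool[i] are embeddings of the same function compiled under different conditions and that cosine similarity is the ranking score; UniBin's actual scoring may differ.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_at_k(queries, pool, k=1):
    """Fraction of queries whose true match (pool[i]) ranks within the top k.

    Every other pool entry acts as a negative. Ties with the true match are
    resolved in its favor, a simplifying assumption.
    """
    hits = 0
    for i, q in enumerate(queries):
        scores = [cosine(q, p) for p in pool]
        # Rank = number of other candidates scoring strictly higher than the match.
        rank = sum(1 for j, s in enumerate(scores) if j != i and s > scores[i])
        if rank < k:
            hits += 1
    return hits / len(queries)
```

Larger pools contain more distractors, so Recall@1 typically drops as the pool grows; results are comparable only at the same pool size.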

Keywords: Vulnerability detection; binary code similarity detection; Deep neural network; Transformer

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：