Search Results - Inner Mongolia University Library

IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) / BigDataSE Conference / CSE Conference / EUC Conference / ISCI Conference

Authors: Li, Longfei; Yin, Xiaokang; Li, Xiao; Zhu, Xiaoya; Liu, Shengli. Affiliations: Zhengzhou Univ, Zhengzhou, Peoples R China; Informat Engn Univ, Zhengzhou, Peoples R China

ISBN: (print) 9798350381993; 9798350382006

Binary code similarity detection (BCSD) has numerous applications, including malware detection, vulnerability search, plagiarism detection, and patch identification. Recent studies have demonstrated that, with the rapid progress of machine learning (ML) techniques, various ML-based BCSD approaches have exhibited stronger performance than traditional methods. However, current ML-based BCSD approaches tend to ignore the issue of training samples, and most of them are based on supervised learning, which suffers from labelling difficulties. To mitigate these issues, we propose ConFunc: a function-level binary code similarity detection framework based on contrastive learning. Performance evaluation shows that ConFunc enhances the Mean Reciprocal Rank (MRR) and recall rate (Recall@1) of baseline models by fully harnessing the potential of the data. Additionally, ConFunc demonstrates stronger performance in scenarios with scarce data, matching the baseline model's performance on the entire dataset while using only 10% of the complete dataset. In real-world patch identification and vulnerability search tasks, ConFunc consistently outperforms other baseline models in MRR and Recall@10.

Keywords: binary code similarity detection; machine learning; contrastive learning; function embeddings
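The abstract above reports results in terms of Mean Reciprocal Rank (MRR) and Recall@k. As a reading aid, here is a minimal sketch of how these two retrieval metrics are commonly computed. It assumes each query function has exactly one true match in the candidate pool and that the 1-based rank of that match is already known; the function names and the sample ranks are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where ranks[i] is the 1-based rank of the
    true match for query i (assumption: one true match per query)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose true match appears in the top k results."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for four query functions.
ranks = [1, 2, 4, 1]
mrr = mean_reciprocal_rank(ranks)   # (1 + 1/2 + 1/4 + 1) / 4
recall_1 = recall_at_k(ranks, 1)    # two of the four queries hit at rank 1
recall_10 = recall_at_k(ranks, 10)  # every query hits within the top 10
print(mrr, recall_1, recall_10)
```

Under these definitions, Recall@1 rewards only exact top hits, while MRR also gives partial credit to matches ranked lower in the list.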


FlowEmbed: Binary function embedding model based on relational control flow graph and byte sequence


29th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2023

Authors: Wang, Yongpan; Dong, Chaopeng; Li, Siyuan; Luo, Fucai; Su, Renjie; Song, Zhanwei; Li, Hong. Affiliations: Chinese Academy of Sciences, Institute of Information Engineering, China; University of Chinese Academy of Sciences, School of Cyber Security, China; State Grid Fujian Electric Power Company, China

ISBN: (print) 9798350330717

Binary function embedding models are applicable to various downstream tasks within IoT device software systems and have demonstrated advantages in numerous binary analysis tasks, such as vulnerability (homologous) function search and compilation optimization option identification. However, current binary function embedding methods either learn embeddings from the code sequence, which lacks the program semantics of functions (e.g., control flow), or from program structure graphs, which omit global sequential information. As a result, these methods fall short in enabling models to learn the complete semantics of a function. In this paper, we introduce FlowEmbed, a novel approach that integrates control flow and global semantic learning to facilitate exhaustive code comprehension. First, FlowEmbed combines a distinct relational control flow graph with BERT and RGCN models to capture control flow semantics. Second, by applying the DPCNN model to a byte sequence constructed from function machine code, FlowEmbed captures the global sequential semantics of binary functions. Evaluations spanning three IoT-related tasks show notable improvements: a 20.6% improvement in compilation optimization option identification, a 1.8% improvement in binary function similarity analysis, and an 11.9% improvement in homologous function search. Collectively, these results underscore FlowEmbed's superior capability, positioning it as an invaluable asset in binary analysis applications. © 2023 IEEE.

Keywords: binary code search; binary code similarity detection; binary function embedding; deep learning; static analysis


GraphMoCo: A graph momentum contrast model for large-scale binary function representation learning


NEUROCOMPUTING, 2024, Vol. 575

Authors: Sun, Runjin; Guo, Shize; Guo, Jinhong; Wei, Li; Zhang, Xingyu; Xi, Guo; Pan, Zhisong. Affiliations: Army Engn Univ PLA, Nanjing 210007, Peoples R China; Natl Comp Network & Informat Secur Management Ctr, Beijing 100029, Peoples R China; Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China; Acad Mil Sci, Beijing 100091, Peoples R China; Univ Sci & Technol Beijing, Beijing 100083, Peoples R China

In the field of cybersecurity, the ability to compute similarity scores at the function level for binary code is of utmost importance. Considering that a single binary file may contain a large number of functions, an effective learning framework must exhibit both high accuracy and efficiency when handling substantial volumes of data. Nonetheless, conventional methods encounter several limitations. Firstly, accurately annotating different pairs of functions with appropriate labels poses a significant challenge, thereby making it difficult to employ supervised learning methods without risk of overtraining. Secondly, while SOTA models often rely on pretrained encoders or fine-grained graph comparison techniques, these approaches suffer from drawbacks related to time and memory consumption. Thirdly, the momentum update algorithm utilized in graph-based contrastive learning models can result in information leakage. Surprisingly, none of the existing articles address this issue. This research focuses on addressing the challenges associated with large-scale binary code similarity detection (BCSD). To overcome the aforementioned problems, we propose GraphMoCo: a graph momentum contrast model that leverages multimodal structural information for efficient binary function representation learning on a large scale. We adopt an unsupervised learning strategy, which eliminates the need for manual labeling. By leveraging the intrinsic structural information at multiple levels of the binary code, our model can achieve higher accuracy with a simple CNN-based model. By introducing the preshuffle mechanism, the issue of information leakage in the graph momentum update algorithm is mitigated. The evaluation results indicate that GraphMoCo exhibits superior performance compared to SOTA approaches in the function pair search task, showing an average improvement of 7% on AUC, and 10% on MRR and Recall@1. Furthermore, GraphMoCo achieves a MAP of 0.93 on the more challenging data

Keywords: binary code similarity detection; Contrastive learning; Embedding; Cyberspace security; Graph neural networks
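The abstract above refers to the momentum update used in momentum-contrast models. As a rough illustration only, the sketch below shows the generic momentum (exponential moving average) update in which a key encoder's parameters slowly track a query encoder's parameters: key = m * key + (1 - m) * query. This is the general technique, not GraphMoCo's implementation; the preshuffle mechanism is not shown, and the function name, the flat list-of-floats parameter layout, and the value of `m` are assumptions made for the example.

```python
from typing import List


def momentum_update(key_params: List[float],
                    query_params: List[float],
                    m: float = 0.999) -> List[float]:
    """Return key parameters moved toward the query parameters.

    Generic momentum rule: key <- m * key + (1 - m) * query.
    Parameters are modelled as flat lists of floats (an assumption
    for illustration; real encoders hold tensors).
    """
    if len(key_params) != len(query_params):
        raise ValueError("parameter lists must have the same length")
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Hypothetical two-parameter encoders.
key = [0.0, 1.0]
query = [1.0, 1.0]
updated = momentum_update(key, query, m=0.9)
print(updated)  # first entry moves 10% of the way from 0.0 toward 1.0
```

A value of `m` close to 1 makes the key encoder change slowly, which keeps previously computed keys approximately consistent with newly computed ones.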


HAformer: Semantic fusion of hex machine code and assembly code for cross-architecture binary vulnerability detection


COMPUTERS & SECURITY, 2024, Vol. 145

Authors: Jiang, Xunzhi; Wang, Shen; Gong, Yuxin; Yu, Tingyue; Liu, Li; Yu, Xiangzhan. Affiliation: Harbin Inst Technol, Sch Cyberspace Sci, Harbin 150000, Heilongjiang, Peoples R China

Binary vulnerability detection is a significant area of research in computer security. The existing methods for detecting binary vulnerabilities primarily rely on binary code similarity analysis, detecting vulnerabilities by comparing the similarities embedded in binary codes. Recently, Transformer-based models have achieved significant progress in this field, leveraging their advantage in handling sequential data to better understand the semantics of assembly code. However, to prevent out-of-vocabulary (OOV) problems, assembly code typically needs to be normalized, which loses some important numerical and jump information. In this paper, we propose HAformer, a Transformer-based model that semantically fuses hexadecimal machine codes and assembly codes to extract richer semantic information from binary codes. By incorporating the hexadecimal machine code and a newly designed assembly code normalization method, HAformer can alleviate the problem of numerical information loss caused by traditional assembly code normalization while still addressing the issue of OOV. Evaluation results demonstrate that HAformer outperforms the baseline method in the Recall@1 metric by 16.9%, 25.5% and 19.2% in cross-optimization level, cross-compiler and cross-architecture environments, respectively. In real-world vulnerability detection experiments, HAformer exhibits the highest accuracy.

Keywords: Vulnerability detection; binary similarity analysis; binary code similarity detection; Transformer; Function semantic
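The abstract above notes that traditional assembly normalization avoids OOV tokens at the cost of numerical and jump information. The sketch below illustrates that trade-off with a deliberately simple, conventional-style normalizer that replaces every immediate value with a single placeholder token. It is not HAformer's normalization method; the `<IMM>` token, the regular expressions, and the function name are assumptions chosen for the example.

```python
import re

# Hexadecimal immediates such as 0x10 or 0x401000.
HEX_IMM = re.compile(r"\b0x[0-9a-fA-F]+\b")
# Decimal immediates that are not part of a register name (r8, xmm0, ...).
DEC_IMM = re.compile(r"(?<![\w.])\d+\b")


def normalize_instruction(ins: str) -> str:
    """Replace numeric constants with one placeholder token.

    This bounds the vocabulary (no OOV constants) but makes
    instructions that differ only in their constants identical.
    """
    ins = HEX_IMM.sub("<IMM>", ins)
    ins = DEC_IMM.sub("<IMM>", ins)
    return ins


a = normalize_instruction("mov eax, 0x10")
b = normalize_instruction("mov eax, 0x20")
c = normalize_instruction("add r8, 8")
print(a)       # mov eax, <IMM>
print(a == b)  # True: the two constants are no longer distinguishable
print(c)       # add r8, <IMM>
```

After normalization the two `mov` instructions are indistinguishable, which is the kind of information loss that fusing in the raw machine-code bytes is meant to compensate for.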


Security Attacks and Defenses in Cyber Systems: From an AI Perspective


Author: Luo, Zhengping. Affiliation: University of South Florida

Degree level: Ph.D., Doctor of Philosophy

Security of real-world cyber systems has drawn a lot of attention in recent years, especially as machine learning techniques are widely deployed into different layers of cyber systems. With machine learning, especially adversarial machine learning techniques, the attacks and defenses in cyber systems have shown many new characteristics. In this dissertation, two major works regarding the attacks and defenses in real-world cyber systems, including dynamic spectrum sensing systems, High Performance Computing (HPC) systems and software systems, are discussed. In the first work, we revisit the security vulnerability of cooperative spectrum sensing as an adversarial machine learning problem and propose a novel learning-empowered framework named Learning-Evaluation-Beating (LEB) to mislead fusion centers. Given the gap between the new LEB attack and existing defenses, we introduce a non-invasive and parallel method named influence-limiting defense, used alongside existing defenses, to defend against LEB-based or other similar attacks. In the second work, we offer a novel perspective, treating anomaly detection in HPC systems based on log files as a sequential decision process, and further apply reinforcement learning techniques to detect anomalies or malicious users. Starting from there, we also provide a binary code similarity detection-based method that can be applied to a more general scenario in software systems by utilizing a Recurrent Neural Network (RNN) and a Siamese Neural Network to detect malware from the binaries generated by the processor that executes the program.

Keywords: Adversarial Machine Learning; Attacks and Defenses; binary code similarity detection; Cognitive Radio Networks; Cybersecurity; High Performance Computing


FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs


ACM Transactions on Software Engineering and Methodology, 1000

Authors: Xiuwei Shang; Guoqiang Chen; Shaoyin Cheng; Shikai Guo; Yanming Zhang; Weiming Zhang; Nenghai Yu. Affiliations: University of Science and Technology of China, Hefei, China; QI-ANXIN Technology Research Institute, Beijing, China; University of Science and Technology of China, Anhui Province Key Laboratory of Digital Security, Hefei, China; Dalian Maritime University, The Dalian Key Laboratory of Artificial Intelligence, Dalian, China

Analyzing the behavior of cryptographic functions in stripped binaries is a challenging but essential task, which is crucial in software security fields such as malware analysis and legacy code inspection. However, the inherently high logical complexity of cryptographic algorithms makes their analysis more difficult than that of ordinary code, and the general absence of symbolic information in binaries exacerbates this challenge. Existing methods for cryptographic algorithm identification frequently rely on data or structural pattern matching, which limits their generality and effectiveness while requiring substantial manual effort. In response to these challenges, we present FoC (Figure out the Cryptographic functions), a novel framework that leverages large language models (LLMs) to identify and analyze cryptographic functions in stripped *** FoC, we first build an LLM-based generative model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language form, which is intuitively readable to analysts. Subsequently, based on the semantic insights provided by FoC-BinLLM, we further develop a binary code similarity detection model (FoC-Sim), which allows analysts to effectively retrieve similar implementations of unknown cryptographic functions from a library of known cryptographic functions. The predictions of generative models like FoC-BinLLM inherently struggle to reflect minor alterations in binary code, such as those introduced by vulnerability patches. In contrast, the change-sensitive representations generated by FoC-Sim compensate for this shortcoming to some extent. To support the development and evaluation of these models, and to facilitate further research in this domain, we also construct a comprehensive cryptographic binary dataset and introduce an automatic method to create semantic labels for extensive binary functions. Our evaluation results are promising. FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score,

Keywords: Binary Code Summarization; Cryptographic Algorithm Identification; binary code similarity detection; Large Language Models
