Search Results - Inner Mongolia University Library

FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs

ACM Transactions on Software Engineering and Methodology, 1000

Authors: Xiuwei Shang, Guoqiang Chen, Shaoyin Cheng, Shikai Guo, Yanming Zhang, Weiming Zhang, Nenghai Yu

Affiliations: University of Science and Technology of China, Hefei, China; QI-ANXIN Technology Research Institute, Beijing, China; University of Science and Technology of China, Anhui Province Key Laboratory of Digital Security, Hefei, China; Dalian Maritime University, The Dalian Key Laboratory of Artificial Intelligence, Dalian, China

Analyzing the behavior of cryptographic functions in stripped binaries is a challenging but essential task in software security fields such as malware analysis and legacy code inspection. However, the inherently high logical complexity of cryptographic algorithms makes them harder to analyze than ordinary code, and the general absence of symbolic information in binaries exacerbates this challenge. Existing methods for cryptographic algorithm identification frequently rely on data or structural pattern matching, which limits their generality and effectiveness and requires substantial manual effort. In response to these challenges, we present FoC (Figure out the Cryptographic functions), a novel framework that leverages large language models (LLMs) to identify and analyze cryptographic functions in stripped binaries. In FoC, we first build an LLM-based generative model (FoC-BinLLM) that summarizes the semantics of cryptographic functions in natural language, which is intuitively readable to analysts. Subsequently, building on the semantic insights provided by FoC-BinLLM, we develop a binary code similarity detection model (FoC-Sim), which allows analysts to effectively retrieve similar implementations of unknown cryptographic functions from a library of known cryptographic functions. The predictions of generative models such as FoC-BinLLM inherently struggle to reflect minor alterations in binary code, such as those introduced by vulnerability patches. In contrast, the change-sensitive representations generated by FoC-Sim compensate for this shortcoming to some extent. To support the development and evaluation of these models, and to facilitate further research in this domain, we also construct a comprehensive cryptographic binary dataset and introduce an automatic method to create semantic labels for large numbers of binary functions. Our evaluation results are promising. FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score, …
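ROUGE-L, the metric quoted for FoC-BinLLM above, scores a generated summary by the longest common subsequence (LCS) of tokens it shares with a reference summary. The minimal sketch below computes the F1 variant, assuming plain lowercase whitespace tokenization. Standard toolkits apply their own tokenizers (and optional stemming), and the abstract does not state which configuration FoC used, so scores from this sketch need not match the paper's. The example summaries are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens seen so far in `a` versus b[:j]
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 using naive lowercase whitespace tokenization (an assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical summaries of one decompiled function, for illustration only.
    reference = "this function performs aes encryption on a single block"
    candidate = "this function encrypts a single block with aes"
    print(round(rouge_l_f1(candidate, reference), 3))  # prints 0.588
```

Here the LCS is "this function a single block" (5 tokens), so precision is 5/8, recall is 5/9, and F1 reduces to 2·5/(8+9) = 10/17.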

Keywords: Binary Code Summarization; Cryptographic Algorithm Identification; Binary Code Similarity Detection; Large Language Models
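The retrieval step the abstract attributes to FoC-Sim, matching an unknown function against a library of known cryptographic functions, is commonly framed as nearest-neighbour search over embedding vectors. The sketch below assumes each function has already been mapped to a fixed-length vector by some encoder. The abstract does not describe FoC-Sim's architecture, so the encoder is omitted and this is not FoC-Sim's actual method. Library entries are ranked by cosine similarity, and the function names and vectors are hypothetical toy values.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query: list[float], library: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank known functions by cosine similarity to the query embedding, highest first."""
    scored = [(name, cosine(query, vec)) for name, vec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical embeddings of known cryptographic functions (toy 3-d vectors).
    library = {
        "aes_encrypt_block": [0.9, 0.1, 0.0],
        "sha256_compress": [0.1, 0.9, 0.1],
        "rc4_keystream": [0.2, 0.1, 0.9],
    }
    unknown = [0.8, 0.2, 0.1]  # hypothetical embedding of an unknown stripped function
    for name, score in top_k(unknown, library, k=2):
        print(f"{name}: {score:.3f}")
```

A linear scan like this suits small libraries. Larger ones usually switch to an approximate nearest-neighbour index, with the same ranking idea.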
