As software is now used in many domains, software security has become a crucial issue. Third-party libraries, which play a major role in software development, make it difficult to analyze and test software security. Identifying the major weaknesses in software requires knowing which variables it uses and the datatype of each variable. However, because third-party libraries are generally distributed as binary code, the variables, their datatypes, and the syntactic and semantic information present in the source code have been removed. Reconstructing the variables and their datatypes from binary code is therefore the most important step in weakness analysis. Traditionally, this reconstruction is based on pattern matching; however, pattern matching is limited in its ability to infer datatypes. We propose a method that uses deep learning to infer the datatypes of variables identified by pattern matching in binary code, and we analyze its performance. The proposed method improves feature generation to resolve the inconsistencies in the features generated in previous studies. As a result, the prediction accuracy for float and double improves by an average of 7.2% compared with the previous study, and overall accuracy improves by 5.1%. (C) 2019 Published by Elsevier B.V.
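To make the pattern-matching step concrete, the following is a minimal sketch of how stack variables might be recovered from disassembly and turned into per-variable features. It is an illustration only and not the authors' method: the Intel-syntax input, the regular expression, the mnemonic sets, and the names `collect_stack_variables` and `guess_type` are all assumptions made for this example. The rule-based `guess_type` stands in for the step that the abstract assigns to a deep learning model.

```python
import re
from collections import defaultdict

# Matches Intel-syntax stack operands such as "DWORD PTR [rbp-0x8]".
# (Assumed input format; real disassemblers vary.)
STACK_OPERAND = re.compile(r"(BYTE|WORD|DWORD|QWORD) PTR \[rbp-(0x[0-9a-f]+)\]")
WIDTH_BYTES = {"BYTE": 1, "WORD": 2, "DWORD": 4, "QWORD": 8}


def collect_stack_variables(disassembly):
    """Group instructions by the rbp-relative stack slot they touch."""
    slots = defaultdict(lambda: {"width": 0, "mnemonics": []})
    for line in disassembly:
        match = STACK_OPERAND.search(line)
        if match is None:
            continue
        size_keyword, offset_text = match.groups()
        slot = slots[int(offset_text, 16)]
        slot["width"] = max(slot["width"], WIDTH_BYTES[size_keyword])
        slot["mnemonics"].append(line.split()[0])
    return dict(slots)


def guess_type(slot):
    """Hypothetical rule-based guess; a learned classifier would replace it."""
    mnemonics = set(slot["mnemonics"])
    if mnemonics & {"movss", "addss", "mulss", "cvtsi2ss"}:
        return "float"
    if mnemonics & {"movsd", "addsd", "mulsd", "cvtsi2sd"}:
        return "double"
    # Width alone cannot separate, e.g., int from unsigned int or long from
    # a pointer, which is where pure pattern matching runs out.
    return {1: "char", 2: "short", 4: "int", 8: "long"}.get(slot["width"], "unknown")


example = [
    "mov DWORD PTR [rbp-0x4], 0x5",
    "movss DWORD PTR [rbp-0x8], xmm0",
    "movsd QWORD PTR [rbp-0x10], xmm1",
    "add DWORD PTR [rbp-0x4], 0x1",
]
variables = collect_stack_variables(example)
guesses = {offset: guess_type(slot) for offset, slot in variables.items()}
```

In this sketch the per-slot mnemonic list plays the role of a feature vector. A fixed rule table like the one in `guess_type` cannot distinguish types that share a width and an instruction mix, which is the kind of limitation the abstract cites as motivation for a learned model.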
The use of third-party libraries without source code is increasing, driven by the growing complexity of software development, poorly managed legacy code, and the nature of embedded software. Without the source code, it is difficult to analyze these libraries for vulnerabilities. Various studies have therefore performed static analysis on intermediate code to analyze the weaknesses inherent in binary code. The conversion from binary code to an intermediate language differs depending on the execution environment. In this paper, we propose a deep learning-based analysis method to reconstruct the datatypes that are missing when binary code is converted to an intermediate language, and we propose a method to generate supervised learning data for deep learning.