Automatic malware analysis is an essential part of today's computer security practices. In 2014 alone, analysts received nearly one million malware samples per day, and the number of samples submitted for analysis grows almost exponentially each year. Given the scale of the threat we face today and the amount of malicious code emerging every day, the ability to automatically analyze unknown and unwanted software is more critical than ever. Meanwhile, malware writers adapt their malicious code to new security measures to avoid exposure and detection. This is usually achieved with obfuscation techniques that complicate reverse engineering and analysis of the code by adding large amounts of unnecessary and irrelevant computation. Most malicious samples found in the wild are obfuscated and equipped with complicated anti-analysis defenses. These defenses are intended to hide the malware's intent by defeating analysis and/or increasing analysis time. Deobfuscation (reversing the obfuscation) requires automatic techniques that extract the original logic embedded in the obfuscated code for further analysis. Presumably, deobfuscated code takes less time to analyze and is easier to analyze than the obfuscated original. Previous approaches target specific types of obfuscation by making strong assumptions about the underlying protection scheme, which leaves adversaries room to attack. This work addresses this limitation by proposing new program analysis techniques that are effective against code obfuscation. These techniques remain generic because they minimize assumptions about the underlying code. We found that standard program analysis techniques, including well-known data- and control-flow analyses and/or symbolic execution, lose precision because of obfuscation, and we show how to mitigate this loss of precision. Using more precise program analysis techniques, we
JavaScript is a key component of websites and greatly enhances web page functionality. At the same time, it has become one of the most common attack vectors in malicious web pages. Early approaches to detecting malicious scripts relied heavily on features hand-engineered by security experts, which limited their representational power. With advances in deep learning, neural networks have shown the ability to automatically learn strong feature representations from malicious JavaScript. Current mainstream detection methods usually extract the Abstract Syntax Tree (AST) from JavaScript code, which captures the code's semantic information. The AST node information is then serialized into a sequence by depth-first traversal and fed into a deep learning model. However, for large JavaScript library files and obfuscated JavaScript code, computational and hardware constraints make it difficult to feed the complete sequence into the model. Only part of the sequence is sampled for training and detection, which significantly diminishes the model's detection capability. To address this, this paper proposes a malicious JavaScript detection method based on sequence compression. The approach extracts input sequences consisting solely of AST node type information and applies a compression algorithm to reduce their length further. Technically, we first extract the type field of each AST node in depth-first traversal order to generate the sequence, and then compress the sequence using Byte Pair Encoding (BPE). Finally, the compressed sequence is fed into the deep learning model for detection. On publicly available datasets, when the same deep learning model is used for classification, our method outperforms other existing approaches, achieving a precision of 98.96% and a recall of 96.37%.
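The sketch below illustrates the two preprocessing steps described above. The first step collects each node's `type` field in depth-first order. The second step shortens the resulting sequence with Byte Pair Encoding. This is not the authors' implementation. It assumes an ESTree-shaped AST, the nested-object format emitted by JavaScript parsers such as Esprima or Acorn. Here that AST is hand-written as Python dicts for the snippet `eval(a); eval(b);`. The names `EXAMPLE_AST`, `node_types_dfs` and `bpe_compress`, the `+` separator for merged tokens, and the merge budget are all hypothetical illustrative choices.

```python
from collections import Counter
from typing import Any

# Hypothetical ESTree-style AST for `eval(a); eval(b);`, written by hand
# (a real pipeline would obtain it from a JavaScript parser).
EXAMPLE_AST = {
    "type": "Program",
    "body": [
        {"type": "ExpressionStatement",
         "expression": {"type": "CallExpression",
                        "callee": {"type": "Identifier", "name": "eval"},
                        "arguments": [{"type": "Identifier", "name": "a"}]}},
        {"type": "ExpressionStatement",
         "expression": {"type": "CallExpression",
                        "callee": {"type": "Identifier", "name": "eval"},
                        "arguments": [{"type": "Identifier", "name": "b"}]}},
    ],
}


def node_types_dfs(node: Any) -> list[str]:
    """Return the `type` field of every AST node in pre-order depth-first order."""
    types: list[str] = []
    if isinstance(node, dict):
        if "type" in node:
            types.append(node["type"])
        for key, value in node.items():  # dicts keep insertion (source) order
            if key != "type":
                types.extend(node_types_dfs(value))
    elif isinstance(node, list):
        for child in node:
            types.extend(node_types_dfs(child))
    return types  # plain values such as identifier names contribute nothing


def bpe_compress(seq: list[str], num_merges: int) -> tuple[list[str], list[tuple[str, str]]]:
    """Greedy BPE over a token sequence: each round replaces non-overlapping
    occurrences of the most frequent adjacent pair with one merged token."""
    tokens = list(seq)
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # stop once no adjacent pair repeats
            break
        out: list[str] = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + "+" + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
        merges.append((a, b))
    return tokens, merges
```

Each merged token spells out its constituent types. Splitting a token on the separator therefore recovers the original sequence, so the compression is lossless. On real scripts, the merge table would presumably be learned once on a training corpus and then applied to every input. This toy example instead learns it per sequence.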
The detection and analysis of malware binaries pose significant challenges because such binaries are often obfuscated and packed, rendering traditional static analysis techniques ineffective. Extracting static features in a dynamic environment, where malware exhibits its actual behavior, therefore becomes crucial for detecting malware accurately. This article addresses this challenge by analyzing static features extracted from real-time Windows, Android, and IoT applications within a dynamic environment. To tackle this problem, we propose an Advanced Ensemble Framework (AEF) that combines embedded feature selection with an advanced stacking ensemble technique. The embedded feature selection approach combines filter and wrapper methods and reduces the number of highly correlated features by over 70%. The stacking ensemble uses two levels of learners. The base learners are built from state-of-the-art classifiers that handle the raw features. The meta-learner is trained on transfer features and the probabilities produced by those base classifiers. A 5-fold cross-training scheme based on cross-validation is introduced to prevent overfitting during training. Because the model is trained on multiple subsets of the data, it learns patterns from different parts of the dataset, which can lead to a more generalized model. Preprocessed datasets from the Canadian Institute for Cybersecurity are used to evaluate AEF; they comprise obfuscated Windows malware, Android malware, and IoT malicious attacks. To further assess the efficiency, compatibility, and robustness of AEF, we also use an additional dataset of obfuscated Windows malware that includes memory dump images. Extensive experiments are conducted to evaluate the proposed defender on publicly available real-time datasets. The results show that AEF effectively counters obfuscation techniques, offering a flexible, practical, and efficient solution fo
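The following minimal, dependency-free sketch makes the two-level stacking and 5-fold cross-training concrete. It is an illustration under stated assumptions, not AEF itself.

- A single hypothetical nearest-centroid classifier stands in for the base learners.
- A small logistic regression stands in for the meta-learner.
- Each training sample's base-level probability comes from a model fitted on the other four folds (out-of-fold prediction).
- The meta-learner is trained on the raw features plus that probability. Treating the raw features as the "transfer features" is our assumption.
- All function names are illustrative.

```python
import math
import random
from typing import Sequence

Vector = Sequence[float]


def fit_centroids(X: list[Vector], y: list[int]) -> dict[int, list[float]]:
    """Hypothetical base learner: the mean feature vector of each class."""
    sums: dict[int, list[float]] = {}
    counts: dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def base_proba(model: dict[int, list[float]], x: Vector) -> float:
    """Score in [0, 1]: the closer to the class-1 (malicious) centroid, the higher."""
    if 0 not in model or 1 not in model:
        return 0.5  # a training fold lacked one class; stay neutral
    d0, d1 = math.dist(x, model[0]), math.dist(x, model[1])
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


def out_of_fold(X: list[Vector], y: list[int], k: int = 5, seed: int = 0) -> list[float]:
    """k-fold cross-training: every sample is scored by a base model fitted
    on the other k-1 folds, never on the sample itself."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    oof = [0.0] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = base_proba(model, X[i])
    return oof


def sigmoid(s: float) -> float:
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)  # numerically stable branch for negative logits
    return e / (1.0 + e)


def predict_meta(w: list[float], z: Sequence[float]) -> float:
    return sigmoid(w[-1] + sum(wj * zj for wj, zj in zip(w, z)))


def fit_meta(Z: list[list[float]], y: list[int], epochs: int = 200, lr: float = 0.1) -> list[float]:
    """Meta-learner: logistic regression trained by plain SGD; the last weight is the bias."""
    w = [0.0] * (len(Z[0]) + 1)
    for _ in range(epochs):
        for z, t in zip(Z, y):
            g = predict_meta(w, z) - t  # gradient of log-loss w.r.t. the logit
            for j, zj in enumerate(z):
                w[j] -= lr * g * zj
            w[-1] -= lr * g
    return w


def fit_stack(X: list[Vector], y: list[int], k: int = 5):
    """Level 1: out-of-fold base probabilities; level 2: meta-learner on
    [raw features..., base probability]. Returns (base model, meta weights)."""
    oof = out_of_fold(X, y, k)
    Z = [list(x) + [p] for x, p in zip(X, oof)]
    meta_w = fit_meta(Z, y)
    base = fit_centroids(X, y)  # refit on all data to score unseen samples
    return base, meta_w


def predict_stack(base: dict[int, list[float]], meta_w: list[float], x: Vector) -> float:
    return predict_meta(meta_w, list(x) + [base_proba(base, x)])
```

The out-of-fold column is what keeps the meta-learner from training on probabilities that the base model has already fit to the same samples. Refitting the base model on all data afterward is a common stacking convention. The abstract does not say whether AEF does this.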
Protecting data and applications from malware and other forms of malicious code has become highly relevant in the current era of pervasive web-based applications. Attackers often use code obfuscation to hide harmful programs from automatic detection. Several researchers have proposed methods to classify an unknown program as malicious or benign; however, little work has been done on identifying obfuscated code. A promising approach to detecting obfuscated code is to classify a program using a set of metrics collected by static analysis. In this paper we present an empirical evaluation of three text-based metrics for identifying obfuscated code. Our experiment shows that the effectiveness of these metrics depends on the obfuscator. In some cases the metrics produce many false positives (i.e., clear code misclassified as obfuscated), which is bothersome but not dangerous. In other cases false negatives (i.e., obfuscated code misclassified as clear) proliferate, which is far more dangerous. Based on our experiment, we propose a combination of these three metrics and show that this combination outperforms the individual metrics.
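The abstract does not name its three text-based metrics. The following sketch therefore uses three plausible stand-ins chosen purely for illustration: character-level Shannon entropy, longest line length, and the ratio of non-alphanumeric characters. It combines them by majority vote. The metric choice, the `THRESHOLDS` values and the voting rule are all hypothetical; the paper's actual metrics and combination may differ.

```python
import math
from collections import Counter


def char_entropy(src: str) -> float:
    """Shannon entropy of the source text, in bits per character."""
    if not src:
        return 0.0
    n = len(src)
    return -sum(c / n * math.log2(c / n) for c in Counter(src).values())


def max_line_length(src: str) -> int:
    """Length of the longest line; minified or packed code tends to have very long lines."""
    return max((len(line) for line in src.splitlines()), default=0)


def non_alnum_ratio(src: str) -> float:
    """Share of non-whitespace characters that are not letters or digits."""
    visible = [ch for ch in src if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(not ch.isalnum() for ch in visible) / len(visible)


# Hypothetical thresholds, chosen only for illustration.
THRESHOLDS = {"entropy": 5.0, "max_line": 300, "non_alnum": 0.35}


def looks_obfuscated(src: str) -> bool:
    """Hypothetical combination: flag the code if at least two metrics fire."""
    votes = [
        char_entropy(src) > THRESHOLDS["entropy"],
        max_line_length(src) > THRESHOLDS["max_line"],
        non_alnum_ratio(src) > THRESHOLDS["non_alnum"],
    ]
    return sum(votes) >= 2
```

A majority vote is only one way to combine metrics. The abstract stresses that false negatives are more dangerous than false positives. A variant that flags code when any single metric fires would accept more false positives in exchange for fewer false negatives.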
ISBN: (print) 9783642155116
Malware attacks necessitate extensive forensic analysis efforts that are manual-labor-intensive because of the analysis-resistance techniques that malware authors employ. The most prevalent of these techniques are code unpacking, code overwriting, and control transfer obfuscations. We simplify the analyst's task by analyzing the code prior to its execution and by providing the ability to selectively monitor its execution. We achieve pre-execution analysis by combining static and dynamic techniques to construct control- and data-flow analyses. These analyses form the interface through which the analyst instruments the code. This interface simplifies the instrumentation task. It allows us to reduce the number of instrumented program locations a hundredfold relative to existing instrumentation-based methods of identifying unpacked code. We implement our techniques in SD-Dyninst and apply them to a large corpus of malware. On this corpus we perform analysis tasks such as code coverage tests and call-stack traversals that are greatly simplified by hybrid analysis.