ISBN (print): 9798350347937
Due to their convenience and popularity, Web applications have become a prime target for attackers. JavaScript is the main programming language for Web applications, and many methods have been proposed for detecting malicious JavaScript. Among them, static analysis-based methods play an important role because of their high effectiveness and efficiency. However, obfuscation is commonly applied to JavaScript, so the features extracted by static analysis contain many useless and disguised features, which leads to many false positives and false negatives in detection results. In this paper, we propose a novel method to identify the essential features related to the semantics of JavaScript code. Specifically, we develop JSRevealer, a robust, effective, scalable, and interpretable detector for malicious JavaScript. To test the capabilities of JSRevealer, we conduct comparative experiments with four other state-of-the-art malicious JavaScript detection tools. The experimental results show that JSRevealer achieves an average F1 of 84.8% on data obfuscated by different obfuscators, which is 21.6%, 22.3%, 18.7%, and 22.9% higher than the tools CUJO, ZOZZLE, JAST, and JSTAP, respectively. Moreover, the detection results of JSRevealer can be interpreted, which can provide meaningful insights for further security research.
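The abstract reports F1 gaps without saying whether they are absolute percentage points or relative improvements. The short Python sketch below recalls how F1 combines precision and recall and, under our own assumption (not stated in the abstract) that the gaps are absolute percentage points, derives the baseline F1 each comparison tool would then have. The function names `f1_score` and `implied_baseline_f1` are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall, from confusion counts."""
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def implied_baseline_f1(reported_f1: float, gap: float) -> float:
    """ASSUMPTION: `gap` is an absolute difference in percentage points."""
    return round(reported_f1 - gap, 1)


# Gaps reported in the abstract relative to JSRevealer's 84.8% average F1.
GAPS = {"CUJO": 21.6, "ZOZZLE": 22.3, "JAST": 18.7, "JSTAP": 22.9}

if __name__ == "__main__":
    for tool, gap in GAPS.items():
        print(tool, implied_baseline_f1(84.8, gap))
```

Under this reading CUJO's implied F1 would be 63.2%; if the gaps are relative improvements instead, the baselines differ, so these figures are illustrative only.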
ISBN (print): 9798400702211
Machine learning is increasingly being applied to malicious JavaScript detection in response to the growing number of Web attacks and the high cost of manual identification. In practice, to hide malicious behavior or protect intellectual property, both malicious and benign scripts are often obfuscated before being uploaded. While obfuscation serves these purposes, it also introduces additional code features (e.g., dead code). When machine learning is used to train a malicious JavaScript detector, these additional features can make the model less effective. However, there is still no clear understanding of how robust existing machine learning-based detectors are against different obfuscators. In this paper, we conduct the first empirical study of how obfuscation affects machine learning detectors based on static features. Our results yield several findings: 1) Obfuscation has a significant impact on the effectiveness of detectors, increasing both the false negative rate (FNR) and the false positive rate (FPR), and obfuscation bias in the training set induces detectors to detect obfuscation rather than malicious behavior. 2) Common measures, such as improving the quality of the training set by adding relevant obfuscated samples or leveraging state-of-the-art deep learning models, do not work well. 3) The root cause of these effects is that the feature spaces the detectors use reflect only shallow differences in code rather than the essential difference between benign and malicious code, and such shallow differences are easily distorted by the changes obfuscation introduces. 4) Obfuscation has a similar effect on real-world detectors in VirusTotal, indicating that this is a common problem in practice.
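The first finding is stated in terms of FNR and FPR. The minimal Python sketch below shows how the two rates are computed from a detector's predictions, using hypothetical toy labels (1 = malicious, 0 = benign) rather than any data from the study, and illustrates how errors on obfuscated samples can raise both rates at once. `Confusion` and `tally` are illustrative names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int = 0  # malicious scripts flagged as malicious
    fn: int = 0  # malicious scripts missed (false negatives)
    fp: int = 0  # benign scripts flagged as malicious (false positives)
    tn: int = 0  # benign scripts correctly passed

    def fnr(self) -> float:
        """Share of malicious samples the detector misses (assumes at least one)."""
        return self.fn / (self.fn + self.tp)

    def fpr(self) -> float:
        """Share of benign samples wrongly flagged (assumes at least one)."""
        return self.fp / (self.fp + self.tn)


def tally(labels: Sequence[int], predictions: Sequence[int]) -> Confusion:
    """Build a confusion matrix; 1 = malicious, 0 = benign."""
    c = Confusion()
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            c.tp += 1
        elif y == 1:
            c.fn += 1
        elif p == 1:
            c.fp += 1
        else:
            c.tn += 1
    return c


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    # Hypothetical predictions on the same scripts before and after obfuscation.
    before = tally(labels, [1, 1, 1, 0, 0, 0, 0, 0])
    after = tally(labels, [1, 0, 0, 0, 1, 1, 0, 0])
    print(f"before: FNR={before.fnr():.2f} FPR={before.fpr():.2f}")
    print(f"after:  FNR={after.fnr():.2f} FPR={after.fpr():.2f}")
```

With these toy predictions the FNR rises from 0.25 to 0.75 and the FPR from 0.00 to 0.50, mirroring the direction (not the magnitude) of the effect the study reports.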
This paper addresses the design of a detector for obfuscated JavaScript code using machine learning. The main challenge was to design models that are robust against obfuscators the model did not encounter during training. In our research, we simulated the scenario in which an obfuscation detector, trained to detect samples obfuscated by specific obfuscators, is given samples processed by a different obfuscator. The presented feature engineering and model training approach achieved better accuracy on previously unseen obfuscators than the reference work. We showed that treating minified code samples as obfuscated, as well as enriching the set of lexical and syntactic features, can improve the detector's quality.
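The abstract does not spell out the evaluation protocol. One common way to simulate unseen obfuscators is a leave-one-obfuscator-out split: each fold holds out every sample produced by one obfuscator for testing and trains on the rest. The sketch below assumes each sample is tagged with the obfuscator that produced it and, following the abstract, labels minified code as obfuscated; feature extraction and the model itself are omitted, and all names (including the `"minifier"` tag) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# A sample is (source_code, label, obfuscator_name); label 1 = obfuscated, 0 = plain.
# Minified code is treated as obfuscated, e.g. ("...", 1, "minifier").
Sample = Tuple[str, int, str]
Fold = Tuple[str, List[Sample], List[Sample]]


def leave_one_obfuscator_out(samples: List[Sample]) -> Iterator[Fold]:
    """Yield (held_out, train, test); the held-out obfuscator never appears in train."""
    by_obf: Dict[str, List[Sample]] = defaultdict(list)
    plain: List[Sample] = []
    for s in samples:
        if s[1] == 0:
            plain.append(s)
        else:
            by_obf[s[2]].append(s)
    # Split unobfuscated samples so both train and test contain negatives.
    half = len(plain) // 2
    for held_out in sorted(by_obf):
        train = [s for name, group in by_obf.items() if name != held_out for s in group]
        test = list(by_obf[held_out])
        yield held_out, train + plain[:half], test + plain[half:]


if __name__ == "__main__":
    data = [
        ("a", 1, "obf_A"), ("b", 1, "obf_B"), ("c", 1, "minifier"),
        ("p1", 0, "none"), ("p2", 0, "none"),
    ]
    for held_out, train, test in leave_one_obfuscator_out(data):
        print(held_out, len(train), len(test))
```

Averaging accuracy over all folds gives one estimate of robustness to obfuscators absent from training.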
Recently, most malicious web pages have included obfuscated code in order to evade signature-based detection systems. It is difficult to decide whether a string is obfuscated because the form of obfuscated strings changes continuously. In this paper, we propose a novel methodology that can detect obfuscated strings in malicious web pages. By analyzing patterns in normal and malicious JavaScript code, we extracted three metrics to serve as rules for detecting obfuscated strings: N-gram, Entropy, and Word Size. N-gram checks how often each byte code is used in a string. Entropy checks the distribution of the byte codes used. Word Size checks whether a very long string is used. Based on these metrics, we implemented a practical tool for our methodology and evaluated it using real malicious web pages. The experimental results showed that our methodology can detect obfuscated strings in web pages effectively.
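The three metrics can be approximated in Python as follows. This is a sketch under our own assumptions: the abstract does not define the metrics precisely or give detection thresholds, so N-gram is read here as byte n-gram frequency, Entropy as Shannon entropy over bytes, and Word Size as the length of the longest whitespace-delimited token. The function names are hypothetical, and no thresholds are set.

```python
import math
from collections import Counter


def byte_ngram_counts(s: str, n: int = 1) -> Counter:
    """N-gram metric: how often each byte n-gram occurs in the string."""
    data = s.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def shannon_entropy(s: str) -> float:
    """Entropy metric: Shannon entropy (bits per byte) of the byte distribution."""
    data = s.encode("utf-8")
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating bytes yields ints, one per byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def longest_word(s: str) -> int:
    """Word-size metric: length of the longest whitespace-delimited token."""
    return max((len(w) for w in s.split()), default=0)


if __name__ == "__main__":
    readable = "var total = price * quantity;"
    escaped = "eval(unescape('%76%61%72%20%78%3D%31'))"
    for s in (readable, escaped):
        top = byte_ngram_counts(s).most_common(1)
        print(round(shannon_entropy(s), 2), longest_word(s), top)
```

A detector built on these values would still need thresholds derived from labeled data, which the abstract does not report.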