JavaScript is a key component of websites and greatly enhances web page functionality. At the same time, it has become one of the most common attack vectors in malicious web pages. Early approaches to detecting malicious scripts relied heavily on manual feature engineering by security experts and had limited feature representation capabilities. With advances in deep learning, neural networks have shown the ability to automatically learn strong feature representations from malicious JavaScript. Current mainstream detection methods usually extract the Abstract Syntax Tree (AST) from JavaScript code, which captures the code's semantic information. The AST node information is then serialized into a sequence by depth-first traversal and fed into deep learning models. However, for large JavaScript library files and obfuscated JavaScript code, limits on computational power and hardware make it difficult to feed the complete sequence into the model. Only a part of the sequence is sampled for training and detection, which significantly diminishes the model's detection capability. To address this, this paper proposes a method for malicious JavaScript detection based on sequence compression. The approach extracts input sequences consisting solely of AST node type information and applies a compression algorithm to reduce their length further. Specifically, we first extract the type field of each AST node in depth-first traversal order to generate the sequence, and then compress the sequence using Byte Pair Encoding. Finally, the compressed sequence is fed into the deep learning model for detection. On publicly available datasets, when the same deep learning model is used for classification, the proposed method outperforms existing approaches, achieving a precision of 98.96% and a recall of 96.37%.
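The two preprocessing steps described in this abstract (a depth-first sequence of AST node types, then Byte Pair Encoding) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the AST is already available as an ESTree-style nested dictionary, and the names `node_types`, `learn_bpe`, `apply_merge`, and `example_ast` are hypothetical. Joining merged tokens with `+` is also an illustrative choice.

```python
from collections import Counter

def node_types(node):
    """Return the `type` field of every AST node in depth-first pre-order."""
    types = []
    if isinstance(node, dict):
        if "type" in node:
            types.append(node["type"])
        for value in node.values():
            types.extend(node_types(value))
    elif isinstance(node, list):
        for child in node:
            types.extend(node_types(child))
    return types

def apply_merge(seq, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged = pair[0] + "+" + pair[1]
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_bpe(seq, num_merges):
    """Greedily merge the most frequent adjacent pair, up to `num_merges` times."""
    seq = list(seq)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats, so merging no longer shortens the sequence
            break
        merges.append(pair)
        seq = apply_merge(seq, pair)
    return merges, seq

# Hand-written ESTree-style AST for the script `f(1); f(2);` (an assumption:
# a real pipeline would obtain this from a JavaScript parser).
def _call(arg):
    return {"type": "ExpressionStatement",
            "expression": {"type": "CallExpression",
                           "callee": {"type": "Identifier", "name": "f"},
                           "arguments": [{"type": "Literal", "value": arg}]}}

example_ast = {"type": "Program", "body": [_call(1), _call(2)]}

types = node_types(example_ast)
merges, compressed = learn_bpe(types, num_merges=3)
print(len(types), "->", len(compressed))
```

Because only node types are kept, repeated code patterns produce repeated token runs, which is what lets the pair merges shorten the sequence before it reaches the model.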
Web development technology has experienced significant progress. The creation of JavaScript has greatly enriched the interactive capabilities of the client. However, attackers use the dynamic characteristics of the JavaScript language to embed malicious code into web pages for purposes such as smuggling and redirection. Traditional methods based on static feature detection have difficulty detecting malicious code after obfuscation, and methods based on dynamic analysis are inefficient. To meet these challenges, this paper proposes JStrong, a static detection model based on a graph neural network. The model first generates an abstract syntax tree from the JavaScript source code, and then adds data flow and control flow information into the program dependency graph. In addition, we embed the nodes and edges of the graph into feature vectors and learn the features of the whole graph through the graph neural network. We use a real-world dataset collected from top websites and GitHub to evaluate JStrong and compare it to the state-of-the-art method. Experimental results show that JStrong achieves near-perfect classification performance and is superior to the state-of-the-art method. (c) 2022 Elsevier Ltd. All rights reserved.
ISBN: 9798350347937 (print)
Due to the convenience and popularity of Web applications, they have become a prime target for attackers. JavaScript is the main programming language for Web applications, and many methods have been proposed for detecting malicious JavaScript, among which static analysis-based methods play an important role because of their high effectiveness and efficiency. However, obfuscation techniques are commonly used in JavaScript, so the features extracted by static analysis contain many useless and disguised features, leading to many false positives and false negatives in detection results. In this paper, we propose a novel method to identify the essential features related to the semantics of JavaScript code. Specifically, we develop JSRevealer, a robust, effective, scalable, and interpretable detector for malicious JavaScript. To test the capabilities of JSRevealer, we conduct comparative experiments with four other state-of-the-art malicious JavaScript detection tools. The experimental results show that JSRevealer has an average F1 of 84.8% on data obfuscated by different obfuscators, which is 21.6%, 22.3%, 18.7%, and 22.9% higher than the tools CUJO, ZOZZLE, JAST, and JSTAP, respectively. Moreover, the detection results of JSRevealer can be interpreted, which can provide meaningful insights for further security research.
Websites on the Internet are becoming increasingly vulnerable to malicious JavaScript code because of its strong impact and dramatic effect. Numerous recent cyberattacks exploit JavaScript vulnerabilities, and in some cases employ obfuscation to conceal their malice and elude detection. To secure Internet users, an adequate intrusion-detection system (IDS) for malicious JavaScript must be developed. This paper proposes an automatic IDS for obfuscated JavaScript that employs several features and machine-learning techniques to effectively distinguish malicious from benign JavaScript code. We also present a new set of features that can detect obfuscation in JavaScript. The features are selected for their ability to identify obfuscation, a popular method for bypassing conventional malware detection systems. The performance of the suggested approach has been tested on JavaScript obfuscation attacks. The studies show that an IDS based on the selected features has a detection rate of 94% for malicious samples and 81% for benign samples with a feature vector of dimension 60.
To detect new malicious JavaScript at low cost, methods based on machine learning have been proposed and have given positive results. These methods aim for a lightweight filtering model that can quickly and precisely filter out malicious data for dynamic analysis. One such method constructs a language model using Natural Language Processing techniques to represent the source code in vector form for machine learning. This method scores highly on a balanced dataset; however, it has not been evaluated on an imbalanced dataset. Previous studies mainly focus on balanced datasets, which are not representative of real-world data, and this raises questions about the practical use of the models. A good filter needs a model that achieves a high recall score on an imbalanced dataset. To construct an efficient language model and to deal with the data imbalance problem, we focus on oversampling techniques. Our method is the first to use oversampling and machine learning to detect malicious JavaScript. The experimental results show that our method can detect new malicious JavaScript more accurately and efficiently. Our model can quickly filter out malicious data for dynamic analysis. The best recall score reaches 0.72 with the Doc2Vec model. Our proposed method outperforms the baseline method by 210% in terms of recall score with the same training time and test time per sample. (C) 2021 Elsevier B.V. All rights reserved.
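The abstract does not say which oversampling technique is used, so the following is only a sketch of the simplest variant, random oversampling, together with the recall metric the abstract emphasizes. The function names `random_oversample` and `recall` and the toy data are hypothetical.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_label.values())
    out_x, out_y = [], []
    for y, xs in by_label.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0

# Toy imbalanced training set: 1 = malicious (minority), 0 = benign (majority).
train_x = ["m1", "m2", "b1", "b2", "b3", "b4", "b5", "b6"]
train_y = [1, 1, 0, 0, 0, 0, 0, 0]
balanced_x, balanced_y = random_oversample(train_x, train_y)
print(balanced_y.count(1), balanced_y.count(0))
```

Oversampling should be applied to the training split only. Evaluating recall on an untouched, imbalanced test split keeps the score representative of real-world data, which is the concern the abstract raises.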
ISBN: 9781728119854 (print)
JavaScript (JS) is a dominant programming language in web and mobile development, but it is also notoriously abused by attackers due to its powerful characteristics, e.g., being dynamic, prototype-based, and multi-paradigm, which foil most static and dynamic analysis approaches. To detect malicious JS instances, several machine learning-based methods have been developed recently. However, these methods treat JS as a natural language rather than a programming language and therefore cannot capture its syntactic and semantic features. In this paper, we present JSAC, a novel framework to detect JS malware. It combines deep learning and program analysis techniques to capture the syntactic and semantic features of JS programs. Specifically, to get a JS program's syntactic information, we build its abstract syntax tree and employ a tree-based convolutional neural network (CNN) to extract features from it. To get its semantic information, we construct its control flow graph and feed it to another graph-based CNN. Finally, the features extracted by the two CNNs are fused for final detection. Evaluation on a corpus of 69,523 JS files indicates that JSAC outperforms 4 other models, with a 98.73% F1-score in detecting JS malware.
ISBN: 9781450367479 (print)
In the malware field, learning-based systems have become popular to detect new malicious variants. Nevertheless, attackers with specific and internal knowledge of a target system may be able to produce input samples which are misclassified. In practice, the assumption of strong attackers is not realistic as it implies access to insider information. We instead propose HIDENOSEEK, a novel and generic camouflage attack, which evades the entire class of detectors based on syntactic features, without needing any information about the system it is trying to evade. Our attack consists of changing the constructs of malicious JavaScript samples to reproduce a benign syntax. For this purpose, we automatically rewrite the Abstract Syntax Trees (ASTs) of malicious JavaScript inputs into existing benign ones. In particular, HIDENOSEEK uses malicious seeds and searches for isomorphic subgraphs between the seeds and traditional benign scripts. Specifically, it replaces benign sub-ASTs by their malicious equivalents (same syntactic structure) and adjusts the benign data dependencies, without changing the AST, so that the malicious semantics is kept. In practice, we leveraged 23 malicious seeds to generate 91,020 malicious scripts, which perfectly reproduce ASTs of Alexa top 10,000 web pages. Also, we can produce on average 14 different malicious samples with the same AST as each Alexa top 10. Overall, a standard trained classifier has 99.98% false negatives with HIDENOSEEK inputs, while a classifier trained on such samples has over 88.74% false positives, rendering the targeted static detectors unreliable.
ISBN: 9781450376280 (print)
Given the success of the Web platform, attackers have abused its main programming language, namely JavaScript, to mount different types of attacks on their victims. Due to the large volume of such malicious scripts, detection systems rely on static analyses to quickly process the vast majority of samples. These static approaches are not infallible though and lead to misclassifications. Also, they lack semantic information to go beyond purely syntactic approaches. In this paper, we propose JSTAP, a modular static JavaScript detection system, which extends the detection capability of existing lexical and AST-based pipelines by also leveraging control and data flow information. Our detector is composed of ten modules, including five different ways of abstracting code, with differing levels of context and semantic information, and two ways of extracting features. Based on the frequency of these specific patterns, we train a random forest classifier for each module. In practice, JSTAP outperforms existing systems, which we reimplemented and tested on our dataset totaling over 270,000 samples. To improve the detection, we also combine the predictions of several modules. A first layer of unanimous voting classifies 93% of our dataset with an accuracy of 99.73%, while a second layer, based on an alternative combination of modules, labels another 6.5% of our initial dataset with an accuracy over 99%. This way, JSTAP can be used as a precise pre-filter, meaning that it would only need to forward less than 1% of samples to additional analyses. For reproducibility and direct deployability of our modules, we make our system publicly available.
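The layered voting described above can be sketched as follows. This is an illustration, not JSTAP's code. It assumes each layer receives the predictions of a fixed group of modules and labels a sample only when they all agree. Whether the second layer is also unanimous is an assumption, since the abstract says only that it uses an alternative combination of modules. The names `unanimous_vote` and `classify` are hypothetical.

```python
def unanimous_vote(predictions):
    """Return the shared label if every module agrees, otherwise None."""
    first = predictions[0]
    return first if all(p == first for p in predictions) else None

def classify(layer1_predictions, layer2_predictions):
    """Try each voting layer in order; return (label, layer index).

    Returns (None, None) when no layer reaches agreement, meaning the sample
    should be forwarded to additional analyses.
    """
    layers = (layer1_predictions, layer2_predictions)
    for index, predictions in enumerate(layers, start=1):
        label = unanimous_vote(predictions)
        if label is not None:
            return label, index
    return None, None

# Hypothetical module outputs for three samples.
print(classify(["malicious", "malicious", "malicious"], ["benign", "malicious"]))
print(classify(["malicious", "benign", "malicious"], ["benign", "benign"]))
print(classify(["malicious", "benign", "malicious"], ["benign", "malicious"]))
```

With the coverage reported in the abstract, the two layers label 93% + 6.5% = 99.5% of the dataset, so about 0.5% of samples remain to be forwarded, consistent with the "less than 1%" claim.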
ISBN: 9781450352574 (print)
Malicious scripts used in web-based attacks have recently been reported as one of the top internet security threats. Although anti-malware solutions develop and integrate various techniques to defend against malicious scripts, attackers have been increasingly applying different counter-techniques to hide their malicious intent and evade detection. One of the most popular of these techniques is code obfuscation. In this research, an enhanced system is proposed to automate the process of deobfuscating malicious JavaScript code. The proposed system was tested on real-world malicious JavaScript samples. Based on the analysis results, the reason for the popularity of certain obfuscation techniques is identified. In addition, a set of improvements to currently used malware detection techniques is proposed.
The traditional static method for malicious JavaScript detection is highly efficient because it does not need to execute code, but it cannot detect new malicious scripts. The dynamic method, by contrast, usually needs to execute code and extract features, which leads to low efficiency and high difficulty. In this paper, we propose a half-dynamic detection method for classification, which can handle obfuscated malicious JavaScript. The proposed method starts by obtaining intermediate-state machine code, using the JavaScript interpreter to compile the JavaScript. After the function call sequence is extracted from the machine code, a feature model of the sequence is built using N-grams. We then use a k-NN classifier for training and for detecting malicious scripts. N-grams can be applied directly to static analysis of the obfuscated JavaScript sequence, but that alone cannot recognize maliciousness. We therefore propose applying N-grams to the function call sequence of the compiled machine code as an efficient half-dynamic malicious script detection method. Finally, the efficiency and effectiveness of the proposed method are demonstrated through experiments.
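The N-gram and k-NN stages can be sketched in a few lines, assuming the function call sequence has already been extracted from the compiled code. The call names, the bigram size, the cosine distance, and the helper names `ngram_counts`, `cosine_distance`, and `knn_predict` are all illustrative assumptions. The abstract does not specify the value of N, the distance metric, or k.

```python
import math
from collections import Counter

def ngram_counts(calls, n=2):
    """Count every run of n consecutive calls in the sequence."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0

def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: cosine_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled call sequences (real ones would come from the
# interpreter's compiled output).
labeled_sequences = [
    (["unescape", "eval", "unescape", "eval"], "malicious"),
    (["fromCharCode", "eval", "fromCharCode", "eval"], "malicious"),
    (["getElementById", "addEventListener", "getElementById"], "benign"),
    (["querySelector", "addEventListener", "querySelector"], "benign"),
    (["getElementById", "setAttribute"], "benign"),
]
train = [(ngram_counts(calls), label) for calls, label in labeled_sequences]

query = ngram_counts(["unescape", "eval", "fromCharCode", "eval"])
print(knn_predict(train, query))
```

Working on the call sequence rather than the source text is what makes the approach robust to obfuscation: renamed variables and encoded strings change the source tokens but not the order of the functions that are eventually invoked.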