In the realm of cybersecurity, the evolution of network attacks necessitates innovative approaches for threat prediction and mitigation. Traditional methods often fall short in identifying novel attack patterns or ada...
Network attacks are diverse, often rare, and broadly generalized. This has made the design and construction of threat detection systems for network information flow packets a hot research topic in network attack prevention. This study therefore establishes a network data threat detection model that combines a traditional network threat detection system with deep learning neural networks. A convolutional neural network and data augmentation techniques are used to optimize the model and improve recognition accuracy on rare data. Experiments show that, when N=1, the model recognizes two rare attacks with probabilities of approximately 11% and 42%, respectively. When N=2, the probabilities are 52% and 78%; when N=3, approximately 85% and 92%; and when N=4, about 58% and 68%. N=3 therefore gives the best recognition performance. In addition, the model's recognition rate for malicious domain name attacks and normal data remains around 90%, a significant advantage over traditional detection systems. The proposed network data flow threat detection model, which integrates a gated recurrent neural network and a domain generation algorithm, is therefore practical and feasible.
Attackers usually use a command and control (C2) server to manipulate communication. To perform an attack, threat actors often employ a domain generation algorithm (DGA), which allows malware to communicate with the C2 server by generating a variety of network locations. Traditional malware control methods, such as blacklisting, are insufficient to handle DGA threats. In this paper, we propose a machine learning framework for identifying and detecting DGA domains to alleviate the threat. We collect real-time threat data from real-life traffic over a one-year period. We also propose a deep learning model to classify a large number of DGA domains. The proposed machine learning framework consists of a two-level model and a prediction model. In the two-level model, we first separate DGA domains from normal domains and then use a clustering method to identify the algorithms that generated those DGA domains. In the prediction model, a time-series model based on the hidden Markov model (HMM) is constructed to predict incoming domain features. Furthermore, we build a deep neural network (DNN) model to enhance the proposed machine learning framework by handling the large dataset we gradually collected. Our extensive experimental results demonstrate the accuracy of the proposed framework and the DNN model. To be precise, we achieve an accuracy of 95.89% for the classification in the framework and 97.79% in the DNN model, 92.45% for the second-level clustering, and 95.21% for the HMM prediction in the framework.
ISBN (digital): 9783030368029
ISBN (print): 9783030368029; 9783030368012
Malware families often use a domain generation algorithm (DGA) to communicate with Command and Control (C&C) servers. Although machine learning and deep learning based methods have achieved good accuracy in the DGA detection task, they struggle with new DGA families for which only limited data is available. In this paper, RL-Gen, a Reinforcement Learning (RL) framework, is proposed to improve the performance of character-level text generation with few input samples. RL-Gen has two modules, W-Generator and Evaluator. W-Generator is an improved generation model based on WGAN-GP and is treated as the agent, while Evaluator acts as the environment that evaluates the generated text. In the DGA case specifically, Evaluator is an effective DGA detection model (ATT-GRU). W-Generator's parameter updates are optimized using the reward from Evaluator, which improves both the speed and the quality of generation. Experiments show that the generated DGAs are sufficiently close to real DGAs and can substitute for real DGAs in detection model training. RL-Gen also achieves better text generation quality more quickly and smoothly than WGAN-GP.
ISBN (print): 9783030017019; 9783030017002
Malware or threat actors use a Command and Control (C2) environment to proliferate and manage an attack. In a sophisticated attack, a threat actor often employs a domain generation algorithm (DGA) to cycle the network location at which malware communicates with C2. Network security controls such as blacklisting, implementing a DNS sinkhole, or inserting a firewall rule are vital assets to an organization's security posture. However, all of them are typically ineffective against a DGA. In this paper, we propose a machine learning framework for identifying and clustering domain names to circumvent threats from a DGA. We collect a real-time threat intelligence feed over a six-month period in which all domains posed threats on the public Internet at the time of collection. We then apply the proposed machine learning framework to study DGA-based malware. The proposed framework contains a two-level model, consisting of classification and clustering, which is used to first detect DGA domains and then identify the DGA that generated those domains. Our extensive experimental results demonstrate the accuracy of the proposed framework. To be precise, we achieve accuracies of 95.14% for the first-level classification and 92.45% for the second-level clustering.
ISBN (print): 9781665484763
The botnet is a severe threat to computer networks, and the detection of botnet behaviors is an important research area of cyber security. Malware authors leverage the domain generation algorithm (DGA) to generate large numbers of pseudo-random domain names for connecting to the Command and Control (C&C) server, which makes detection and prevention extremely difficult. Previous work mostly defended against DGA domains through pre-registering, sinkholing, or publishing blacklists after reverse engineering the malware. However, these approaches can be easily bypassed by malware authors. For most communications between the botnet and the C&C server, the first step is generally sending Domain Name System (DNS) request packets. Thus, an alternative approach is based on capturing and analyzing the DNS traffic and classifying the domains. Most of the previous work tried to cluster the domains, and these techniques involved the use of contextual information. As a result, the algorithms take a long time to run, which means these techniques cannot be used for real-time detection. Compared with the traditional methods, recent methods attempt to predict whether a domain is DGA-generated based solely on the domain name string. Nevertheless, these methods involve human-engineered features that can be readily circumvented by attackers. In this paper, we propose a method that extracts linguistic features and applies machine learning algorithms to classify the domain name. To verify the performance of the proposed method, we designed and implemented a botnet detection system, and trained and tested the model with real data. The results demonstrate that the proposed method is able to capture the suspicious packets and accurately classify the domains. Evaluated with real traffic, our system correctly classifies the DGA domains in 95% of the cases. Furthermore, when detecting unknown DGA domains, our system achieves an accuracy of 88.5%.
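To illustrate what classification "based solely on the domain name string" can look like, the following minimal Python sketch computes a few lexical statistics from a domain name. The function name `lexical_features` and the chosen features (length, vowel ratio, digit ratio, longest consonant run) are assumptions made for illustration; the abstract does not list the linguistic features the paper actually extracts.

```python
def lexical_features(domain: str) -> dict:
    """Compute simple lexical statistics for the leftmost label of a domain name.

    Hypothetical feature set for illustration only; not the paper's features.
    """
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        return {"length": 0, "vowel_ratio": 0.0,
                "digit_ratio": 0.0, "max_consonant_run": 0}

    vowels = set("aeiou")
    run = max_run = 0
    for ch in label:
        # Count consecutive consonants; any vowel, digit, or symbol resets the run.
        if ch.isalpha() and ch not in vowels:
            run += 1
            max_run = max(max_run, run)
        else:
            run = 0

    return {
        "length": n,
        "vowel_ratio": sum(ch in vowels for ch in label) / n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "max_consonant_run": max_run,
    }
```

A feature dictionary like this could be fed to any standard classifier. Pseudo-random labels tend to show low vowel ratios and long consonant runs, which is also why hand-picked features of this kind can be circumvented by an attacker who knows them.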
Distinguishing malicious domain names generated by various domain generation algorithms (DGA) is critical for defending a network against sophisticated network attacks. In recent years, stealthy domain generation algorithms (SDGA) have been proposed and shown to be significantly stealthier than traditional character-based DGAs. Existing state-of-the-art detection schemes are not effective enough for detecting SDGA. In this paper, we exploit the character-level characteristics of SDGA domain names and propose a heterogeneous deep neural network framework (HDNN) for detecting SDGA. HDNN employs a proposed improved parallel CNN (IPCNN) architecture with convolution kernels of multiple sizes for extracting multi-scale local features from a domain name. The framework also contains a proposed self-attention based bidirectional long short-term memory (SA-Bi-LSTM) architecture, which extracts bidirectional global features from a domain name using an attention mechanism. In addition, the focal loss function is introduced to mitigate the imbalance in sample quantity during the training phase. The benchmark experiments are carried out on a database composed of collected benign domain names and real-world DGA and SDGA domain names. Compared with six influential deep-learning-based DGA detection schemes, the proposed scheme achieves state-of-the-art detection results on SDGAs, and also achieves state-of-the-art results on binary and multiclass classification for traditional DGAs.
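As a rough illustration of how focal loss counters class imbalance, the sketch below implements the standard binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single prediction. The function name `binary_focal_loss` and the default values gamma=2.0 and alpha=0.25 are assumptions; the abstract does not state which variant or hyperparameters HDNN uses.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one sample.

    p     -- predicted probability of the positive class
    y     -- true label (1 = positive, 0 = negative)
    gamma -- focusing parameter; larger values down-weight easy samples more
    alpha -- class-balancing weight applied to the positive class
    (Defaults are illustrative assumptions, not values from the paper.)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma=0 the expression reduces to alpha-weighted cross-entropy. With gamma>0, a confidently correct sample (p_t close to 1) contributes very little, so the many easy majority-class samples no longer dominate the gradient.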
Nowadays, malware campaigns have reached a high level of sophistication, thanks to the use of cryptography and covert communication channels over traditional protocols and services. In this regard, a typical approach to evade botnet identification and takedown mechanisms is domain fluxing through domain generation algorithms (DGAs). These algorithms produce an overwhelming number of domain names that the infected device tries to communicate with to find the Command and Control server, yet only a small fraction of them is actually registered. Due to the high number of domain names, the blacklisting approach is rendered useless. Therefore, the botmaster may pivot the control dynamically and hinder botnet detection mechanisms. To counter this problem, many security mechanisms resort to solutions that try to identify domains from a DGA based on the randomness of their name. In this work, we explore hard-to-detect families of DGAs, which are constructed to bypass these mechanisms. More precisely, they are based on the use of dictionaries or adversarial approaches so that the generated domains seem to be user-generated. Therefore, the corresponding generated domains pass many filters that look for, e.g., high-entropy strings or n-grams. To address this challenge, we propose an accurate and efficient probabilistic approach to detect them. We test and validate the proposed solution through extensive experiments with a sound dataset containing all the wordlist-based DGA families that exhibit this behaviour, as well as several adversarial DGAs, and compare it with other state-of-the-art methods, practically showing the efficacy and prevalence of our proposal.
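The kind of randomness filter that wordlist-based DGAs are built to pass can be sketched with character-level Shannon entropy. This is a minimal illustration of such a filter, not the probabilistic approach the paper proposes; the function name `shannon_entropy` and the two sample labels in the usage lines are made up for this example.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Character-level Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical labels: one random-looking, one built from dictionary words.
random_like = shannon_entropy("x7q2kz9vbn4t")
wordlist_like = shannon_entropy("tablechair")
```

Here `random_like` comes out higher than `wordlist_like`, so a filter that flags labels above an entropy threshold would tend to catch the first and miss the second. That gap is the weakness the dictionary-based families exploit.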
Attackers are known to utilize domain generation algorithms (DGAs) to generate domain names for command and control (C&C) servers and facilitate the distribution of uniform resource locators within malicious software. DGAs pose a significant threat to cybersecurity owing to their ability to dynamically generate unpredictable domain names. Extensive research is currently underway to detect domain names created using DGAs. However, high false positive rates when handling benign domain names in non-English languages pose a challenge. Thus, this study proposes a DGA detection method that effectively embeds non-English domain names, focusing on Chinese domain names, defined here as domain names composed of Pinyin. The proposed method segments domain names into meaningful subwords for effective vector representation. The FastText model then learns the context information of the segmented subwords and embeds the domain name. Further, the deep learning-based detection model learns the vectorized domain names and determines whether a particular domain name is DGA-generated. We labeled the Chinese domain names among the benign domain names for our experiment. The experimental results show that the proposed method outperforms existing methods across all performance metrics on the entire test dataset. Notably, the proposed method minimizes the false positive rate, thereby enhancing detection reliability. In addition, it exhibits high performance, achieving a recall of 0.9873 and 0.9886 for Chinese and English domain names, respectively. This demonstrates that the proposed method consistently delivers high performance across various metrics and languages.
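For context on the embedding step, FastText represents a token as a bag of character n-grams taken from the token wrapped in `<` and `>` boundary markers. The sketch below shows only that generic n-gram extraction; it is not the Pinyin-aware segmentation the paper proposes, and the function name `char_ngrams` and the sample token are assumptions for illustration.

```python
from typing import List


def char_ngrams(token: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return FastText-style character n-grams of a token.

    The token is wrapped in '<' and '>' so that prefixes and suffixes
    produce n-grams distinct from those in the middle of the token.
    """
    wrapped = f"<{token}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# Hypothetical Pinyin syllable used as a segmented subword.
trigrams = char_ngrams("wang", 3, 3)
```

Segmenting a domain name into meaningful subwords before this step, as the abstract describes, means the n-grams are drawn from linguistically coherent units rather than from arbitrary windows over the whole label.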
Botnets are one of the major threats to network security nowadays. To carry out malicious actions remotely, they rely heavily on Command and Control channels. DGA-based botnets use a domain generation algorithm to generate a significant number of domain names. Traditional machine learning schemes benefit greatly from analyzing the linguistic distinctions between legitimate and DGA-based domain names. However, it is difficult to identify domain names that are based on wordlists or pseudo-randomly generated. Accordingly, this paper proposes an efficient CNN-LSTM-based detection model (BotDetector) that uses only a set of simple, easy-to-compute character features. We evaluate our model with two open-source benchmark datasets (360 netlab, Bambenek) and real DNS traffic from the China Education and Research Network. Experimental results demonstrate that our algorithm improves accuracy and F1-score by 1.6% and reduces computation time by 9.4% compared to other state-of-the-art alternatives. Remarkably, our work can identify botnets' covert communication channels that use domain names based on wordlists or pseudo-random generation without the help of reverse engineering.
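One common way to feed a domain name to a character-level CNN-LSTM is to map each character to an integer index and pad to a fixed length. The sketch below shows that generic encoding under stated assumptions; the alphabet, the `max_len` of 64, and the names `CHAR_TO_INDEX` and `encode_domain` are hypothetical, and the abstract does not specify which character features BotDetector actually uses.

```python
import string
from typing import List

# Assumed alphabet: lowercase letters, digits, and the symbols '-', '.', '_'.
ALPHABET = string.ascii_lowercase + string.digits + "-._"
# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_domain(domain: str, max_len: int = 64) -> List[int]:
    """Encode a domain name as a fixed-length list of character indices."""
    ids = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad with zeros
```

An encoding like this needs no hand-engineered statistics, which fits the abstract's emphasis on features that are cheap to compute: an embedding layer followed by convolutional and recurrent layers would learn the discriminative patterns directly from the index sequence.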