This paper proposes a detection model for encrypted Internet of Things (IoT) traffic based on edge intelligence, addressing the difficulty of efficiently classifying and accurately identifying encrypted IoT traffic. The model reduces both the number of communication rounds among distributed IoT gateways during edge-intelligence training and the time required to establish the encrypted traffic detection model. Four new classification and identification methods for encrypted traffic are put forward: time-sequence behavior analysis, dynamic behavior analysis, key behavior analysis, and two-round filtering analysis. Experimental results show that with a sample size of 1,600, the detection model is established in under 100 seconds, and all four classification methods achieve accuracy above 92% and recall above 83%.
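The abstract does not disclose the internals of the four analysis methods. As a rough, hypothetical illustration of what time-sequence behavior analysis over encrypted flows can look like, the sketch below classifies flows from packet-size and inter-arrival statistics; the feature set, the RandomForest choice, and the synthetic data are assumptions, not the paper's method.

```python
# Hypothetical sketch: classifying encrypted IoT flows from time-sequence
# features (packet sizes, inter-arrival times). Not the paper's actual model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

def flow_features(pkt_sizes, iat):
    """Summarize one flow with size/timing statistics usable without decryption."""
    return [np.mean(pkt_sizes), np.std(pkt_sizes), np.max(pkt_sizes),
            np.mean(iat), np.std(iat), len(pkt_sizes)]

rng = np.random.default_rng(0)
# Synthetic stand-in for labeled flows (the paper uses 1,600 real samples).
X = np.array([flow_features(rng.integers(40, 1500, 20),
                            rng.exponential(0.05, 20)) for _ in range(1600)])
y = rng.integers(0, 4, size=1600)          # four traffic classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(accuracy_score(y_te, pred), recall_score(y_te, pred, average="macro"))
```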
We explore machine learning for accurately predicting imminent disk failures, and hence providing proactive fault tolerance, in modern large-scale storage systems. Current disk failure prediction approaches are mostly offline and assume that the disk logs required for training learning models are available a priori. However, disk logs are often generated continuously as an evolving data stream, in which the statistical patterns vary over time (a phenomenon known as concept drift). This challenge motivates the need for online techniques that perform training and prediction on the incoming stream of disk logs in real time while adapting to concept drift. We first measure and demonstrate the existence of concept drift in various disk models in production. Motivated by our study, we design StreamDFP, a general stream mining framework for disk failure prediction with concept-drift adaptation based on three key techniques, namely online labeling, concept-drift-aware training, and general prediction, with the primary objective of supporting various machine learning algorithms. We extend StreamDFP to support online transfer learning for minority disk models with concept-drift adaptation. Our evaluation shows that StreamDFP significantly improves prediction accuracy compared to approaches without concept-drift adaptation under various settings, and achieves reasonably high stream processing performance.
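StreamDFP's own components are not reproduced here; the sketch below only illustrates the general drift-aware online learning pattern the abstract describes: incremental updates on each labeled log record, plus a model reset when the error rate jumps. The window sizes, the 0.15 threshold, and the SGD classifier are illustrative assumptions, and the error-rate test is a simplification of detectors such as DDM or ADWIN.

```python
# Minimal sketch of drift-aware online prediction on a disk-log stream.
from collections import deque
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()                    # incremental linear classifier
classes = np.array([0, 1])                 # healthy vs. about-to-fail
baseline = deque(maxlen=200)               # errors right after (re)start
recent = deque(maxlen=200)                 # most recent errors
initialized = False

def process(sample, label):
    """Predict on one log record, then learn from its (delayed) label."""
    global model, initialized
    x = np.asarray(sample, dtype=float).reshape(1, -1)
    pred = int(model.predict(x)[0]) if initialized else 0
    err = int(pred != label)
    if len(baseline) < baseline.maxlen:
        baseline.append(err)               # still building the baseline window
    else:
        recent.append(err)
    # Crude drift test: recent error rate far above the post-(re)start rate.
    if len(recent) == recent.maxlen and np.mean(recent) > np.mean(baseline) + 0.15:
        model = SGDClassifier()            # drift suspected: discard stale model
        initialized = False
        baseline.clear(); recent.clear()
    model.partial_fit(x, [label], classes=classes)
    initialized = True
    return pred
```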
With the rapid development of network technology, the network is filled with vast amounts of information, and current information extraction models perform poorly when extracting keyword information from large volumes of data. To address this insufficient extraction performance of traditional information extraction models, this paper combines a text-ranking algorithm with a document topic generation model, proposing a keyword information extraction model that combines the advantages of the two algorithms. Performance comparison experiments on this fusion algorithm show accuracy and recall rates of 76.1% and 77.0%, respectively, outperforming the baseline algorithm. Empirical analysis of the information extraction model shows that its accuracy and precision rates are 80.16% and 77.54%, respectively, also better than the baseline model. The proposed information extraction model is of considerable importance for the development of the information extraction field.
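The abstract names the two ingredients but not how they are combined. A minimal hypothetical fusion, assuming a TextRank-style co-occurrence graph and an LDA topic model whose per-word scores are linearly interpolated, might look like the sketch below; the sentence-wide co-occurrence window, the 0.5 weight, and the normalization are all assumptions.

```python
# Hypothetical fusion of graph-based ranking (TextRank-style) with LDA
# topic weights for keyword extraction.
import itertools
import networkx as nx
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["deep learning improves information extraction from web text",
        "topic models extract keyword information from large text corpora"]

# Graph side: rank words by PageRank over a word co-occurrence graph
# (here, co-occurrence within the same sentence, a simplification).
g = nx.Graph()
for sent in (d.split() for d in docs):
    g.add_edges_from(itertools.combinations(set(sent), 2))

# Topic side: per-word salience from the fitted topic-word matrix.
vec = CountVectorizer()
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

def normalize(d):
    m = max(d.values())
    return {k: v / m for k, v in d.items()}

pr = normalize(nx.pagerank(g))
tw = normalize(dict(zip(vec.get_feature_names_out(),
                        lda.components_.max(axis=0))))

alpha = 0.5  # assumed interpolation weight between the two scores
score = {w: alpha * pr.get(w, 0.0) + (1 - alpha) * tw.get(w, 0.0)
         for w in set(pr) | set(tw)}
print(sorted(score, key=score.get, reverse=True)[:5])   # top keywords
```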
Today's Internet of Vehicles (IoV) has soared by leveraging data gathered from transportation systems, yet it grapples with security concerns stemming from network vulnerabilities that expose it to cyber threats. This study proposes an innovative method to anticipate anomalies and exploits targeting IoV services related to road traffic. Using the Unceasement Conditional Random Field Dynamic Bayesian Network Model (U-CRF-DDBN), the approach predicts the impact of network attacks and strategically manages vulnerable nodes and attackers. Through experimentation and comparison with existing methods, the model demonstrates its effectiveness in mitigating IoV vulnerabilities. The U-CRF-DDBN strikes a superior balance, outperforming other approaches to intrusion detection for IoV systems. Evaluation on the NSL-KDD dataset shows a promising average Detection Rate of 93.512% and a low False Acceptance Rate (FAR) of 0.125% for known attacks, highlighting its robustness. For unknown attacks, however, the Detection Rate drops to 74.157% with an increased FAR of 16.47%, resulting in a slightly lower F1-score of 0.822.
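The quoted figures follow directly from confusion-matrix counts. For reference, the helper below computes Detection Rate, False Acceptance Rate, and F1 under the conventional definitions, with DR as recall on attack traffic and FAR as the fraction of benign traffic flagged as attacks; we assume these match the paper's usage, and the counts shown are illustrative only.

```python
# Detection Rate, False Acceptance Rate, and F1 from raw confusion counts.
def ids_metrics(tp, fp, tn, fn):
    dr = tp / (tp + fn)           # Detection Rate: recall on attack traffic
    far = fp / (fp + tn)          # FAR: benign traffic wrongly flagged
    precision = tp / (tp + fp)
    f1 = 2 * precision * dr / (precision + dr)
    return dr, far, f1

print(ids_metrics(tp=935, fp=1, tn=799, fn=65))   # illustrative counts only
```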
Due to digitization, a huge volume of data is being generated across several sectors such as healthcare, production, sales, IoT devices, the Web, and organizations. Machine learning algorithms are used to uncover patterns among the attributes of these data, and can therefore produce predictions that medical practitioners and managers use to make executive decisions. Not all attributes in the generated datasets are important for training machine learning algorithms: some attributes may be irrelevant, and some may not affect the outcome of the prediction. Ignoring or removing these irrelevant or less important attributes reduces the burden on machine learning algorithms. In this work, two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated with four popular machine learning (ML) algorithms, Decision Tree induction, Support Vector Machine (SVM), Naive Bayes classifier, and Random Forest classifier, using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine (UCI) Machine Learning Repository. The experimental results show that PCA outperforms LDA on all measures, and that the performance of the Decision Tree and Random Forest classifiers is not much affected by using PCA or LDA. To further analyze the performance of PCA and LDA, experiments are carried out on the Diabetic Retinopathy (DR) and Intrusion Detection System (IDS) datasets. The results show that ML algorithms with PCA produce better results when the dimensionality of the datasets is high, whereas when the dimensionality is low, the ML algorithms without dimensionality reduction yield better results.
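A minimal sketch of the kind of comparison described, assuming scikit-learn; the reducers and four classifiers mirror the abstract, while the stand-in dataset, component counts, scaling, and fold count are assumptions (the paper's exact preprocessing and hyperparameters are not given).

```python
# PCA-vs-LDA comparison across four classifiers via cross-validation.
from sklearn.datasets import load_wine          # stand-in for the CTG data
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
reducers = {"PCA": PCA(n_components=5),
            # LDA allows at most n_classes - 1 components (here, 2).
            "LDA": LinearDiscriminantAnalysis(n_components=2)}
classifiers = {"DT": DecisionTreeClassifier(), "SVM": SVC(),
               "NB": GaussianNB(), "RF": RandomForestClassifier()}

for rname, red in reducers.items():
    for cname, clf in classifiers.items():
        pipe = make_pipeline(StandardScaler(), red, clf)
        acc = cross_val_score(pipe, X, y, cv=5).mean()
        print(f"{rname}+{cname}: {acc:.3f}")
```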
Most existing methods for visual domain adaptation need to convert high-order tensors into one-order high-dimensional vectors through naive vectorization operations. However, such operations not only destroy the internal spatial structure of the original high-order tensors but also cause the number of model parameters to grow exponentially. To address these problems, this paper proposes a novel method for visual domain adaptation that represents tensorial features in a tensor-train subspace. Specifically, we first provide a theoretical foundation by constructing a tensor-train subspace and proving its linearity and left-orthogonality. Second, to extract common tensorial features between the source and target domains, we formulate visual domain adaptation as an optimization problem that models the common tensor-train subspace between the two domains as well as their corresponding projections. Third, we design a tensor-train subspace representation algorithm (TTSR) that solves this multi-variable optimization problem by iteratively optimizing its sub-problems, so as to handle high-order tensorial features. Finally, we evaluate the proposed TTSR algorithm through extensive experiments on three popular public datasets. The experimental results demonstrate that TTSR achieves higher classification accuracy on the unlabeled target domain than the baseline algorithms.
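For reference, the standard tensor-train (TT) format that such a subspace builds on expresses each entry of a d-order tensor as a product of matrix slices; the paper's specific subspace construction and left-orthogonality conditions are not reproduced here.

```latex
% TT decomposition of a d-order tensor
% \mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}:
\mathcal{X}(i_1, i_2, \ldots, i_d) = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d],
\qquad G_k[i_k] \in \mathbb{R}^{r_{k-1} \times r_k}, \quad r_0 = r_d = 1.
```

Here the G_k are the TT-cores and the r_k the TT-ranks; storage is O(d n r^2) for mode sizes n and ranks r, i.e., linear rather than exponential in the order d, which is the source of the parameter savings the abstract refers to.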
Physical Unclonable Functions (PUFs) are used for authentication and for generating secure cryptographic keys. However, recent research has shown that PUFs are, in general, vulnerable to machine learning modeling attacks: from a subset of Challenge-Response Pairs (CRPs), the remaining CRPs can be effectively predicted using various machine learning algorithms. In this work, Artificial Neural Networks (ANNs) trained with swarm intelligence-based modeling attacks are used against different silicon-based PUFs to test their resilience. Among the swarm intelligence algorithms, the Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Particle Swarm Optimizer (PSO), and Grey Wolf Optimizer (GWO) are used. The attacks are performed extensively on six types of PUFs, namely the Configurable Ring Oscillator, Inverter Ring Oscillator, XOR-Inverter Ring Oscillator, Arbiter, Modified XOR-Inverter Ring Oscillator, and Hybrid Delay Based PUF. The results show that the first four PUFs under study are vulnerable to ANN swarm intelligence-based models, whose responses can be predicted with average accuracies of 71.1% to 88.3% across the different models. However, for the Hybrid Delay Based PUF and the Modified XOR-Inverter Ring Oscillator PUF, which are specifically designed to thwart machine learning attacks, the prediction accuracy is much lower, in the range of 9.8% to 14.5%.
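Such an attack amounts to fitting a model on observed CRPs. As a compact sketch of the idea, the code below uses a plain global-best PSO to fit the classic linear additive-delay model of an arbiter PUF; this is a simplification of the paper's ANN-based models, and all constants (swarm size, inertia, acceleration coefficients) are assumptions.

```python
# Sketch: PSO-based modeling attack on a simulated arbiter PUF. PSO searches
# a weight vector w so that sign(w . phi(challenge)) matches observed responses.
import numpy as np

rng = np.random.default_rng(1)
n_stages, n_crps, n_particles, iters = 32, 2000, 30, 200

def parity_features(challenges):
    """Standard arbiter-PUF transform: phi_i = prod_{j>=i}(1 - 2 c_j), plus bias."""
    signed = 1 - 2 * challenges                       # {0,1} -> {+1,-1}
    phi = np.cumprod(signed[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([phi, np.ones((len(challenges), 1))])

# Simulated target PUF (unknown to the attacker) and its collected CRPs.
w_true = rng.normal(size=n_stages + 1)
C = rng.integers(0, 2, size=(n_crps, n_stages))
Phi = parity_features(C)
r = np.sign(Phi @ w_true)

def fitness(W):                                       # one row per particle
    return (np.sign(Phi @ W.T) == r[:, None]).mean(axis=0)

X = rng.normal(size=(n_particles, n_stages + 1))      # particle positions
V = np.zeros_like(X)                                  # particle velocities
pbest, pfit = X.copy(), fitness(X)
for _ in range(iters):
    gbest = pbest[pfit.argmax()]
    V = (0.7 * V
         + 1.5 * rng.random(X.shape) * (pbest - X)
         + 1.5 * rng.random(X.shape) * (gbest - X))
    X = X + V
    f = fitness(X)
    improved = f > pfit
    pbest[improved], pfit[improved] = X[improved], f[improved]
print("prediction accuracy on collected CRPs:", pfit.max())
```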
Reliable water quality prediction can improve environmental flow monitoring and the sustainability of stream ecosystems. In this study, we compared two machine learning methods for predicting water quality parameters, such as total nitrogen (TN), total phosphorus (TP), and turbidity (TUR), for 97 watersheds located in the Southeast Atlantic region of the USA. The modeling framework incorporates multiple climate and watershed variables (characteristics) that often control water quality indicators across different landscapes. Three feature-selection techniques, stepwise regression (SR), the Least Absolute Shrinkage and Selection Operator (LASSO), and a genetic algorithm (GA), are implemented to identify appropriate predictors among 28 climate and catchment-related variables. The selected predictors were then used to develop Random Forest (RF) and Boosted Regression Tree (BRT) models for water quality prediction in the selected watersheds. The results highlight that while both algorithms provided reasonable results (based on statistical metrics), the RF algorithm was easier to train and more robust to overfitting. Partial dependence plots revealed complex, nonlinear relationships between the individual predictors and the water quality indicators. Thresholds obtained from the partial dependence plots showed that median TN and TP values in streams increase significantly when the percentage of urban and agricultural land exceeds 40% and 43% of the watershed area, respectively. Furthermore, as soil hydraulic conductivity increases, the resulting reduction in runoff decreases turbidity levels in streams. Identifying the key watershed characteristics and their critical thresholds can therefore help watershed managers create appropriate regulations for managing and sustaining healthy stream ecosystems. In addition, the forecasting models can improve water quality predictions in ungauged watersheds.
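A condensed sketch of one variant of the described pipeline, assuming scikit-learn: LASSO screens the 28 candidate predictors and a random forest is trained on the survivors. The synthetic data, hyperparameters, and variable indices are illustrative, not the study's values.

```python
# Predictor screening with LassoCV followed by a Random Forest regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(97, 28))      # 97 watersheds, 28 candidate variables
y = 2 * X[:, 0] - X[:, 5] + rng.normal(scale=0.5, size=97)   # e.g., TN

# LASSO drives coefficients of uninformative predictors to exactly zero.
Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(Xs, y)
selected = np.flatnonzero(lasso.coef_ != 0)
print("selected predictor indices:", selected)

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X[:, selected], y)
print("training R^2 (illustrative only):", rf.score(X[:, selected], y))

# Threshold analysis as in the study would then use partial dependence, e.g.:
# from sklearn.inspection import PartialDependenceDisplay
# PartialDependenceDisplay.from_estimator(rf, X[:, selected], [0])
```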
Emerging mobile edge techniques and applications, such as Augmented Reality (AR)/Virtual Reality (VR), the Internet of Things (IoT), and vehicular networking, result in explosive growth in power and computing resource consumption, while the volume of data generated at edge networks is also increasing rapidly. Under these circumstances, building energy-efficient and privacy-preserving communications is imperative for 5G and beyond wireless communication systems. Recently emerging distributed learning methods such as federated learning (FL) perform well at improving resource efficiency while protecting user privacy with low communication overhead. Specifically, FL enables edge devices to learn a shared network model by aggregating local updates while keeping all training on the local devices. This paper investigates distributed power allocation for edge users in decentralized wireless networks, aiming to maximize energy/spectrum efficiency while preventing privacy leakage within an FL framework. Owing to the dynamics and complexity of wireless networks, we adopt an online Actor-Critic (AC) architecture as the local training model, and FL provides cooperation among edge users by sharing the gradients and weights generated in the Actor network. Moreover, to resolve the over-fitting problem arising in non-independent and identically distributed (non-i.i.d.) data environments, we propose a federated augmentation mechanism that uses the Wasserstein Generative Adversarial Network (WGAN) algorithm for data augmentation. Federated augmentation lets each device replenish its data buffer using a WGAN generative model until an i.i.d. training dataset is achieved, which significantly reduces the communication overhead of distributed learning compared with directly exchanging data samples. Numerical results reveal that the proposed federated learning based cooperation and augmentation (FL-CA) algorithm possesses good convergence.
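At its core, the cooperation step aggregates locally trained actor parameters across devices. A minimal FedAvg-style sketch of that aggregation follows; it is a simplification (the paper shares Actor-network gradients and weights, and the local Actor-Critic training and WGAN augmentation steps are omitted here), and the toy network shapes are assumptions.

```python
# Minimal FedAvg-style aggregation of local actor-network parameters.
import numpy as np

def federated_average(local_weights, sample_counts):
    """Sample-weighted average of each parameter tensor across edge devices."""
    total = sum(sample_counts)
    return [sum(w[k] * n / total for w, n in zip(local_weights, sample_counts))
            for k in range(len(local_weights[0]))]

# Three edge devices, each holding a toy two-tensor "actor network".
rng = np.random.default_rng(0)
locals_ = [[rng.normal(size=(4, 2)), rng.normal(size=2)] for _ in range(3)]
counts = [120, 80, 200]                    # local dataset sizes
global_weights = federated_average(locals_, counts)
print(global_weights[1])                   # aggregated bias vector
```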
Recent years have witnessed a very rapid increase in both the volume and sophistication of malware programs. Malware authors invest heavily in technologies and capabilities that streamline the process of building and mutating existing malware to evade traditional protection. One major challenge currently faced by the antivirus industry is efficiently processing the vast number of incoming suspicious samples. Since most new malware is a variation of an existing malware family exhibiting the same forms of malicious behavior, automatic clustering and classification of malware programs into families have become valuable tools for malware analysts. Such grouping criteria not only allow analysts to prioritize the allocation of their investigation efforts but can also be applied to detect new malware samples based on their association with existing families. In this paper, we address the multi-class malware classification challenge from a scalability perspective. We present the design, development, and evaluation of a novel machine learning classifier trained on multifaceted content features (e.g., instruction sequences, strings, section information, and other malware features) as well as threat intelligence gathered from external sources (e.g., antivirus output). Our experiments on a dataset of 21,741 malware samples demonstrate the efficacy and precision of the proposed algorithm and also provide insights into the utility of various features.
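A sketch of multifaceted feature fusion of the kind described, assuming scikit-learn: hashed token features from instruction sequences or strings are concatenated with numeric section statistics before classification. The feature names, toy samples, and logistic-regression choice are illustrative, not the paper's exact design.

```python
# Fusing textual malware facets (opcode sequences, strings) with numeric
# section features into one sparse matrix for family classification.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

samples = [
    {"opcodes": "push mov call ret", "n_sections": 4, "entropy": 6.1, "family": 0},
    {"opcodes": "xor jmp call call", "n_sections": 7, "entropy": 7.8, "family": 1},
]

# Textual facet: hashed word n-grams scale to large sample volumes because
# no vocabulary needs to be stored.
text_vec = HashingVectorizer(n_features=2**12, ngram_range=(1, 2))
X_text = text_vec.transform([s["opcodes"] for s in samples])
# Numeric facet: section statistics (could also hold threat-intel scores).
X_num = csr_matrix([[s["n_sections"], s["entropy"]] for s in samples])

X = hstack([X_text, X_num]).tocsr()
y = np.array([s["family"] for s in samples])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))                      # predicted family labels
```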