Distributed Denial of Service (DDoS) attacks have been a major threat to the Internet and can cause severe losses to systems, companies, and national security. An attacker can launch DDoS attacks easily, yet such attacks remain significantly harder to recognize and forestall. In recent years, many IT-based companies have been hit by DDoS attacks. The primary concern of this work is therefore to detect and prevent DDoS attacks. To fulfill this objective, data mining techniques such as JRip, J48, and k-NN have been employed for DDoS attack detection. These algorithms are implemented and thoroughly evaluated individually to validate their performance in this domain. The presented work has been evaluated using the recent CICIDS2017 dataset, which characterizes different DDoS attacks, viz. brute force SSH, brute force FTP, Heartbleed, infiltration, and botnet TCP, UDP, and HTTP with port scan attacks. Further, a prevention method is applied to block the malicious nodes participating in any of the said attacks. The proposed DDoS prevention works in a proactive mode to defend against all these attack types and is evaluated with respect to parameters such as throughput, packet delivery ratio (PDR), end-to-end delay, and normalized routing load (NRL). The study claims that the proposed technique outperforms the baseline AODV routing algorithm on these parameters.
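As a hedged sketch of this kind of evaluation: the snippet below trains a decision tree (a scikit-learn stand-in for Weka's J48; JRip has no direct scikit-learn equivalent) and a k-NN classifier on flow features and prints per-class detection metrics. The CSV path and the Label column name are assumptions for illustration, not the paper's exact pipeline.

```python
# Hedged sketch: compare a J48-like decision tree and k-NN on flow features,
# roughly analogous to the paper's per-algorithm evaluation on CICIDS2017.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Hypothetical CSV export of CICIDS2017 flows; column names are assumptions.
df = pd.read_csv("cicids2017_flows.csv")
X = df.drop(columns=["Label"])
y = df["Label"]  # e.g. BENIGN, DDoS, PortScan, ...

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

for name, clf in [("decision tree (J48-like)", DecisionTreeClassifier(random_state=0)),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, clf.predict(X_te)))
```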
Big data has become part of life for many people. Data about people's lives are continuously collected, analyzed, and applied as our society progresses into the big data era. Behind the scenes, server clusters need to process hundreds of millions of pieces of data every day, so it is very important to choose the right big data processing platform and algorithm for different kinds of datasets. To work effectively on big data processing, it is therefore necessary to master data classification algorithms. Classification helps us build a classification model or analyze a classification function by screening and classifying the data at hand in data mining. In addition, given data can be mapped to a specified category region, and the development trend of future data can be predicted through classification models, which reduces operational difficulty and improves work efficiency. This paper optimizes the classical classification algorithm k-NN and designs a new normalized algorithm called PEWM_G KNN. From the perspective of distance measurement, we use the Pearson correlation coefficient to replace the traditional Euclidean metric; we then refine the treatment of the attribute values of datasets and introduce the entropy weight method, combined with the Pearson measure, to optimize the distance calculation equation. After the value of K is fixed, we add a Gaussian function to weight the selection of the class. In this study, we compared the effect of every step and tested datasets of different data types and sizes in order to assess the performance of the algorithm under different scenarios. The datasets used are Iris, Breast Cancer, Dry Bean, and HTRU2 (all from The University of California, Irvine). Finally, we further analyze the effect of different system configuration parameters on the prediction rate and runtime.
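A minimal sketch of the ingredients described above, assuming the standard entropy-weight formulas, a weighted Pearson distance, and Gaussian-weighted voting; the paper's exact PEWM_G KNN equations may differ:

```python
# Hedged sketch of a PEWM_G-style k-NN: entropy-weighted Pearson distance
# plus Gaussian-weighted neighbor voting.
import numpy as np

def entropy_weights(X):
    # Standard entropy-weight method: lower-entropy (more informative)
    # attributes receive larger weights. Assumes non-negative features.
    n = X.shape[0]
    P = X / (X.sum(axis=0) + 1e-12)
    e = -(P * np.log(P + 1e-12)).sum(axis=0) / np.log(n)
    return (1 - e) / (1 - e).sum()

def weighted_pearson_distance(a, b, w):
    # 1 - weighted Pearson correlation, replacing the Euclidean metric.
    ma, mb = np.average(a, weights=w), np.average(b, weights=w)
    cov = np.average((a - ma) * (b - mb), weights=w)
    sa = np.sqrt(np.average((a - ma) ** 2, weights=w))
    sb = np.sqrt(np.average((b - mb) ** 2, weights=w))
    return 1.0 - cov / (sa * sb + 1e-12)

def predict(X_train, y_train, x, k=5, sigma=0.5):
    w = entropy_weights(X_train)
    d = np.array([weighted_pearson_distance(xi, x, w) for xi in X_train])
    votes = {}
    for i in np.argsort(d)[:k]:
        # Gaussian kernel: closer neighbors get exponentially larger votes.
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + np.exp(-d[i] ** 2 / (2 * sigma ** 2))
    return max(votes, key=votes.get)
```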
ISBN: (print) 9791188428090
The exponential growth in the use of connected objects, in particular smartphones, is a consequence of the digitization of services. All types of applications, from the least critical to the most critical, are available on mobile devices through mobile applications. The daily penetration of mobile applications into widely used devices brings certain threats: software repositories host malware and benign applications side by side, which is a major cybersecurity problem. To address this problem, machine learning approaches have been proposed in the literature for malware detection in general and malicious Android applications in particular. Developers use obfuscation techniques to hide malicious applications, which implies the need to update Android malware detection models; yet many approaches in the literature focus more on the data than on the features. Our contribution is therefore an incremental learning approach capable of detecting Android malware. Through the UFILA approach, we propose updating the feature set used for the detection and classification of Android malware by adding new features. We evaluated 13 classification algorithms and chose the four most efficient ones to implement our approach. The results obtained surpass several malware detection approaches in the literature, with an accuracy, precision, recall, and F1-score of 99%, 99%, 98.6%, and 98%, respectively.
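One way to keep a detector updatable as new features appear is incremental learning over a hashed feature space; the sketch below illustrates that general idea, not UFILA's actual pipeline, and the feature tokens shown are hypothetical.

```python
# Hedged sketch: incremental malware detection with a fixed-size hashed
# feature space, so newly observed features (new permissions, API calls)
# can be folded in without retraining from scratch.
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import SGDClassifier

hasher = FeatureHasher(n_features=2 ** 12, input_type="string")
clf = SGDClassifier(loss="log_loss", random_state=0)
classes = np.array([0, 1])  # 0 = benign, 1 = malware

def update(batch_features, batch_labels):
    # batch_features: list of feature-token lists extracted from APKs
    # (token names are illustrative; real extraction is out of scope here).
    X = hasher.transform(batch_features)
    clf.partial_fit(X, batch_labels, classes=classes)

# Example batch with hypothetical feature tokens:
update([["perm:SEND_SMS", "api:getDeviceId"], ["perm:INTERNET"]], [1, 0])
```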
Since label noise can hurt the performance of supervised learning (SL), how to train a good classifier in the presence of label noise is an emerging and meaningful topic in the machine learning field. Although many related methods have been proposed and achieve promising performance, they have the following drawbacks: (1) they can lead to data waste and even performance degradation if the mislabeled instances are removed; and (2) the negative effect of extremely mislabeled instances cannot be completely eliminated. To address these problems, we propose a novel method based on the capped ℓ1 norm and a graph-based regularizer to deal with label noise. In the proposed algorithm, we utilize the capped ℓ1 norm instead of the ℓ1 norm. This norm inherits the advantage of the ℓ1 norm, which is robust to label noise to some extent; moreover, the capped ℓ1 norm can adaptively find extremely mislabeled instances and eliminate their negative influence. Additionally, the proposed algorithm makes full use of the mislabeled instances under the graph-based framework, avoiding the waste of collected instance information. The solution of our algorithm is obtained through an iterative optimization approach. We report experimental results on several UCI datasets covering both binary and multi-class problems; the results verify the effectiveness of the proposed algorithm in comparison with existing state-of-the-art classification methods.
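To make the capped ℓ1 idea concrete: residuals beyond a cap θ contribute only a constant penalty, so extreme outliers stop influencing the fit. The sketch below shows the loss and the reweighting pattern commonly used to solve such objectives; it is a generic illustration, not the paper's exact optimization.

```python
# Hedged sketch of the capped l1 loss and its iterative-reweighting intuition.
import numpy as np

def capped_l1(residuals, theta):
    # min(|r|, theta): standard l1 below the cap, flat beyond it.
    return np.minimum(np.abs(residuals), theta)

def capped_weights(residuals, theta):
    # IRLS-style weights: ordinary 1/|r| l1 weights below the cap,
    # zero weight for extremely mislabeled points past the cap.
    r = np.abs(residuals)
    return np.where(r < theta, 1.0 / np.maximum(r, 1e-8), 0.0)
```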
Multivariate control charts, including Hotelling's T² chart, have been widely adopted for the multivariate processes found in many modern systems. Traditional multivariate control charts, however, assume that the in-control group is the only population available for determining a decision boundary, an assumption that has restricted the development of more efficient control chart techniques capable of capitalising on available out-of-control information. In the present study, we propose a control chart that improves the sensitivity (i.e., detection accuracy) of a Hotelling's T² control chart by combining it with classification algorithms, while maintaining low false alarm rates. To the best of our knowledge, this is the first attempt to combine classification algorithms and control charts. Simulations and real case studies demonstrate the effectiveness and applicability of the proposed control chart.
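A hedged sketch of how such a combination might look: compute the T² statistic from in-control data, train a classifier that also sees the available out-of-control samples, and flag a point if either signals. The OR-combination rule here is an assumption; the paper's actual scheme may differ.

```python
# Hedged sketch: Hotelling's T^2 statistic combined with a classifier that
# exploits out-of-control information the chart alone ignores.
import numpy as np
from sklearn.linear_model import LogisticRegression

def t2_statistic(X_in, x):
    # T^2 = (x - mu)' S^{-1} (x - mu), estimated from in-control data.
    mu = X_in.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X_in, rowvar=False))
    d = x - mu
    return float(d @ S_inv @ d)

def monitor(X_in, X_out, x_new, t2_limit):
    X = np.vstack([X_in, X_out])
    y = np.r_[np.zeros(len(X_in)), np.ones(len(X_out))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    t2_alarm = t2_statistic(X_in, x_new) > t2_limit
    clf_alarm = clf.predict(x_new.reshape(1, -1))[0] == 1
    return t2_alarm or clf_alarm  # assumed OR-combination rule
```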
Purpose In this paper, we define the concept of user spectrum and adopt it to classify Ethereum users based on their behavior. Design/methodology/approach Given a time period, our approach associates each user with a spectrum showing the trend of some behavioral features obtained from a social network-based representation of Ethereum. Each class of users has its own spectrum, obtained by averaging the spectra of its users. In order to evaluate the similarity between the spectrum of a class and that of a user, we propose a tailored similarity measure obtained by adapting to this context some general measures provided in the past. Finally, we test our approach on a dataset of Ethereum transactions. Findings We define a social network-based model to represent Ethereum. We also define a spectrum for a user and a class of users (i.e., token contract, exchange, Bancor, and Uniswap), consisting of suitable multivariate time series. Furthermore, we propose an approach to classify new users. The core of this approach is a metric capable of measuring the degree of similarity between the spectrum of a user and that of a class of users. This metric is obtained by adapting the Eros distance (i.e., Extended Frobenius Norm) to this scenario. Originality/value This paper introduces the concept of the spectrum of a user and of a class of users, which is new for blockchains. Differently from past models, which represented user behavior by means of univariate time series, the user spectrum proposed here exploits multivariate time series. Moreover, this paper shows that the original Eros distance does not return satisfactory results when applied to user and class spectra, and proposes a modified version of it, tailored to the reference scenario, which reaches a very high accuracy. Finally, it adopts spectra and the modified Eros distance to classify Ethereum users based on their past behavior. Currently, no multi-class automatic classification approach tailored to Ethereum exists.
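For reference, a compact sketch of the original Eros similarity between two multivariate time series, which compares the eigenvector structure of their covariance matrices under a weight vector w; the paper's modified, scenario-tailored variant is not reproduced here.

```python
# Hedged sketch of the Eros (Extended Frobenius Norm) similarity.
import numpy as np

def eros(A, B, w):
    # A, B: (time, features) series with the same feature count;
    # w: non-negative weights summing to 1 (commonly derived from
    # the averaged eigenvalues across the dataset).
    Va = np.linalg.eigh(np.cov(A, rowvar=False))[1]
    Vb = np.linalg.eigh(np.cov(B, rowvar=False))[1]
    # Weighted sum of |inner products| of corresponding eigenvectors.
    return float(np.sum(w * np.abs(np.einsum("ij,ij->j", Va, Vb))))
```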
Randomized methods are practical and efficient for training connectionist models. In this paper, we develop a self-stacking random weight neural network and propose two different methods of feature fusion. The first inter-connects coarse and high-level features to make the classification decision more diverse, using the proposed hierarchical network architecture with dense connectivity. The second incorporates the different decisions throughout the network via a novel non-linear ensemble learned in an end-to-end manner. Through experiments, we verify the effectiveness of random feature fusion; even when each hierarchical branch in the network has very unfavorable accuracy, the proposed ensemble learning boosts the classification results impressively. Moreover, the proposed connectionist model is applied to a practical engineering problem, gearbox fault diagnosis, and the simulation demonstrates that our method is more robust to noise in the vibration signal of a working gearbox.
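As background, a minimal random-weight network (RVFL-style) layer is sketched below: hidden weights are drawn at random and fixed, inputs are densely concatenated with the random features, and only the output weights are solved in closed form. This is a generic baseline, not the paper's self-stacking architecture or its non-linear ensemble.

```python
# Hedged sketch of a single random-weight layer with dense connectivity.
import numpy as np

def random_weight_net(X, Y, n_hidden=100, seed=0):
    # X: (n, f) inputs; Y: (n, c) one-hot targets.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    D = np.hstack([X, H])  # dense link: raw inputs + random features
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)  # closed-form readout
    return W, b, beta
```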
The planning and execution of a business strategy are important aspects of the strategic human resource management of a company. Previous studies used machine learning algorithms to determine the main factors correlating employees with company performance. In this study, we introduce a machine-learning-based method for the classification of company revenue. Both annual and integrated datasets were examined to evaluate the classification performance of the framework under both binary and multiclass conditions. The performance of the proposed method was validated using six evaluation metrics: accuracy, precision, recall, F1-score, receiver operating characteristic (ROC) curve, and area under the curve (AUC). The experimental results indicate that the XGBoost classifier displayed the best classification performance among the three algorithms used in this study (XGBoost, stochastic gradient descent classifier, and logistic regression). Moreover, we examined the important features of the trained XGBoost model against variables emphasized in human resource management studies. These results demonstrate that the proposed framework is strong in terms of both classification and practical implementation, providing novel insights into the relationship between employees and the revenue levels of their employer.
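A hedged sketch of the binary setting with the six named metrics follows; the synthetic data stands in for the paper's HR dataset, and the hyperparameters are illustrative.

```python
# Hedged sketch: binary revenue-level classification with XGBoost, evaluated
# with accuracy, precision, recall, F1, the ROC curve, and AUC.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_curve, roc_auc_score)

# Hypothetical feature matrix X and binary revenue label y.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = XGBClassifier(n_estimators=200, eval_metric="logloss").fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)
print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
print("F1       :", f1_score(y_te, pred))
print("AUC      :", roc_auc_score(y_te, proba))
fpr, tpr, _ = roc_curve(y_te, proba)  # points tracing the ROC curve
```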
This paper proposes a classification algorithm that uses an open set recognition concept to conservatively detect the lane change intentions of surrounding vehicles. Conservative prediction of these intentions is needed to improve adaptive cruise control (ACC) performance and avoid possible accidents; however, existing machine learning models can make incorrect decisions when faced with information not included in the training data set, or with confusing data, even when their outputs carry probabilities. To cope with this problem, we present a classification algorithm using a multi-class support vector machine with an open set recognition concept to detect the surrounding vehicles' lane change intentions. Feature vectors are constructed from lateral information obtained by a Kalman filter using only radar and in-vehicle sensors, and the open set recognition concept is implemented via Meta-Recognition applied to the binary classifiers' scores. Furthermore, we analyze the lateral information observed when an object vehicle changes lanes. Experimental results show that the proposed system handles wrong decisions conservatively, and both detects and releases the closest in-path vehicle (CIPV) earlier than a commercial radar system, by averages of 1.4 s and 0.4 s respectively.
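The sketch below illustrates the open-set idea with a one-vs-rest SVM and a simple score threshold standing in for Meta-Recognition: when no binary score is confident enough, the sample is rejected as "unknown" rather than forced into a known intention class. The threshold rule is a simplification of the paper's score calibration.

```python
# Hedged sketch: open-set rejection on top of a one-vs-rest multi-class SVM.
import numpy as np
from sklearn.svm import LinearSVC

def open_set_predict(clf, X, threshold=0.0):
    # Assumes clf is a fitted multi-class LinearSVC with integer labels
    # (e.g. 0 = keep lane, 1 = cut-in left, 2 = cut-in right).
    scores = clf.decision_function(X)      # (n_samples, n_classes)
    best = scores.max(axis=1)              # most confident binary score
    pred = clf.classes_[scores.argmax(axis=1)]
    # Reject low-confidence samples instead of forcing a known class.
    return np.where(best >= threshold, pred, -1)  # -1 = unknown / reject
```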
Classification is a vital task in machine learning. By learning patterns from samples of known categories, a model can develop the ability to distinguish the categories of unknown samples. Noticing the advantages of clustering methods in cluster structure analysis, we combine clustering and classification to develop the novel cluster-based intelligence ensemble learning (CIEL) method. We use clustering to analyze the inherent distribution of the data and divide all samples into clusters according to the characteristics of the dataset. Then, for each specific cluster, we use different classification algorithms to establish the corresponding classification model. Finally, we integrate the prediction results of each base classifier to form the final prediction. In view of the problem of parameter sensitivity, we use a swarm intelligence algorithm to optimize the key parameters involved in the clustering, classification, and ensemble stages in order to boost classification performance. To assess the effectiveness of CIEL, we perform tenfold cross-validation experiments on the 24 benchmark datasets provided by UCI and KEEL. Designed to improve classifier performance, CIEL outperforms other popular machine learning methods such as naive Bayes, k-nearest neighbors, random forest, and support vector machine.
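A minimal cluster-then-classify sketch follows: cluster the training data, fit one model per cluster, and route each test point to its nearest cluster's model. CIEL additionally tunes parameters with a swarm optimizer and fuses multiple base classifiers per cluster, which this sketch omits.

```python
# Hedged sketch of a cluster-based ensemble in the spirit of CIEL.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def fit_cluster_ensemble(X, y, n_clusters=3, seed=0):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    models = {}
    for c in range(n_clusters):
        mask = km.labels_ == c
        # One base classifier per cluster, trained on that cluster only.
        models[c] = DecisionTreeClassifier(random_state=seed).fit(X[mask], y[mask])
    return km, models

def predict_cluster_ensemble(km, models, X):
    clusters = km.predict(X)  # route each point to its nearest cluster
    return np.array([models[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(clusters, X)])
```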