Machine learning models have gained popularity for their potential to solve real-life problems when trained on pertinent data. In many cases, real-life data are class imbalanced, and hence machine learning models trained on them tend to perform poorly on metrics such as precision, recall, AUC, F1, and G-mean score. Since the class imbalance issue poses serious challenges to the performance of trained models, a multitude of research works have addressed it. Two data-based sampling techniques are most commonly proposed: undersampling the majority class and oversampling the minority class. In this article, we focus on the former approach. We propose two novel algorithms that employ neural network-based approaches to remove majority samples that reside in the vicinity of minority samples, thereby undersampling the majority class to remove (or alleviate) the imbalance. We delineate the proposed algorithms and then test them on several publicly available imbalanced datasets. We then compare their performance to that of other popular undersampling algorithms. Finally, we conclude that our proposed algorithms outperform most of the existing undersampling approaches on most performance metrics.
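The idea of removing majority samples that lie close to minority samples can be illustrated with a much simpler stand-in. The sketch below is not the neural network-based method the abstract proposes; it swaps in a plain Euclidean distance test, and the function name `undersample_near_minority` and the `radius` parameter are hypothetical.

```python
import math


def undersample_near_minority(X, y, minority_label, radius):
    """Drop majority samples lying within `radius` of any minority sample.

    Assumption: a fixed Euclidean radius stands in for whatever notion of
    "vicinity" the paper's neural network-based algorithms learn.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    kept_X, kept_y = [], []
    for x, label in zip(X, y):
        is_majority = label != minority_label
        if is_majority and any(math.dist(x, m) <= radius for m in minority):
            continue  # majority sample overlaps the minority region: remove it
        kept_X.append(x)
        kept_y.append(label)
    return kept_X, kept_y


# Toy data: one minority point (label 1) and five majority points (label 0).
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (5.0, 5.0), (5.2, 5.1), (9.0, 9.0)]
y = [1, 0, 0, 0, 0, 0]
X_res, y_res = undersample_near_minority(X, y, minority_label=1, radius=0.5)
```

Only the majority point at (0.1, 0.0) falls inside the radius here, so it is the only sample removed; every minority sample is always kept.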
In the 2015 Canadian case of Ewert v. Canada, risk assessment tools were put on trial in Canada's Federal Court and eventually at the Supreme Court of Canada; their efficacy was challenged, and their reliability was upended [1], [2]. Risk assessment tools are used by the justice system to present a calculated prediction of an offender's risk of future criminal behavior. These tools are viewed by the legal community as precursors to machine learning and technological advancements in artificial intelligence (AI) in the criminal justice system, using algorithmic and data-driven decision-making to provide courts with predictions regarding an offender's risk if released.
The large number of visual applications in multimedia sharing websites and social networks contributes to the increasing amount of multimedia data in cyberspace. Video data is a rich source of information and is considered the most demanding in terms of storage space. With the huge development of digital video production, video management becomes a challenging task. Video content analysis (VCA) aims to provide big data solutions by automating video management. To this end, shot boundary detection (SBD) is considered an essential step in VCA. It aims to partition the video sequence into shots by detecting shot transitions. High computational cost in transition detection is considered a bottleneck for real-time applications. Thus, in this paper, a balance between detection accuracy and speed for SBD is addressed by presenting a new method for fast video processing. The proposed SBD framework is based on the concept of candidate segment selection with frame active area and separable moments. First, for each frame, the active area is selected such that only the informative content is considered. This leads to a reduction in the computational cost and disturbance factors. Second, for each active area, the moments are computed using orthogonal polynomials. Then, an adaptive threshold and inequality criteria are used to eliminate most of the non-transition frames and preserve candidate segments. For further elimination, two rounds of bisection comparisons are applied. As a result, the computational cost is reduced in the subsequent stages. Finally, machine learning statistics based on the support vector machine are implemented to detect the cut transitions. The enhancement of the proposed fast video processing method over existing methods in terms of computational complexity and accuracy is verified. The average improvements in terms of frame percentage and transition accuracy percentage are 1.63% and 2.05%, respectively. Moreover, for the proposed SBD algorithm, a compara
Cyber Threat Detection (CTD) is subject to complicated and rapidly accelerating developments. Poor accuracy, high learning complexity, limited scalability, and a high false positive rate are problems that CTD encounters. Deep learning defense mechanisms aim to build effective models for threat detection and protection, allowing them to adapt to the complex and ever-accelerating changes in the field of CTD. Furthermore, swarm intelligence algorithms have been developed to tackle the optimization challenges. In this paper, a Chaotic Zebra Optimization Long Short-Term Memory (CZOLSTM) algorithm is proposed. The proposed algorithm is a hybrid of the Chaotic Zebra Optimization Algorithm (CZOA) for feature selection and LSTM for cyber threat classification on the CSE-CIC-IDS2018 dataset. Invoking the chaotic map in CZOLSTM can improve the diversity of the search and avoid trapping in a local minimum. In evaluating the effectiveness of the newly proposed CZOLSTM, binary and multi-class classifications are considered. The results demonstrate the efficiency of the implemented improvements over many other algorithms. In cyber threat detection, the proposed CZOLSTM outperforms six innovative deep learning algorithms for binary classification and five of them for multi-class classification. Other evaluation criteria such as accuracy, recall, F1 score, and precision have also been used for comparison. The results show that the proposed algorithm achieves the best accuracy for binary classification, 99.83%, with an F1-score of 99.82%, precision of 99.83%, and recall of 99.82%. The proposed CZOLSTM algorithm also achieved the best performance for multi-class classification among the compared algorithms.
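How a chaotic map can diversify a search population can be sketched briefly. The abstract does not say which chaotic map CZOA uses, so the logistic map below is an assumption, as are the function names `logistic_map_sequence` and `chaotic_population` and the seed value `x0`.

```python
def logistic_map_sequence(x0, n, r=4.0):
    """Return n iterates of the logistic map x <- r * x * (1 - x).

    Assumption: the logistic map is only one common choice of chaotic map;
    the abstract does not name the map used in CZOA.
    """
    seq = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        seq.append(x)
    return seq


def chaotic_population(n_agents, n_dims, lower, upper, x0=0.7):
    """Spread agents over [lower, upper] using chaotic values, not uniform draws."""
    values = logistic_map_sequence(x0, n_agents * n_dims)
    return [
        [lower + (upper - lower) * values[i * n_dims + d] for d in range(n_dims)]
        for i in range(n_agents)
    ]


# Hypothetical usage: 5 agents, each a 3-dimensional position in [0, 1].
seq = logistic_map_sequence(0.7, 3)
population = chaotic_population(n_agents=5, n_dims=3, lower=0.0, upper=1.0)
```

For feature selection, each position could then be thresholded (for example at 0.5) to obtain a binary mask over features; that step is likewise an assumption rather than something the abstract states.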
In real cases, missing values tend to contain meaningful information that should be acquired or analyzed before the incomplete dataset is used for machine learning tasks. In this work, two algorithms named jointly fuzzy C-Means and vaguely quantified nearest neighbor (VQNN) imputation (JFCM-VQNNI) and jointly fuzzy C-Means and fitted VQNN imputation (JFCM-FVQNNI) are proposed by considering the clustering concept and sufficient extraction of uncertain information. In the proposed JFCM-VQNNI and JFCM-FVQNNI algorithms, the missing value is regarded as a decision feature, and the prediction is then generated for the objects that contain at least one missing value. Specifically, in the JFCM-VQNNI algorithm, indistinguishable matrices, tolerance relations, and fuzzy membership relations are adopted to identify the potential closest filled values based on corresponding similar objects and related clusters. Building on the JFCM-VQNNI algorithm, the JFCM-FVQNNI algorithm synthetically analyzes the fuzzy membership of the dependent features for instances within each cluster. In order to fill the missing values more accurately, the JFCM-FVQNNI algorithm performs fuzzy decision membership adjustment in each object with respect to the related clusters by considering highly relevant decision attributes. The experiments have been carried out on five datasets. Based on the analysis of root-mean-square error, mean absolute error, comparison of imputed values with actual values, and classification accuracy, we conclude that the proposed JFCM-FVQNNI and JFCM-VQNNI algorithms yield sufficient and reasonable imputation performance compared with the fuzzy C-Means parameter-based imputation algorithm and the fuzzy C-Means rough parameter-based imputation algorithm.
Traditional methods of soil chemical analysis are time consuming, costly, and generate chemical waste. Proximal sensors, such as portable X-ray fluorescence (pXRF) spectrometry, may help to overcome these issues since they have been shown to produce accurate predictions of many soil properties. However, such processes need to be further investigated in Brazilian soils. This work aimed to assess the influence of soil management and mineralogy on the elemental composition of soils and to predict exchangeable Al³⁺, Ca²⁺, and Mg²⁺, and available K⁺ and P contents from pXRF data, alone and combined with soil texture, through machine learning algorithms [stepwise generalized linear models (SGLM) and random forest (RF)] in soils of the Brazilian Coastal Plains (BCP). A total of 285 soil samples were collected from the A (n = 123) and B (n = 162) horizons and subjected to laboratory analyses and pXRF scans. Samples were randomly separated into 70% for modeling and 30% for validation. Soil mineralogy mainly influenced total Al content, while soil management mainly influenced total Ca and K contents. In general, the inclusion of soil texture as auxiliary input data did not change the predictive power of the models. The best results highlight the considerable promise of the pXRF technique for rapidly assessing exchangeable Ca²⁺ (RMSE = 176.3 mg kg⁻¹, R² = 0.71), Mg²⁺ (37.7 mg kg⁻¹, 0.60), and available K⁺ (27.46 mg kg⁻¹, 0.67). The algorithms could not generate reliable models to predict exchangeable Al³⁺ (30.6 mg kg⁻¹, 0.47) and available P (19.9 mg kg⁻¹, 0.14). In sum, pXRF can be used to reasonably predict soil fertility properties in the BCP soils. Further studies may extend predictions to other soil properties.
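The validation protocol described above (a random 70/30 split, then RMSE and R² on the held-out part) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sample IDs are placeholders rather than real data, and R² is computed as the coefficient of determination, which may differ from the definition the authors used (for example, a squared correlation).

```python
import math
import random


def split_70_30(samples, seed=0):
    """Randomly split samples into 70% modeling and 30% validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))  # truncation: 285 samples give 199 / 86
    return shuffled[:cut], shuffled[cut:]


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the measured property."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Placeholder sample IDs only; no measurements from the study are reproduced.
modeling, validation = split_70_30(range(285))
```

How the half sample at the 70% boundary is assigned is a choice made for this sketch; the abstract does not report the exact subset sizes.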
New industrial control systems (ICSs) that have been modernized with the industrial Internet of Things (IIoT) are exposed to cyber-attacks that exploit IIoT vulnerabilities. Numerous intrusion detection systems (IDSs) have therefore been proposed to secure ICSs, many of which are based on machine learning, specifically deep neural networks (DNNs). Most of the proposed DNN-based solutions rely on single deep learning models, which can be less costly in terms of ICS latency. However, they might have difficulty capturing the increasingly complex data distribution of intrusion patterns. Moreover, single deep learning models may not be effective in capturing the specific patterns of minority classes in highly imbalanced datasets, which is usually the case in cyber-security. Therefore, this paper proposes a novel hybrid multistage DNN-based intrusion detection and prevention system (IDPS) with better accuracy for critical ICSs that cannot afford to compromise on security to improve latency. The proposed approach sequentially learns the decision boundaries of the data that were misclassified or classified with low confidence by previous DNNs. Moreover, it incorporates a collaborative intrusion prevention system (IPS) with an emergency response schema that automatically mitigates attacks as soon as anomalies are detected. The results of experimental evaluations performed on different datasets demonstrate the effectiveness of the proposed solution.
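The multistage idea, where later models handle what earlier ones classify with low confidence, can be illustrated at inference time with a small cascade. This sketch is an assumption-laden stand-in: the stages are hand-written rules rather than trained DNNs, the names `cascade_predict`, `stage_one`, and `stage_two` are hypothetical, and the confidence threshold is arbitrary. It does not show the sequential training on misclassified data that the abstract describes.

```python
def cascade_predict(x, stages, threshold):
    """Return the first stage's (label, confidence) that reaches the threshold.

    If no stage is confident enough, the last stage's answer is returned.
    """
    label, confidence = None, 0.0
    for stage in stages:
        label, confidence = stage(x)
        if confidence >= threshold:
            break  # confident enough: skip the later, more specialised stages
    return label, confidence


def stage_one(x):
    """Stand-in for a first model: confident only on clear-cut inputs."""
    return ("attack", 0.95) if x > 10 else ("normal", 0.55)


def stage_two(x):
    """Stand-in for a second model covering the first stage's uncertain region."""
    return ("attack", 0.90) if x > 5 else ("normal", 0.90)


stages = [stage_one, stage_two]
clear_case = cascade_predict(20, stages, threshold=0.8)
borderline_case = cascade_predict(7, stages, threshold=0.8)
benign_case = cascade_predict(1, stages, threshold=0.8)
```

Each extra stage consulted adds latency, which mirrors the accuracy-versus-latency trade-off the abstract mentions.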
Authors: Cai, Deng (Zhejiang Univ, Coll Comp Sci, State Key Lab CAD&CG, Hangzhou 310058, Zhejiang, Peoples R China)
Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms have been proposed to solve this problem. Every proposed algorithm claims to outperform Locality Sensitive Hashing (LSH), which is the most popular hashing method. However, the evaluation in these hashing articles was not thorough enough, and the claim should be re-examined. If implemented correctly, almost all hashing methods will have their performance improved as the code length increases. However, many existing hashing articles only report the performance with code lengths shorter than 128. In this article, we carefully revisit the problem of search-with-a-hash-index and analyze the pros and cons of two popular hash index search procedures. We then propose a simple but effective novel hash index search approach and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, random-projection-based Locality Sensitive Hashing ranks first, which contradicts the claims in all the other 10 hashing articles. Despite the extreme simplicity of random-projection-based LSH, our results show that the capability of this algorithm has been far underestimated. For the sake of reproducibility, all the code used in the article is released on GitHub and can be used as a testing platform for a fair comparison between various hashing algorithms.
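Random-projection-based LSH is simple enough to sketch in a few lines: each bit of the code records which side of a random hyperplane a vector falls on. The sketch below shows only that hashing step, not the hash index search approach proposed in the article; the function names and the 32-bit code length are assumptions chosen for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vector, hyperplanes):
    """One bit per hyperplane: 1 if the projection is non-negative, else 0."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of differing bits; small values suggest a small angle between vectors."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(dim=4, n_bits=32)
v = [0.5, -1.0, 2.0, 0.25]
code_v = hash_code(v, planes)
code_scaled = hash_code([2.0 * x for x in v], planes)
code_opposite = hash_code([-x for x in v], planes)
```

Because the code depends only on the sign of each projection, scaling a vector leaves its code unchanged, while negating it flips every bit. Longer codes (more hyperplanes) give a finer estimate of the angle between vectors, which is consistent with the abstract's remark that performance improves as the code length increases.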
Future integrated terrestrial-aerial-satellite networks will have to exhibit some unprecedented characteristics for the provision of communications and computation services, as well as security, for a tremendous number of devices with very broad and demanding requirements across multiple networks, operators, and ecosystems. Although 3GPP introduced the concept of self-organizing networks (SONs) in 4G and 5G documents to automate network management, even this progressive concept will face several challenges, as it may not be sufficiently agile in coping with the immense levels of complexity, heterogeneity, and mobility in the envisioned beyond-5G integrated networks. In the presented vision, we discuss how future integrated networks can be intelligently and autonomously managed to efficiently utilize resources, reduce operational costs, and achieve the targeted Quality of Experience (QoE). We introduce the novel concept of the "self-evolving networks (SENs)" framework, which utilizes artificial intelligence, enabled by machine learning (ML) algorithms, to make future integrated networks fully automated and able to evolve intelligently with respect to the provision, adaptation, optimization, and management aspects of networking, communications, computation, and infrastructure nodes' mobility. To envisage the concept of SEN in future integrated networks, we use the Intelligent Vertical Heterogeneous Network (I-VHetNet) architecture as our reference. The article discusses five prominent scenarios where SEN plays the main role in providing automated network management. Numerical results provide insight into how the SEN framework improves the performance of future integrated networks. The article presents the leading enablers and examines the challenges associated with the application of the SEN concept in future integrated networks.
Code clones. In this work, we propose a novel detection framework using machine learning for automated detection of all four types of clones. The features extracted from a pair of code blocks are combined for possible detection of a clone with respect to a reference block. We use AST and PDG features of both code blocks to prepare labelled training samples after fusing the two feature vectors using three different alternatives. We use six state-of-the-art classification models, including a Deep Convolutional Neural Network, to assess the prediction performance of our scheme. To assess the effectiveness of our framework, we use seven datasets and compare its performance with five state-of-the-art clone detectors. We also compare a large number of algorithms for code clone detection. Comparing the performance of a large number of machine learning techniques, both ANN and non-ANN, on such features, we establish that fusing AST and PDG features gives competitive results with deep learning as well as boosted tree algorithms, and we find that boosted tree algorithms like XGBoost are quite competitive in clone detection. Experimental results demonstrate that our approach outperforms existing clone detection methods in terms of prediction accuracy.