With ongoing technical advances, the volume of big data grows day by day, to the point that traditional software tools struggle to handle it. In addition, the presence of imbalanced data within big data is a major concern for the research community. To ensure effective management of big data and to deal with imbalanced data, this paper proposes a new optimization algorithm. Big data classification is performed using the MapReduce framework, in which the map and reduce functions are based on the proposed optimization algorithm. The algorithm, named the Exponential Bat algorithm (E-Bat), integrates the Exponential Weighted Moving Average (EWMA) with the Bat Algorithm (BA). The map function selects the features, which are then presented for classification in the reducer module using a Neural Network (NN). Thus, big data classification is carried out with the proposed E-Bat-based MapReduce framework, and experiments are conducted on four standard databases: Breast Cancer, Hepatitis, Pima Indian Diabetes, and Heart Disease. The experimental results show that the proposed method achieved a maximal accuracy of 0.8829 and a True Positive Rate (TPR) of 0.9090.
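As a rough illustration only (the abstract does not specify how EWMA enters the bat update), the Python sketch below blends each candidate bat move with its previous position through an exponentially weighted moving average; the smoothing factor alpha, the greedy acceptance rule, and the toy sphere fitness are assumptions for illustration, not the paper's method.

```python
import numpy as np

def e_bat_step(pos, vel, best, fitness, f_min=0.0, f_max=2.0,
               alpha=0.3, rng=None):
    """One illustrative E-Bat iteration: the usual bat-algorithm move followed
    by an EWMA blend of new and old positions (alpha is a hypothetical factor)."""
    rng = rng or np.random.default_rng()
    freq = f_min + (f_max - f_min) * rng.random(len(pos))   # pulse frequencies
    vel = vel + (pos - best) * freq[:, None]
    moved = pos + vel                                        # plain BA update
    blended = alpha * moved + (1.0 - alpha) * pos            # EWMA smoothing
    better = fitness(blended) < fitness(pos)                 # greedy acceptance
    return np.where(better[:, None], blended, pos), vel

sphere = lambda P: np.sum(P ** 2, axis=1)       # toy fitness: minimise ||x||^2
rng = np.random.default_rng(6)
pos, vel = rng.normal(size=(20, 5)), np.zeros((20, 5))
for _ in range(50):
    best = pos[np.argmin(sphere(pos))]
    pos, vel = e_bat_step(pos, vel, best, sphere, rng=rng)
```

In a wrapper feature-selection setting, the continuous positions would be thresholded to binary feature masks and the fitness replaced by the NN's validation error.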
To cope with the rapid evolution of attacks and the growth of computer networks, intelligent intrusion detection systems are regarded as a promising emerging technique for securing computer networks. Individual classification approaches have not provided complete protection; indeed, none of them has proved efficient enough to deliver good detection rates while reducing false alarm rates. In previous work, a comparative study was conducted between neuro-fuzzy and genetic-fuzzy approaches. In this study, a hybrid approach based on the stacking scheme is proposed. It combines the two base classifiers so as to take advantage of each of them. The experimental results show the effectiveness of the proposed approach in maximizing detection rates and reducing false alarm rates.
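A minimal scikit-learn sketch of the stacking scheme, using an MLP and a decision tree as stand-ins for the neuro-fuzzy and genetic-fuzzy base classifiers (the paper's actual base learners are fuzzy systems, and the toy data here is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-ins for the two base classifiers: an MLP for the neuro-fuzzy learner
# and a decision tree for the genetic-fuzzy (rule-based) learner.
base_learners = [
    ("neuro_like", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)),
    ("rule_like", DecisionTreeClassifier(max_depth=6)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # meta-level combiner
    cv=5,  # out-of-fold base-learner predictions feed the meta-learner
)

# Toy, imbalanced binary problem standing in for intrusion vs. normal traffic.
X, y = make_classification(n_samples=600, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
stack.fit(X, y)
print(stack.score(X, y))
```

The cv argument is what makes this a stacking scheme rather than simple averaging: the meta-learner is trained on predictions the base learners did not see during their own training folds.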
This paper proposes an effective classification method named the Rider Chicken Optimization Algorithm-based Recurrent Neural Network (RCOA-based RNN) for big data classification in the Spark architecture. Initially, the input data are collected from the network by the master node and forwarded to the slave nodes, which store the data and perform computations. Features are selected in the slave nodes using the proposed RCOA and forwarded to the master node. Big data classification is then carried out in the master node by an RNN classifier whose training is performed with the proposed RCOA algorithm, an integration of the Rider Optimization Algorithm (ROA) and standard Chicken Swarm Optimization (CSO). Experiments on the Switzerland, Cleveland, Hungarian, and Skin disease datasets show that the proposed RCOA-based RNN attains better performance on quantitative measures, with sensitivity, accuracy, and specificity of 93%, 94%, and 93% on the Hungarian dataset. Existing learning methods fail to address complex classification problems in a reasonable time, a shortcoming the proposed method overcomes.
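A loose PySpark sketch of the slave-side step only: each partition scores its features locally and the master aggregates the scores. Plain variance is used as a stand-in scoring rule (the paper's RCOA-driven selection is not reproduced), and the dataset, partition count, and top-k rule are illustrative assumptions.

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("slave-side-feature-scoring").getOrCreate()
sc = spark.sparkContext

def score_partition(rows):
    """Score the features seen by one partition (one 'slave-side' block).
    Variance is a stand-in criterion, not the paper's RCOA selection."""
    block = np.array(list(rows), dtype=float)
    if block.size:
        yield block.var(axis=0)          # one score vector per partition

# Hypothetical numeric dataset: rows of three features, split into 4 partitions.
rdd = sc.parallelize([[float(i), float(i % 7), float(i % 3)]
                      for i in range(1000)], 4)
scores = rdd.mapPartitions(score_partition).reduce(lambda a, b: a + b)
top_k = np.argsort(scores)[::-1][:2]     # feature indices the master would keep
print(top_k)
```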
Image processing is currently developing as a unique and inventive field in computer research and applications. Most image processing algorithms produce a large quantity of data as output, termed big data, and process and store this bulky information as either structured or unstructured data. Using big data analytics to mine the data produced by image processing has huge potential in areas such as education, government, medical establishments, production units, finance and banking, and retail. This paper surveys the innovations made in big data analytics and image processing and proposes a novel data classification approach tailored to image analytics. To improve image quality, pre-processing is applied to the gathered data. Then, the most relevant features, such as spatial information, GLCM texture, and color and shape features, are extracted from the pre-processed images. Since the feature dimensionality is large, an adaptive map-reduce framework with Improved Shannon Entropy is introduced to reduce the dimensionality of the extracted features. In the big data classification phase, an optimized deep learning classifier, a deep convolutional neural network (DCNN), is introduced to classify the images accurately. The weights of the DCNN are fine-tuned using the newly proposed Dragonfly Updated Moth Search (DAUMS) algorithm to enhance classification accuracy and solve the optimization problems of this work; DAUMS is a hybrid that combines concepts from the moth search algorithm and the dragonfly algorithm.
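As one concrete example of the texture part of the feature-extraction stage, the sketch below computes GLCM descriptors with scikit-image (graycomatrix/graycoprops, the scikit-image 0.19+ spelling); the chosen distances, angles, and properties are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_image, levels=256):
    """Texture descriptors from a grey-level co-occurrence matrix, one of the
    feature families extracted before the map-reduce dimensionality reduction."""
    glcm = graycomatrix(gray_image, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Toy 8-bit image standing in for one pre-processed input image.
demo = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)
features = glcm_features(demo)   # 8 values: 4 properties x 2 angles
print(features)
```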
The amount of data generated is increasing day by day owing to developments in remote sensors, so improving the accuracy of big data classification deserves attention. Many classification methods are in practice; however, they are limited for reasons such as data loss, time complexity, efficiency, and accuracy. This paper proposes an effective and optimal data classification approach using the proposed Ant Cat Swarm Optimization-enabled Deep Recurrent Neural Network (ACSO-enabled Deep RNN) within a MapReduce framework, where ACSO is the incorporation of the Ant Lion Optimization approach and the Cat Swarm Optimization technique. The MapReduce framework is used to perform feature selection and big data classification. Feature selection is carried out using Pearson correlation-based black hole entropy fuzzy clustering. Classification in the reducer is performed by a Deep RNN trained with the developed ACSO scheme; it classifies the big data from the reduced-dimension features to produce satisfactory results. The proposed ACSO-based Deep RNN showed improved results, with a maximal specificity of 0.884, highest accuracy of 0.893, maximal sensitivity of 0.900, and maximum threat score of 0.827 on the Cleveland dataset.
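A minimal numpy sketch of the Pearson-correlation ingredient of the feature-selection step, scoring each feature by its absolute correlation with the label; the black hole entropy fuzzy clustering and the ACSO training are not reproduced here, and the toy data and top-k cut-off are assumptions.

```python
import numpy as np

def pearson_relevance(X, y):
    """Absolute Pearson correlation between every feature column and the label;
    a simplified stand-in for the Pearson-correlation-based clustering step."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    cov = Xc.T @ yc / len(y)
    denom = X.std(axis=0) * y.std()
    return np.abs(cov / np.where(denom == 0, 1.0, denom))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = (X[:, 3] + 0.1 * rng.normal(size=500) > 0).astype(float)  # label tied to feature 3
keep = np.argsort(pearson_relevance(X, y))[::-1][:4]          # top-4 feature indices
print(keep)
```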
Dimension reduction is a preprocessing step in machine learning for eliminating undesirable features and increasing learning accuracy. To reduce redundant features, various data representation methods exist, each with its own advantages. On the other hand, big data with imbalanced classes is one of the most important issues in pattern recognition and machine learning. In this paper, a method is proposed in the form of a cost-sensitive optimization problem that performs feature selection and feature extraction simultaneously. The feature extraction phase is based on reducing error and maintaining geometric relationships between data points by solving a manifold learning optimization problem. In the feature selection phase, the cost-sensitive optimization problem is formulated to minimize the upper limit of the generalization error. Finally, the optimization problem constituted from the two problems above is solved by adding a cost-sensitive term that balances the classes without manipulating the data. To evaluate the feature reduction results, a multi-class linear SVM classifier is applied to the reduced data. The proposed method is compared with other approaches on 21 datasets from the UCI learning repository, microarray and high-dimensional datasets, and imbalanced datasets from the KEEL repository. The results indicate that the proposed method is significantly more efficient than similar approaches.
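A small scikit-learn sketch of the evaluation idea: a cost-sensitive multi-class linear SVM on an imbalanced toy problem, where class_weight="balanced" plays the role of a cost-sensitive term that re-weights errors instead of resampling. The toy data and hyperparameters are assumptions, and the paper's manifold-learning feature reduction itself is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Imbalanced three-class toy problem standing in for the reduced-dimension data.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=3, weights=[0.8, 0.15, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" acts as the cost-sensitive term: mistakes on minority
# classes are penalised more heavily, without resampling or manipulating the data.
svm = LinearSVC(class_weight="balanced", C=1.0, max_iter=5000)
svm.fit(X_tr, y_tr)
print(balanced_accuracy_score(y_te, svm.predict(X_te)))
```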
The innovation of big data has an intense impact on data context analytics. Big data processing platforms have gained immense popularity for evaluating big data, as they offer the low latency that big data applications need. This paper introduces a novel method for big data classification using a Spark architecture with master and slave nodes. The input data are first partitioned by data gridding in the master node using Black Hole Entropy Fuzzy Clustering (BHEFC). Then, feature selection is executed on each slave node based on Rényi entropy to choose better features for further processing. The features selected by each slave node are concatenated to form the feature vector, which is passed to the classification module in the master node for big data classification. Classification is carried out using a Deep Recurrent Neural Network (DeepRNN) tuned by the novel Water Atom Search Optimization (WASO), which is newly designed by integrating the characteristics of the Water Wave Optimization (WWO) algorithm and Atom Search Optimization (ASO). The proposed WASO-based DeepRNN thus classifies big data effectively from the reduced-dimension features. The proposed method is compared with a Neural Network (NN), Support Vector Machine (SVM), Edited Nearest Neighbor for big data (ENN-BD), Speed-up Dendritic Cell Algorithm (Sp-DCA), Compact Fuzzy Models in big data (CFM-BD), and DeepRNN, and obtained improved results with a specificity of 0.979, accuracy of 0.951, sensitivity of 0.963, and precision of 0.988 on the skin disease dataset.
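An illustrative numpy sketch of the slave-side feature scoring by Rényi entropy, H_a = log(sum_i p_i^a) / (1 - a) over a feature's histogram; the entropy order, bin count, toy data, and keep-top-3 rule are assumptions for illustration.

```python
import numpy as np

def renyi_entropy(feature, order=2.0, bins=32):
    """Rényi entropy H_a = log(sum_i p_i**a) / (1 - a) of a feature's histogram
    (Shannon entropy in the limit a -> 1)."""
    hist, _ = np.histogram(feature, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    if order == 1.0:
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** order)) / (1.0 - order))

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 6))                      # toy slave-node data block
scores = np.array([renyi_entropy(X[:, j]) for j in range(X.shape[1])])
selected = np.argsort(scores)[::-1][:3]             # keep the 3 highest-entropy features
print(selected)
```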
Extreme Learning Machine (ELM) and its variants have been widely used in many applications owing to their fast convergence and good generalization performance. Although the distributed ELM* based on the MapReduce framework can handle very large training datasets in big data applications, coping with rapid updates of those datasets remains challenging. Therefore, this paper proposes a novel Elastic Extreme Learning Machine based on the MapReduce framework, named Elastic ELM (E²LM), to cover the shortcoming of ELM*, whose learning ability on updated large-scale training datasets is weak. First, an analysis of ELM* shows that its most computation-expensive part, the matrix multiplications, can be calculated incrementally, decrementally, and correctionally. Next, the Elastic ELM based on MapReduce is developed: it first calculates the intermediate matrix products of the updated training data subset and then updates the accumulated matrix products by modifying the old ones with the intermediate results. The corresponding new output weight vector can then be obtained by centralized computing from the updated matrix products, so massive, rapidly updated training datasets can be learned efficiently. Finally, extensive experiments on synthetic data verify the effectiveness and efficiency of the proposed E²LM in learning massive, rapidly updated training datasets under various experimental settings.
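The additive-update idea can be written down compactly: with hidden-layer output H and targets T, the products U = HᵀH and V = HᵀT are sums over data chunks, so an arriving chunk contributes only H_newᵀH_new and H_newᵀT_new, after which the output weights are re-solved centrally. The numpy sketch below shows that idea on random data; the sigmoid hidden layer, ridge term, and matrix sizes are generic ELM choices, not the paper's exact distributed MapReduce implementation.

```python
import numpy as np

def elm_hidden(X, W, b):
    """Random hidden-layer output H = sigmoid(X W + b) of a basic ELM."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def incremental_update(U, V, H_new, T_new):
    """U = H^T H and V = H^T T are additive over data chunks, so an arriving
    chunk only contributes H_new^T H_new and H_new^T T_new."""
    return U + H_new.T @ H_new, V + H_new.T @ T_new

def output_weights(U, V, reg=1e-3):
    """Centralised solve for beta = (U + reg * I)^-1 V (ridge-regularised)."""
    return np.linalg.solve(U + reg * np.eye(U.shape[0]), V)

rng = np.random.default_rng(3)
d, L, k = 8, 50, 3                                   # inputs, hidden nodes, classes
W, b = rng.normal(size=(d, L)), rng.normal(size=L)   # fixed random hidden layer

X0, T0 = rng.normal(size=(1000, d)), rng.normal(size=(1000, k))
H0 = elm_hidden(X0, W, b)
U, V = H0.T @ H0, H0.T @ T0                          # initial accumulated products

X1, T1 = rng.normal(size=(200, d)), rng.normal(size=(200, k))  # newly arrived chunk
U, V = incremental_update(U, V, elm_hidden(X1, W, b), T1)
beta = output_weights(U, V)                          # refreshed output weights
```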
In this article, a big data classification model is developed with the aid of intelligent techniques. The Parallel Pool MapReduce framework is used to handle big data. The model involves three main phases: (1) feature extraction, (2) optimal feature selection, and (3) classification. For feature extraction, well-known techniques such as principal component analysis, linear discriminant analysis, and linear square regression are used. Since the resulting feature vector tends to be long, choosing the optimal features is a complex task; hence, the proposed model utilizes an optimal feature selection technique, the Lion-based Firefly (L-FF) algorithm. The main objective of this article is to minimize the correlation between the selected features, which provides diverse information about the different classes of data. Once the optimal features are selected, a neural network (NN) classifier is adopted, which classifies the data effectively using the selected features. Furthermore, the proposed L-FF+NN model is compared with traditional methods and proves effective over them. Experimental analysis shows that the proposed L-FF+NN model is 92%, 28%, 87%, 82%, and 78% superior to state-of-the-art models such as GA+NN, FF+NN, PSO+NN, ABC+NN, and LA+NN, respectively.
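A small numpy sketch of the stated objective, a fitness that a swarm search such as L-FF could minimize: the mean absolute pairwise correlation among the currently selected features. The binary mask encoding and the penalty for near-empty subsets are illustrative assumptions, not details from the paper.

```python
import numpy as np

def correlation_fitness(X, mask):
    """Fitness for the swarm search: mean absolute pairwise correlation among
    the selected columns (lower means more diverse, less redundant features)."""
    idx = np.flatnonzero(mask)
    if idx.size < 2:
        return 1.0                      # discourage near-empty subsets
    C = np.corrcoef(X[:, idx], rowvar=False)
    off_diag = C[~np.eye(idx.size, dtype=bool)]
    return float(np.mean(np.abs(off_diag)))

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 12))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=300)      # feature 5 duplicates feature 0
print(correlation_fitness(X, np.array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])))  # high
print(correlation_fitness(X, np.array([1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0])))  # low
```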
Big data plays a vital role in information analysis and in the systematic extraction of information from complex or huge datasets. The massive growth of large-scale data causes major issues in big data, so the data must be classified to address data imbalance. Huge data can be explored efficiently by converting it into valuable knowledge, and it can be processed in a distributed environment with different application frameworks. In recent decades, the Spark framework has gained significance in the big data domain owing to its success in incremental and iterative approaches. Because of imbalanced data distributions, big data classification over large datasets is a challenging task for conventional methods, as the imbalance leads to wrong decisions when generating classification results. In this paper, an efficient Shuffled Student Psychology Optimization-based Deep Q Network is proposed for big data classification with the Spark framework to overcome the issues faced by traditional methods. Master and slave nodes perform distinct operations, such as data partitioning, feature fusion, and data augmentation, to accomplish the data classification task of the proposed approach. The developed technique attained a maximum TPR of 0.960, accuracy of 0.942, and TNR of 0.929.
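As a loose illustration of the data augmentation step against class imbalance, the numpy sketch below oversamples the minority class with small Gaussian jitter; the jitter scale, the oversampling factor, and jitter-based augmentation itself are assumptions, since the abstract does not describe the exact augmentation used.

```python
import numpy as np

def augment_minority(X, y, minority_label, factor=2, noise=0.05, rng=None):
    """Oversample the minority class by adding small Gaussian jitter to its
    samples; a simple stand-in for the augmentation step that counters imbalance."""
    rng = rng or np.random.default_rng()
    Xm = X[y == minority_label]
    copies = [Xm + noise * rng.normal(size=Xm.shape) for _ in range(factor)]
    X_aug = np.vstack([X] + copies)
    y_aug = np.concatenate([y, np.full(len(Xm) * factor, minority_label)])
    return X_aug, y_aug

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 8))
y = (rng.random(1000) < 0.1).astype(int)           # roughly 10% positive samples
X_aug, y_aug = augment_minority(X, y, minority_label=1, rng=rng)
print(X_aug.shape, y_aug.mean())                   # larger set, less skewed labels
```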