ISBN: 9781509048403 (print)
With the coming era of massive, high-dimensional and nonlinear astronomical data, clustering astronomical data is becoming increasingly important. This paper proposes a new clustering algorithm that reduces space and time complexity as well as sensitivity to parameters, making it suitable for processing large-scale astronomical data sets. The algorithm consists of three phases: coarsening clustering, representative-data clustering, and merging. First, the Affinity Propagation (AP) algorithm is used for coarsening. Specifically, to save space and computational cost, we compute only the similarity between each point and its t nearest neighbors, obtaining a coarsened similarity matrix with only t columns, where t << N and N is the number of data points. Second, to further improve the efficiency and effectiveness of the algorithm, Find of Density Peaks (FDP) clustering is used to partition the representative points obtained in the first phase. Third, the classes of all data points are obtained by merging the results of the first two phases. Experimental results on star/galaxy classification using Sloan Digital Sky Survey (SDSS) data show that the proposed algorithm clusters the data quickly and accurately, and is more efficient than the baseline algorithms AP and FDP.
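A minimal NumPy sketch of the three phases follows. It is not the authors' implementation and rests on assumptions the abstract does not state: similarity is the negative squared Euclidean distance, every point keeps itself as a candidate exemplar (so the coarsened matrix has t + 1 columns), the AP preference is the median of the kept similarities, and FDP uses a Gaussian kernel whose cutoff distance is the 20th percentile of pairwise representative distances. All function names are hypothetical.

```python
import numpy as np


def knn_similarity(X, t):
    """Coarsened AP similarity: row i keeps point i itself (column 0) plus its
    t nearest neighbours, so S and idx are N x (t + 1) instead of N x N."""
    n = len(X)
    idx = np.empty((n, t + 1), dtype=int)
    S = np.empty((n, t + 1))
    for i in range(n):  # one row at a time, so stored memory stays O(N * t)
        d2 = ((X - X[i]) ** 2).sum(axis=1)
        d2[i] = np.inf  # exclude the point itself from its neighbour search
        nn = np.argpartition(d2, t)[:t]  # indices of the t smallest distances
        idx[i, 0], idx[i, 1:] = i, nn
        S[i, 1:] = -d2[nn]  # similarity = negative squared distance
    S[:, 0] = np.median(S[:, 1:])  # shared preference (assumed median rule)
    return S, idx


def sparse_ap(S, idx, iters=200, damping=0.9):
    """Affinity Propagation restricted to each point's t + 1 candidates.
    Returns, for every point, the index of its chosen representative."""
    n, m = S.shape
    rows = np.arange(n)
    R = np.zeros_like(S)
    A = np.zeros_like(S)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over the other candidates k' of a(i,k') + s(i,k')
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[rows, top]
        AS[rows, top] = -np.inf
        other = np.repeat(first[:, None], m, axis=1)
        other[rows, top] = AS.max(axis=1)  # second best for the top column
        R = damping * R + (1 - damping) * (S - other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R[:, 1:], 0)
        col = np.bincount(idx[:, 1:].ravel(), weights=Rp.ravel(), minlength=n)
        A_new = np.empty_like(A)
        A_new[:, 0] = col
        A_new[:, 1:] = np.minimum(0, R[idx[:, 1:], 0] + col[idx[:, 1:]] - Rp)
        A = damping * A + (1 - damping) * A_new
    return idx[rows, (A + R).argmax(axis=1)]


def density_peaks(Y, n_clusters, dc_percentile=20.0):
    """FDP on the representatives: density rho, distance delta to the nearest
    denser point, centres = largest rho * delta, others inherit labels."""
    D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    dc = np.percentile(D[np.triu_indices(len(Y), k=1)], dc_percentile)
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0  # drop the self term
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(len(Y))
    parent = np.full(len(Y), -1)
    delta[order[0]] = D[order[0]].max()
    for pos in range(1, len(Y)):
        i, denser = order[pos], order[:pos]
        parent[i] = denser[np.argmin(D[i, denser])]
        delta[i] = D[i, parent[i]]
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point always starts a cluster
    labels = np.full(len(Y), -1)
    labels[np.argsort(-gamma)[:n_clusters]] = np.arange(n_clusters)
    for i in order:  # in decreasing density, inherit the parent's label
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels


def three_phase_cluster(X, t, n_clusters):
    S, idx = knn_similarity(X, t)  # phase 1a: coarsened similarity matrix
    rep_of = sparse_ap(S, idx)  # phase 1b: AP coarsening
    reps, inverse = np.unique(rep_of, return_inverse=True)
    rep_labels = density_peaks(X[reps], n_clusters)  # phase 2: FDP on reps
    return rep_labels[inverse]  # phase 3: merge back to every point
```

For brevity the neighbour search here is brute force (O(N²) time); only the N × (t + 1) result is kept, which is where the memory saving comes from. A spatial index such as a k-d tree would be the usual replacement for large N.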
With vast amounts of data generated daily and the ever-increasing interconnectivity of the world's internet infrastructure, machine-learning-based Intrusion Detection Systems (IDSs) have become a vital component in protecting our economic and national *** shallow learning and deep learning strategies adopt the single-learning-model approach for intrusion *** single-learning-model approach may have difficulty capturing the increasingly complicated data distribution of intrusion ***, the single deep learning model may not be effective at capturing the unique patterns of intrusive attacks that have a small number of *** order to further enhance the performance of machine-learning-based IDSs, we propose the Big Data based Hierarchical Deep Learning System (BDHDLS). BDHDLS uses behavioral features and content features to understand both network traffic characteristics and the information stored in the *** deep learning model in BDHDLS concentrates on learning the unique data distribution in one *** strategy can increase the detection rate of intrusive attacks compared with the previous single-learning-model *** on a parallel training strategy and big data techniques, the model construction time of BDHDLS is reduced substantially when multiple machines are deployed.
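The core idea of training one model per cluster of the data, with the per-cluster jobs independent so they can run in parallel, can be sketched as follows. This is a hypothetical illustration, not BDHDLS itself: k-means stands in for the clustering step, logistic regression stands in for each deep learning model, a thread pool stands in for multiple machines, and features are assumed numeric with binary labels (1 = intrusive).

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def nearest(X, centers):
    """Index of the closest center for every row of X."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)


def kmeans(X, k, iters=50, seed=0):
    """Lloyd's k-means with farthest-first seeding (stand-in partitioner)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):  # next seed: the point farthest from existing seeds
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        assign = nearest(X, centers)
        for c in range(k):
            if np.any(assign == c):  # keep the old center if a cluster empties
                centers[c] = X[assign == c].mean(axis=0)
    return centers


def train_logreg(X, y, lr=0.1, epochs=500):
    """Gradient-descent logistic regression (stand-in for one deep model)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


class ClusterExpertIDS:
    """Hypothetical sketch: one local model per cluster, trained in parallel."""

    def __init__(self, n_clusters):
        self.k = n_clusters

    def fit(self, X, y):
        self.centers = kmeans(X, self.k)
        assign = nearest(X, self.centers)
        # an empty cluster falls back to the full data so its model is defined
        masks = [assign == c if np.any(assign == c) else np.ones(len(X), bool)
                 for c in range(self.k)]
        with ThreadPoolExecutor() as pool:  # clusters are independent jobs
            jobs = [pool.submit(train_logreg, X[m], y[m]) for m in masks]
            self.models = [job.result() for job in jobs]
        return self

    def predict(self, X):
        assign = nearest(X, self.centers)  # route each sample to its cluster
        Xb = np.hstack([X, np.ones((len(X), 1))])
        scores = np.einsum("ij,ij->i", Xb, np.array(self.models)[assign])
        return (scores > 0).astype(int)  # score > 0 means probability > 0.5
```

A single linear model cannot separate data whose decision rule flips between regions, while each cluster's local model only has to learn its own region's simpler rule; that is the motivation the abstract gives for specialising models per data distribution.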
To defend against an increasing number of sophisticated malware attacks, deep-learning-based Malware Detection Systems (MDSs) have become a vital component of our economic and national security. Traditionally, researchers build a single deep learning model using the entire dataset. However, a single deep learning model may not handle increasingly complex malware data distributions effectively, since different sample subspaces, each representing a group of similar malware, may have unique data distributions. To further improve the performance of deep-learning-based MDSs, we propose a multi-level Deep Learning System (MLDLS) that organizes multiple deep learning models in a tree structure. No model in the MLDLS tree is built on the whole dataset; instead, each deep learning model focuses on learning the specific data distribution of a particular group of malware, and all deep learning models in the tree work together to make a final decision. Consequently, the learning effectiveness of each deep learning model built for one cluster can be improved. Experimental results show that the proposed system performs better than the traditional approach. (C) 2019 Elsevier Ltd. All rights reserved.
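A minimal sketch of a tree of specialised models, under assumptions the abstract does not specify: each node splits its samples at the median of their first principal component, each node trains its own classifier (logistic regression as a stand-in for a deep model), and the final decision is a weighted average of the probabilities along a sample's root-to-leaf path, with weight 2^depth. The split rule, the fusion rule and all names are hypothetical, not the MLDLS design.

```python
import numpy as np


def fit_logreg(X, y, lr=0.1, epochs=300):
    """Gradient-descent logistic regression (stand-in for a deep model)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def proba(w, x):
    return 1.0 / (1.0 + np.exp(-(np.append(x, 1.0) @ w)))


class TreeNode:
    """Hypothetical MLDLS-style node: a local model plus an optional split."""

    def __init__(self, X, y, depth=0, max_depth=2, min_size=20):
        self.depth = depth
        self.w = fit_logreg(X, y)  # trained only on this node's samples
        self.children = None
        if depth >= max_depth or len(X) < 2 * min_size:
            return
        self.mu = X.mean(axis=0)
        self.axis = np.linalg.svd(X - self.mu, full_matrices=False)[2][0]
        proj = (X - self.mu) @ self.axis  # first principal component scores
        self.cut = np.median(proj)
        left = proj <= self.cut
        if min(left.sum(), (~left).sum()) >= min_size:
            self.children = tuple(
                TreeNode(X[m], y[m], depth + 1, max_depth, min_size)
                for m in (left, ~left))

    def predict_one(self, x):
        """Depth-weighted average of the probabilities on the root-to-leaf path."""
        node, num, den = self, 0.0, 0.0
        while True:
            weight = 2.0 ** node.depth  # deeper, more specialised models count more
            num += weight * proba(node.w, x)
            den += weight
            if node.children is None:
                return num / den
            side = 0 if (x - node.mu) @ node.axis <= node.cut else 1
            node = node.children[side]

    def predict(self, X):
        return np.array([self.predict_one(x) > 0.5 for x in X], dtype=int)
```

Weighting deeper nodes more heavily lets the leaf models, which see one narrow sample subspace, dominate, while the shallower models still contribute to every decision.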
ISBN: 9781510805750 (print)
Traditional information systems analysis, conducted largely from a technical point of view, cannot effectively describe the real object *** the perspective of the hierarchical structure of complex adaptive systems, a multi-level clustering algorithm based on information flow is proposed. The algorithm is used to design the structure of complex information systems, reducing the complexity of the system structure and optimizing the structure of information systems, thereby addressing the complexity issues of complex information systems.
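One possible reading of multi-level clustering based on information flow, sketched under stated assumptions: components are described by a matrix F where F[i][j] is the information flow from component i to component j, flow in either direction couples two components, and each level merges the two groups with the highest average flow between their members, so strongly communicating components end up in the same subsystem. The merge criterion and the function name are hypothetical, not taken from the paper.

```python
import numpy as np


def multilevel_flow_clustering(F, n_levels):
    """Greedy agglomeration: each level merges the two groups with the highest
    average information flow between their members (hypothetical criterion).
    Returns the partition at every level, starting from singletons."""
    F = np.asarray(F, dtype=float)
    W = F + F.T  # flow in either direction couples two components
    groups = [[i] for i in range(len(W))]
    levels = [[g[:] for g in groups]]
    for _ in range(n_levels):
        if len(groups) < 2:
            break
        best, pair = -np.inf, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                flow = W[np.ix_(groups[a], groups[b])].mean()
                if flow > best:
                    best, pair = flow, (a, b)
        a, b = pair
        groups[a] = sorted(groups[a] + groups[b])
        del groups[b]
        levels.append([g[:] for g in groups])
    return levels
```

Averaging rather than summing the flow keeps large groups from absorbing everything simply because they have more members; reading the returned levels from bottom to top gives a hierarchy of subsystems, and cutting at one level gives a candidate module structure.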
This research uses the national Healthcare Cost & Utilization Project (HCUP-3) databases to construct Support Vector Machine (SVM) classifiers that predict clinical charge profiles, including hospital charges and length of stay (LOS), for patients diagnosed with heart and circulatory disease, diabetes, and cancer, respectively. Predictions of clinical charge profiles can provide relevant clinical knowledge that helps healthcare policy makers manage healthcare services and costs effectively at the national, state, and local levels. Despite its solid mathematical foundation and promising experimental results, SVM is poorly suited to large-scale data mining tasks because its training time complexity is at least quadratic in the number of samples. Furthermore, traditional SVM classification algorithms cannot build an effective SVM when different data distribution patterns are intermingled in a large dataset. To enhance SVM training for large, complex, and noisy healthcare datasets, we propose the multi-level Support Vector Machine (MLSVM), which organizes the dataset as clusters in a tree to produce better partitions for more effective SVM classification. The MLSVM model uses multiple SVMs, each of which efficiently learns the local data distribution patterns in one cluster. A decision fusion algorithm generates an effective global decision that incorporates the local SVM decisions at different levels of the tree. Consequently, MLSVM can handle complex and often conflicting data distributions in large datasets more effectively than single-SVM approaches and multiple-SVM systems. Both the combined 5×2-fold cross-validation F test and an independent test show that the classification performance of MLSVM is much superior to that of CVM, ACSVM, and CSVM on three popular performance evaluation metrics. In this work, CSVM and MLSVM are parallelized to speed up the slow SVM training process for very large and complex datasets. Running time analysis ...
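The combined 5×2-fold cross-validation F test mentioned above (Alpaydin's variant of Dietterich's 5×2 cv paired t test) compares two classifiers over five replications of 2-fold cross-validation. With p_i^(j) the difference between the two classifiers' error rates on fold j of replication i, p̄_i = (p_i^(1) + p_i^(2)) / 2 and s_i² = (p_i^(1) - p̄_i)² + (p_i^(2) - p̄_i)², the statistic f = Σ_i Σ_j (p_i^(j))² / (2 Σ_i s_i²) is approximately F-distributed with 10 and 5 degrees of freedom. A minimal sketch of the statistic (the function name is hypothetical):

```python
import numpy as np


def combined_5x2cv_f(err_a, err_b):
    """err_a, err_b: error rates of classifiers A and B, shape (5, 2), where
    [i, j] is fold j of replication i. Returns f, approximately F(10, 5)."""
    p = np.asarray(err_a, dtype=float) - np.asarray(err_b, dtype=float)
    if p.shape != (5, 2):
        raise ValueError("expected 5 replications x 2 folds")
    p_bar = p.mean(axis=1, keepdims=True)  # mean difference per replication
    s2 = ((p - p_bar) ** 2).sum(axis=1)  # variance estimate per replication
    return (p ** 2).sum() / (2.0 * s2.sum())
```

If SciPy is installed, `scipy.stats.f.sf(f, 10, 5)` gives the p-value; a small p-value indicates that the two classifiers' error rates differ significantly.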