ISBN: 9781479912827 (print)
This paper proposes a new ad hoc network clustering algorithm based on the ant colony algorithm. The protocol introduces node reliability to reflect a node's communication environment and how busy the node is. Node reliability is also one of the factors of the node pheromone. During clustering and cluster maintenance, the protocol elects the optimal node as cluster head to manage the cluster members, guided by the node pheromone, which is cumulative and updated in a timely manner, in order to increase the stability of the clusters formed. The clusters are multi-hop and can be adjusted according to the size of the network. Within a cluster, the cluster head finds the best route to the destination node on demand using the ant colony algorithm, which reduces both the burden on the cluster head and the routing overhead.
ISBN: 9781467344630; 9781467344647 (print)
Imbalanced data distribution remains an unsolved problem in data mining and machine learning. This paper introduces the problem of class-imbalanced data in classification learning and extends it to clustering, since data clustering is an important and frequently used unsupervised learning method. Two verification methods, based on two different aspects of the original data, are proposed to test and verify the influence of class-imbalanced data on clustering. Furthermore, we conduct experiments with different imbalance ratios to explore the importance of the ratio in clustering, since it is a very important factor for performance in classification learning. Experimental results indicate that class imbalance in the dataset can seriously degrade the final performance and efficiency of the clustering algorithm, and that the higher the ratio, the greater the adverse effect on clustering performance.
In this article, we focus on the classification problem in semi-supervised learning in non-stationary environments. Semi-supervised learning is the task of learning from both labeled and unlabeled data points. Several approaches to semi-supervised learning exist for stationary environments, but they are not directly applicable to data streams. We propose a novel semi-supervised learning algorithm, named STDS. The proposed approach uses labeled and unlabeled data and includes a mechanism to handle concept drift in data streams. The main challenge in semi-supervised self-training for data streams is to find a proper selection metric, in order to identify a set of high-confidence predictions, and a proper underlying base learner. We therefore propose an ensemble approach that finds a set of high-confidence predictions based on clustering algorithms and classifier predictions. We then use the Kullback-Leibler (KL) divergence to measure the distribution differences between sequential chunks in order to detect concept drift. When drift is detected, a new classifier is trained on the new set of labeled data in the current chunk; otherwise, a percentage of the high-confidence newly labeled data in the current chunk, chosen by the proposed selection metric, is added to the labeled data in the next chunk to update the incremental classifier. The results of our experiments on a number of classification benchmark datasets show that STDS outperforms the supervised methods and most of the other semi-supervised learning methods considered.
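The chunk-to-chunk drift check described in this abstract can be illustrated with a minimal sketch. It is not the STDS implementation: the histogram binning, the smoothing constant `eps`, the `threshold` value, and the function names are all assumptions made for illustration, and the example uses a single numeric feature scaled to [0, 1].

```python
import math

def histogram(values, bins=10, lo=0.0, hi=1.0, eps=1e-6):
    """Normalized histogram with additive smoothing so that no bin is zero."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp values on or outside the edges
        counts[idx] += 1
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def drift_detected(prev_chunk, curr_chunk, threshold=0.5):
    """Flag drift when the divergence between chunks exceeds a threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    p = histogram(prev_chunk)
    q = histogram(curr_chunk)
    return kl_divergence(p, q) > threshold

stable = [0.1, 0.12, 0.15, 0.11] * 10
shifted = [0.9, 0.88, 0.85, 0.91] * 10
print(drift_detected(stable, stable))   # identical chunks: no drift
print(drift_detected(stable, shifted))  # distribution moved: drift
```

KL divergence is asymmetric, so the order of the two chunks matters; a symmetric variant could be substituted if that is a concern.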
The rapid development of educational informatization has brought education into the era of big data. How to extract effective information from educational big data and achieve adaptive, personalized learning has become a research hotspot. Traditional static data mining analyzes students' learning degree based only on their final answers and ignores the dynamic data generated while answering, such as modifications and the time spent on each question. This makes it difficult to mine the correlations in massive data fully and accurately, so attention is turning from static to dynamic data mining. This paper proposes an optimized mining algorithm for analyzing students' learning degree based on dynamic data. The algorithm first uses an optimized text classification technique to match question texts to knowledge points automatically, which improves efficiency and quality. It then uses a subjective weighting method combined with expert experience to generate a matrix of students' learning degree on knowledge points from the dynamic data in the students' records. Finally, the DBSCAN clustering algorithm clusters students' personalized learning characteristics according to the learning degree matrix. Experimental results show that the algorithm can process massive data automatically and effectively and can analyze students' learning degree on knowledge points comprehensively and accurately, making it possible to group students and deliver personalized teaching.
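The final clustering step can be sketched as a small, self-contained DBSCAN run over a learning degree matrix. The matrix values, `eps`, and `min_pts` below are invented for illustration and are not taken from the paper; each row stands for one student and each column for one knowledge point.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Includes the point itself, as in the usual definition.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

# Hypothetical learning degree matrix: rows = students, columns = knowledge points.
learning_degree = [
    (0.90, 0.80), (0.85, 0.82), (0.88, 0.79),  # strong on both points
    (0.20, 0.10), (0.22, 0.15), (0.18, 0.12),  # weak on both points
    (0.50, 0.99),                              # unusual profile
]
print(dbscan(learning_degree, eps=0.1, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

DBSCAN does not need the number of clusters in advance, which suits a setting where the number of student groups is unknown, but its results depend strongly on the choice of `eps` and `min_pts`.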
Among existing clustering algorithms, k-Means is one of the most commonly used. As an extension of k-Means, the k-Modes algorithm has been widely applied to categorical data clustering by replacing means with modes. However, much real data is of mixed type, containing categorical, ordinal, and numerical attributes. Mixed-type data clustering has recently attracted much attention from the data mining research community, but most existing methods overlook ordinal attributes and do not establish an explicit similarity metric for them. In this paper, the limitations of some existing dissimilarity measures for the k-Modes algorithm on mixed ordinal and nominal data are analyzed using illustrative examples. Based on the idea of mining the order information of ordinal attributes, a new dissimilarity measure is proposed for the k-Modes algorithm to cluster this type of data. The distinguishing characteristic of the new measure is that it takes the order information of ordinal attributes into account. A convergence study and a time complexity analysis of the k-Modes algorithm based on this new measure indicate that it can be used effectively for large data sets. Comparative experiments on nine real data sets from UCI show the effectiveness of the new dissimilarity measure.
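To make the idea of an order-aware dissimilarity concrete, here is a minimal sketch. It is not the measure proposed in the paper, whose definition the abstract does not give: as an assumption, it uses simple matching for nominal attributes and a normalized rank distance for ordinal ones. The function name and the `ordinal_levels` parameter are hypothetical.

```python
def mixed_dissimilarity(x, mode, ordinal_levels):
    """Dissimilarity between an object and a cluster mode.

    ordinal_levels maps an attribute index to its ordered list of categories
    (at least two levels). Attributes not listed are treated as nominal.
    """
    total = 0.0
    for j, (a, b) in enumerate(zip(x, mode)):
        if j in ordinal_levels:
            levels = ordinal_levels[j]
            # Rank gap scaled to [0, 1]: adjacent levels are closer than extremes.
            total += abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)
        else:
            # Simple matching, as in the classic k-Modes measure.
            total += 0.0 if a == b else 1.0
    return total

levels = {1: ["low", "medium", "high"]}  # attribute 1 is ordinal
mode = ("red", "high")
print(mixed_dissimilarity(("red", "low"), mode, levels))      # 1.0
print(mixed_dissimilarity(("blue", "medium"), mode, levels))  # 1.5
```

Under plain simple matching, "low" and "medium" would both be equally far from "high"; the rank-based term is what lets the measure distinguish them.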
ISBN: 9783037855454 (print)
NLM (National Library of Medicine) is a heterogeneous information network that mixes scholars, MeSH (Medical Subject Headings), journals, and research domains. Mining the rules and knowledge concealed in NLM is a hot topic in social computing applications. In this paper, an auto-clustering algorithm for NLM is proposed to uncover the embedded knowledge about medical scholars and medical journals. The algorithm adopts particle swarm optimization (PSO) as its iterative procedure to cluster scholars and journals automatically. In addition, it uses the mutation operator from genetic algorithms (GA) to escape local optima, a prominent bottleneck in many heuristic methods. The effectiveness of the algorithm is demonstrated by applying it to a subset of NLM.
With the arrival of smart meter big data, large volumes of residential power consumption data, namely Residential Load Profiles (RLPs), are collected at different sampling frequencies. In this paper, the RLPs of smart meter customers are analyzed by clustering, which is of great significance to load management in the smart grid. A two-stage Weighted Self-Organizing Map (WSOM) clustering algorithm and a clustering performance evaluation method, SSE-DBI, which combines the Sum of Squares Error (SSE) and the Davies-Bouldin Index (DBI), are *** first stage, Principal Component Analysis (PCA) is used to reduce the dimension of the *** dimension-reduced data is fed into the SOM network for clustering, the update of the SOM weights is weighted according to PCA, and the clustering centers, namely the Typical Residential Load Profiles (TRLPs) of each customer, are obtained after some iterations of *** second stage, the above processing is repeated for the TRLPs of each customer, and the TRLPs of all customers are *** to SSE-DBI, the final optimal cluster number and clustering performance score of the model are *** with several benchmark methods, the proposed method achieves the best performance.
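The two ingredients of the SSE-DBI score can be computed as in the sketch below. How the paper combines them into a single score is not stated in the abstract, so the sketch stops at the two standard definitions. The toy clusters are invented for illustration, and the code requires Python 3.8 or later for `math.dist`.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sse(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total

def davies_bouldin(clusters):
    """Davies-Bouldin index: lower means compact, well-separated clusters."""
    cents = [centroid(pts) for pts in clusters]
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = [sum(math.dist(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, cents)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Toy stand-in for clustered load profiles (two well-separated clusters).
clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(sse(clusters))             # 4.0
print(davies_bouldin(clusters))  # 0.2
```

SSE always decreases as the number of clusters grows, whereas DBI penalizes clusters that are close together, which is one plausible reason for evaluating the two together when choosing a cluster number.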
The online assessment of the small-signal stability of wind farms faces challenges in accurately identifying impedance because the structure and parameters of wind turbine generators are unknown. In addition, the impedance must be recalculated whenever the operating point changes. This study proposes a piecewise affine (PWA) method for impedance modeling and identification under diverse operating points. The piecewise affine impedance is derived through a process that combines offline impedance modeling and online identification. In the offline stage, affine impedance modeling is performed in the parameter space of the operating point of the wind turbine generator. A clustering algorithm is employed to optimize the partitioning of the parameter space. In each partition, the impedance is expressed as a first-order explicit function of the complex variable and the operating state variables. In online applications, impedance identification is readily achieved from the real-time measured operating point: by locating the operating point in the partitions of the parameter space, the coefficients of the first-order affine model can be determined. Based on the first-order affine impedance of the wind turbine generators, the nodal admittance matrix for large-scale wind farms is established with high accuracy. The sensitivity of the dominant eigenvalue with respect to the operating point is analyzed. The accuracy and efficiency of the piecewise affine impedance model are verified under varying operating points, and the online stability and modal analysis based on the PWA-identified impedance are validated. A physical experiment is performed to validate the proposed method; it involves impedance data acquisition from a frequency scan, offline piecewise affine modeling, online impedance identification, and online stability assessment.
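The online lookup step (locate the operating point in a partition, then evaluate that partition's first-order model) might look like the sketch below. Everything concrete here is an assumption for illustration: partitions are represented by cluster centers with nearest-center assignment, the model has the form Z = c0 + cs*s + sum(cx[i]*x[i]), and the coefficient values are made up.

```python
import math

def locate_partition(op_point, centers):
    """Index of the partition whose center is nearest to the operating point.

    Nearest-center assignment is an assumption; the abstract only says that a
    clustering algorithm partitions the parameter space.
    """
    return min(range(len(centers)),
               key=lambda i: math.dist(op_point, centers[i]))

def affine_impedance(s, op_point, coeffs):
    """First-order model in the complex variable s and the operating state."""
    c0, cs, cx = coeffs
    return c0 + cs * s + sum(c * x for c, x in zip(cx, op_point))

# Hypothetical offline results: one center and one coefficient set per partition.
centers = [(0.2, 0.0), (0.8, 0.0)]  # e.g. (active power, reactive power) in p.u.
models = [
    (1 + 0j, 0.01, (0.5, 0.1)),
    (2 + 0j, 0.02, (0.3, 0.2)),
]

op_point = (0.7, 0.1)  # measured operating point
s = 1j * 100.0         # evaluation point on the imaginary axis
k = locate_partition(op_point, centers)
print(k, affine_impedance(s, op_point, models[k]))
```

Because the online step is only a lookup followed by a linear evaluation, it avoids recomputing the impedance model each time the operating point moves, which is the efficiency argument the abstract makes.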
ISBN: 9783037857502 (print)
Clustering algorithms such as K-means use distances in attribute space to cluster data. However, the way distances are computed in attribute space influences accuracy. The Variance-Similarity clustering algorithm instead defines similarity as a function of the attribute variance and clusters data by comparing similarities. In computer simulations, a comparison of the Variance-Similarity algorithm and the K-means algorithm on UCI data sets shows that the Variance-Similarity algorithm achieves better clustering accuracy than K-means.