Purpose: The fuzzy algorithms of Gath-Geva (GG) and Gustafson-Kessel (GK), which are based on the Mahalanobis distance, can overcome the restriction to spherical clusters, but the GG algorithm can only be used for data with a multivariate normal distribution, and the GK algorithm is limited in that the distribution of the data must be known. An improved supervised clustering algorithm based on fuzzy c-means (FCM) has been proposed. Methodology: We take a new threshold value and a new convergent algorithm to overcome these limitations of the GG and GK algorithms, delete the constraint on the determinants of the covariance matrices in the GK algorithm, and replace the covariance matrix in the objective function with the correlation matrix. Findings: The experimental results on real data sets show that the proposed algorithm improves clustering accuracy and achieves better performance. Value: The popular FCM algorithm, based on the Euclidean distance function, converges to a local minimum of the objective function and can only be used to detect spherical clusters. Adding fuzzy covariance matrices to the distance measure was not directly derived from the objective function, and the result is not stable enough when some of the covariance matrices are not equal. Hence, different initializations may lead to different results.
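To make the contrast between the Euclidean and Mahalanobis distances concrete, the following is a minimal sketch, not the authors' algorithm. It assumes 2-D data and a hand-picked covariance matrix; the helper names `inv2`, `mahalanobis_sq`, and `euclidean_sq` are hypothetical and introduced only for illustration. Two points that are equally far from a center in the Euclidean sense receive different Mahalanobis distances when the cluster is elongated along one axis.

```python
def inv2(m):
    """Invert a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance (x - v)^T cov^{-1} (x - v) in 2-D."""
    dx = (x[0] - center[0], x[1] - center[1])
    inv = inv2(cov)
    # Row vector dx times the inverse matrix, then dot product with dx.
    t0 = dx[0] * inv[0][0] + dx[1] * inv[1][0]
    t1 = dx[0] * inv[0][1] + dx[1] * inv[1][1]
    return t0 * dx[0] + t1 * dx[1]


def euclidean_sq(x, center):
    """Squared Euclidean distance in 2-D."""
    return (x[0] - center[0]) ** 2 + (x[1] - center[1]) ** 2


# Assumed example values: a cluster stretched along the first axis.
center = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))
along = (2.0, 0.0)   # lies along the elongated direction
across = (0.0, 2.0)  # lies across it

print(euclidean_sq(along, center), euclidean_sq(across, center))
print(mahalanobis_sq(along, center, cov), mahalanobis_sq(across, center, cov))
```

Passing a correlation matrix instead of `cov` would follow the substitution described in the abstract, although how the paper embeds that matrix in its objective function is not shown here.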
In current research, cluster analysis has become an effective way to obtain biological information by analyzing brain gene expression data. In recent years, many experts have used both improved traditional clustering algorithms and new clustering algorithms to mine brain gene expression data. First, the random forest method is used to preprocess high-dimensional, high-complexity brain gene expression data. Then, a clustering model based on deep learning is proposed, and a clustering algorithm is implemented using a deep belief network (DBN) and the fuzzy c-means algorithm (FCM). This model makes full use of the generality of unsupervised deep learning and of clustering technology, combines the advantages of deep learning with those of clustering, and makes clustering more effective and more convenient for high-dimensional data.
Large-scale data analysis is a challenging and relevant task for present-day research and industry. As a promising data analysis tool, clustering is becoming more important in the era of big data. In large-scale data clustering, sampling is an efficient and the most widely used approximation technique. Recently, several sampling-based clustering algorithms have attracted considerable attention in large-scale data analysis owing to their efficiency. However, some of these existing algorithms have low clustering accuracy, whereas others have high computational complexity. To overcome these deficiencies, a stratified sampling based clustering algorithm for large-scale data is proposed in this paper. Its basic steps include: (1) obtaining a number of representative samples with a stratified sampling scheme from different strata, which are formed by a locality-sensitive hashing technique, (2) partitioning the chosen samples into different clusters using the fuzzy c-means clustering algorithm, (3) assigning the out-of-sample objects to their closest clusters via a data labeling technique. The performance of the proposed algorithm is compared with state-of-the-art sampling-based fuzzy c-means clustering algorithms on several large-scale data sets, including synthetic and real ones. The experimental results show that the proposed algorithm outperforms the related algorithms in terms of clustering quality and computational efficiency for large-scale data sets. (c) 2018 Published by Elsevier B.V.
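Steps (1) and (3) above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the paper's method: a simple grid bucketing function (`grid_stratum`, hypothetical) stands in for locality-sensitive hashing, the cluster centers are placeholder values standing in for the output of step (2), and all function names are invented for this example.

```python
import random
from collections import defaultdict


def grid_stratum(p, cell=5.0):
    """Assumed stand-in for locality-sensitive hashing: bucket by grid cell."""
    return tuple(int(x // cell) for x in p)


def stratified_sample(points, stratum_of, fraction, seed=0):
    """Return sorted indices of a sample drawn separately from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, p in enumerate(points):
        strata[stratum_of(p)].append(idx)
    chosen = []
    for members in strata.values():
        # Take the same fraction from each stratum, but at least one object.
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


def label_out_of_sample(points, sample_idx, centers):
    """Assign every object outside the sample to its closest cluster center."""
    in_sample = set(sample_idx)
    labels = {}
    for i, p in enumerate(points):
        if i in in_sample:
            continue
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels[i] = dists.index(min(dists))
    return labels


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
sample_idx = stratified_sample(points, grid_stratum, fraction=0.5)

# Placeholder centers: in the described pipeline these would come from
# running fuzzy c-means on the sampled objects (step 2).
centers = [(0.5, 0.5), (10.5, 10.5)]
labels = label_out_of_sample(points, sample_idx, centers)
print(sample_idx, labels)
```

Sampling per stratum keeps sparse regions represented, which plain uniform sampling may miss. The nearest-center rule is only one possible labeling technique.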
The water supply network is an important piece of infrastructure in urban construction. Realizing real-time monitoring and leak location for the water supply network has strong theoretical and practical significance. In this paper, based on the similarity of node pressures in the water supply network, the fuzzy c-means clustering algorithm is used to select a finite set of monitoring points. On this basis, a deep neural network model is constructed according to the pressure changes at the monitoring points before and after a leak in the water supply network, so as to locate the leakage points. In the experimental part, a hydraulic simulation was conducted using the EPANETH pipe network adjustment software according to the layout of the water supply network, and the pressure of all nodes was obtained. A deep neural network model was built with Keras on the TensorFlow framework. After model training and testing, the training error was controlled within the effective range of 5%. Finally, the model is applied to an actual leakage problem in the underground water supply network of Langxi county, Xuancheng city, and the leakage point is located accurately. The experiment proves the feasibility and accuracy of the method proposed in this paper.
This work concerns mining smart data from the big data collected in the Internet of Things (IoT), which attempts to better human life by integrating physical devices into the information space. As one of the most important clustering techniques for drilling smart data, the fuzzy c-means algorithm (FCM) assigns each object to multiple groups by calculating a membership matrix. However, each big data object has a large number of attributes, posing a remarkable challenge to FCM for real-time clustering of IoT big data. In this paper, we propose an efficient fuzzy c-means approach based on the tensor canonical polyadic decomposition for clustering big data in the Internet of Things. In the presented scheme, the traditional fuzzy c-means algorithm is converted to the high-order tensor fuzzy c-means algorithm (HOFCM) via a bijection function. Furthermore, the tensor canonical polyadic decomposition is used to reduce the attributes of every object to enhance clustering efficiency. Finally, extensive experiments are conducted to compare the developed scheme with the traditional fuzzy c-means algorithm on two large IoT datasets, sWSN and eGSAD, regarding clustering accuracy and clustering efficiency. The results indicate that the developed scheme achieves significantly higher clustering efficiency with a slight drop in clustering accuracy compared with the traditional algorithm, showing the potential of the developed scheme for drilling smart data from IoT big data. (c) 2018 Elsevier B.V. All rights reserved.
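The membership matrix mentioned above can be illustrated with a minimal sketch of the traditional fuzzy c-means algorithm. It covers only the baseline, not the tensor-based HOFCM scheme, and assumes a fuzzifier of m = 2, Euclidean distance, and random initial memberships. The function name `fcm` and its parameters are hypothetical choices for this example.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; every row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update from squared distances to the new centers.
        new_u = []
        for p in points:
            d2 = [sum((p[d] - ctr[d]) ** 2 for d in range(dim)) for ctr in centers]
            if min(d2) == 0.0:
                # The point sits exactly on a center: give it full membership.
                k = d2.index(0.0)
                row = [1.0 if j == k else 0.0 for j in range(c)]
            else:
                inv = [dd ** (-1.0 / (m - 1.0)) for dd in d2]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, u = fcm(data, c=2)
hard_labels = [row.index(max(row)) for row in u]
print(hard_labels)
```

Each row of `u` holds one object's degrees of membership in all clusters. Both update steps loop over every attribute of every object, which is the per-attribute cost that the abstract's attribute reduction aims to lower.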
In this paper, we propose a novel method for constructing a distance function and demonstrate its application in image segmentation. In image segmentation algorithms, distance functions represent a criterion that divides pixels into groups of segments. We introduce two extended aggregation functions, the extended powers product and the extended weighted arithmetic mean of powers. Their relevant properties are examined, as well as certain resulting properties of the distance functions constructed by applying these aggregation functions. In addition, one pixel descriptor, motivated by the Local Binary Pattern family of descriptors (LBPs), is introduced and discussed. In the experimental section, we present an application of the introduced extended aggregation functions and descriptor by constructing a new distance function, which is used in the fuzzy c-means clustering algorithm (FCM) for image segmentation. (c) 2019 Elsevier Inc. All rights reserved.
The Meteosat Second Generation (MSG) satellite can be used to estimate rainfall through its multispectral images, which are provided every 15 min across 12 channels. However, most studies have not made full use of the terabytes of data provided by the channels of this satellite, which are potentially rich in new resources that need to be exploited. Moreover, these studies classify pixels conventionally, where a pixel is considered either 100% precipitating or 0% (non-precipitating), whereas in reality it cannot be classified in a clear and unambiguous way. To address this problem, we propose a method that exploits the images of the channels and constructs an estimation model in the form of fuzzy association rules to estimate rainfall in Northeastern Algeria. Each rule is in if (condition)-then (conclusion) form, where the condition is a combination of the various fuzzy classes of MSG images, and the conclusion contains a single fuzzy class that represents the intensity of rain: no-rain, low, moderate, or high. The obtained results are compared with the data obtained by the European Organization for the Exploitation of Meteorological Satellites Multisensor Precipitation Estimate program.
An integrated energy system not only provides a platform for multi-energy coupling utilisation but also satisfies users' diversified energy demands. However, in view of the enormous amount of integrated energy data and the difficulty of analysing the characteristics of that data, an integrated energy-analysis method based on sparse clustering and compressed sensing is proposed in this study. This method uses the fuzzy c-means algorithm to construct an over-complete dictionary and then compresses, collects, and reconstructs the integrated energy data using the compressed sensing theory method. This process analyses integrated energy-load characteristics accurately and also solves the problem of low data-transmission efficiency. Simulation results show that the method is suitable for analysing and processing integrated load data in integrated energy systems.
With the acceleration of urbanization, urban traffic problems are becoming more and more prominent. In the face of massive traffic data, it is difficult to predict traffic conditions with effective data analysis methods. To deal with traffic data better, this study applied data mining to traffic data analysis and processing, constructed a Hadoop-based data analysis system to collect and preprocess data, and analyzed traffic data using parallel distributed computation based on MapReduce. The improved fuzzy c-means (FCM) algorithm and the random forest algorithm were used. The simulation results showed that the error rate of the improved FCM algorithm was 10% and the accuracy rate of the random forest algorithm was 92.3%, indicating that the system had high reliability. Then an experiment was carried out on the main traffic roads in Huadu district of Guangzhou, China. It was found that the method was efficient and accurate and had good application prospects.
ISBN: (print) 9781728117089
In this paper, an efficient fault detection approach that employs Support Vector Data Description (SVDD) and the fuzzy c-means algorithm (FCM) is proposed for ground-based electronic equipment. First, the FCM method is applied to fault pattern mining for cases in which prior knowledge of equipment faults is difficult to obtain. Then an SVDD model is trained independently on the data of each fault type for multi-class classification. This fault diagnosis strategy can be used in health condition monitoring for ground-based electronic equipment. The experimental results verify its effectiveness in fault diagnosis, with high accuracy and real-time performance.