ISBN: 9780898717013 (print)
In the kernel clustering problem we are given a (large) $n \times n$ symmetric positive semidefinite matrix $A = (a_{ij})$ with $\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} = 0$ and a (small) $k \times k$ symmetric positive semidefinite matrix $B = (b_{ij})$. The goal is to find a partition $\{S_1, \dots, S_k\}$ of $\{1, \dots, n\}$ which maximizes $\sum_{i=1}^{k}\sum_{j=1}^{k} \bigl(\sum_{(p,q) \in S_i \times S_j} a_{pq}\bigr) b_{ij}$. We design a polynomial time approximation algorithm that achieves an approximation ratio of $R(B)^2/C(B)$, where $R(B)$ and $C(B)$ are geometric parameters that depend only on the matrix $B$, defined as follows: if $b_{ij} = \langle v_i, v_j \rangle$ is the Gram matrix representation of $B$ for some $v_1, \dots, v_k \in \mathbb{R}^k$, then $R(B)$ is the minimum radius of a Euclidean ball containing the points $\{v_1, \dots, v_k\}$. The parameter $C(B)$ is defined as the maximum over all measurable partitions $\{A_1, \dots, A_k\}$ of $\mathbb{R}^{k-1}$ of the quantity $\sum_{i=1}^{k}\sum_{j=1}^{k} b_{ij} \langle z_i, z_j \rangle$, where for $i \in \{1, \dots, k\}$ the vector $z_i \in \mathbb{R}^{k-1}$ is the Gaussian moment of $A_i$, i.e., $z_i = \frac{1}{(2\pi)^{(k-1)/2}} \int_{A_i} x e^{-\|x\|_2^2/2}\, dx$. We also show that for every $\varepsilon > 0$, achieving an approximation guarantee of $(1 - \varepsilon) R(B)^2/C(B)$ is Unique Games hard.
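To make the objective concrete, here is a minimal Python sketch that evaluates $\sum_{i,j}\bigl(\sum_{(p,q)\in S_i\times S_j} a_{pq}\bigr) b_{ij}$ for a partition encoded as a label list (`labels[p] = i` means $p \in S_i$). This encoding and the function names are illustrative assumptions, and parts are allowed to be empty. The exhaustive maximizer is exponential in $n$ and only useful on toy inputs. It is not the paper's polynomial-time approximation algorithm, which the abstract does not spell out.

```python
from itertools import product


def kernel_clustering_objective(A, B, labels):
    """Sum over i, j of (sum of a_pq for (p, q) in S_i x S_j) * b_ij.

    A: n x n matrix (list of lists), B: k x k matrix (list of lists),
    labels[p] in range(k) is the index i of the part S_i containing p.
    """
    n, k = len(A), len(B)
    # block[i][j] accumulates the sum of a_pq over (p, q) in S_i x S_j
    block = [[0.0] * k for _ in range(k)]
    for p, q in product(range(n), repeat=2):
        block[labels[p]][labels[q]] += A[p][q]
    return sum(block[i][j] * B[i][j] for i in range(k) for j in range(k))


def brute_force_optimum(A, B):
    """Exact maximum by trying every labeling (exponential; toy sizes only)."""
    k = len(B)
    return max(
        kernel_clustering_objective(A, B, labels)
        for labels in product(range(k), repeat=len(A))
    )
```

With $A = \begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}$ (positive semidefinite, entries summing to 0) and $B = I_2$, separating the two points gives 2.0 and putting them in the same part gives 0.0.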
ISBN: 9781479913039 (print)
In this paper, a novel probabilistic load modeling approach is presented. The proposed approach starts by grouping the 24 data points representing the hourly loading of each day into one data segment. The resulting 365 data segments, representing the whole-year loading profile, are evaluated for similarities using principal component analysis; segments with similar principal components are then grouped into the same cluster using clustering algorithms. For each cluster, a representative segment is selected and its probability of occurrence is computed. The results of the proposed algorithm can be used in different studies to model the long-term behavior of electrical loads while taking their temporal variations into account. This is possible because the selected representative segments cover the whole year. The representative segments are assigned probabilistic indices that correspond to their frequency of occurrence, thus preserving the stochastic nature of electrical loads.
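The final step of the approach (one representative segment and one probability of occurrence per cluster) can be sketched in Python as below. The sketch assumes the cluster labels have already been produced by the PCA and clustering stage, which is not shown, and it assumes the representative is the member segment closest to the cluster centroid; the abstract does not say how the representative is chosen. Function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def representatives_and_probabilities(segments, labels):
    """Map each cluster label to (representative segment, probability).

    segments: list of daily profiles (e.g. 24 hourly values each).
    labels: cluster label for each segment, assumed given by an earlier
    PCA + clustering stage. Probability = cluster size / number of days.
    """
    groups = defaultdict(list)
    for seg, lab in zip(segments, labels):
        groups[lab].append(seg)
    total = len(segments)
    result = {}
    for lab, members in groups.items():
        dim = len(members[0])
        # hour-by-hour mean profile of the cluster
        centroid = [sum(m[t] for m in members) / len(members) for t in range(dim)]
        # assumption: representative = member closest to the centroid
        rep = min(members, key=lambda m: math.dist(m, centroid))
        result[lab] = (rep, len(members) / total)
    return result
```

The probabilities sum to 1 over all clusters, so the representatives together with their probabilities form a discrete distribution of daily load shapes.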
ISBN: 9781479977321 (print)
Clustering algorithms are used in a wide variety of fields and contexts, and the datasets involved behave differently: their size, density, or distribution may vary. In data mining, clustering algorithms are used to build clusters for a given dataset, but finding the most suitable clustering algorithm for that dataset is not an easy task. This study therefore applies four clustering algorithms to several datasets to identify the most suitable one. The comparison of these clustering data mining algorithms is carried out using the WEKA machine learning software.
Clustering is the most widely used unsupervised machine learning technique, having extensive applications in statistical analysis. We have multiple clustering algorithms available in theory and many more implementatio...
ISBN: 9780769548968; 9781467347259 (print)
To measure the performance or validity of clustering algorithms, several evaluation values are defined, such as the successful rate, the successful number, and the full successful rate. To ensure that each cluster contains at least one data vector, and to maximize the proposed evaluation values, two class assignment algorithms are designed. To test their performance, we apply them to the k-means clustering algorithm.
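The abstract does not describe the two class assignment algorithms, so the Python sketch below shows only one generic way, swapped in here for illustration, to guarantee that every cluster receives at least one vector during a k-means assignment step: assign each point to its nearest centroid, then, for each empty cluster, move in the point farthest from its current centroid, taken from a cluster that would stay non-empty. The function name and the repair rule are assumptions, not the authors' method.

```python
import math


def assign_nonempty(points, centroids):
    """Nearest-centroid assignment, then repair of empty clusters.

    Hypothetical repair rule: for each empty cluster, move the point farthest
    from its current centroid, taken only from a cluster with more than one
    member, so no other cluster becomes empty. Requires len(points) >= k.
    """
    k = len(centroids)
    if len(points) < k:
        raise ValueError("need at least as many points as clusters")
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    for empty in range(k):
        counts = [labels.count(c) for c in range(k)]
        if counts[empty] > 0:
            continue
        # candidates: points whose cluster still has a member after the move
        candidates = [i for i in range(len(points)) if counts[labels[i]] > 1]
        far = max(candidates, key=lambda i: math.dist(points[i], centroids[labels[i]]))
        labels[far] = empty
    return labels
```

A donor cluster with more than one member always exists while some cluster is empty, because at most $k-1$ non-empty clusters share at least $k$ points.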
ISBN: 9781538649503 (print)
Transitioning from traditional distribution grids or diesel-only systems to microgrids offers end-users economic benefits and higher power quality at a reduced environmental cost. In particular, multi-microgrids, an emerging research area, aim to provide a more reliable network capable of self-healing. The aim of this paper is to assess well-known clustering algorithms for cost-effective microgrid formation and to develop a planning framework for uncoupled multi-microgrid networks. In each microgrid, the network was represented by a minimum spanning tree, resulting in a linear relationship between the microgrid cost and the transmission/power demand. In addition, a diversity factor was introduced to show the ability of larger microgrids to meet peak power demands more reliably. Simulation results on three real-life datasets suggested that hierarchical clustering algorithms were better suited for microgrid planning due to their adaptability to any dataset, their complete search of the solution space, which guarantees globally optimal networks, and their relative computational efficiency.
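A minimal Python sketch of the network representation described above: Prim's algorithm computes the total length of a minimum spanning tree over customer locations, and a hypothetical linear cost model (a fixed cost plus a per-kilometre line cost) turns that length into a microgrid cost. The coordinates, `cost_per_km`, and `fixed_cost` are illustrative assumptions; the paper's actual cost model, which also involves power demand and a diversity factor, is not reproduced here.

```python
import math


def mst_length(points):
    """Total Euclidean length of a minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each node to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # add the outside node with the cheapest connecting edge
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def microgrid_cost(points, cost_per_km, fixed_cost):
    """Hypothetical linear model: fixed cost plus line cost proportional to MST length."""
    return fixed_cost + cost_per_km * mst_length(points)
```

For four customers at the corners of a unit square the tree length is 3.0, so `microgrid_cost(square, 10, 5)` gives 35.0 under this toy model.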
ISBN: 0898715687 (print)
Recent work in unsupervised learning has emphasized the need to understand a new trend in algorithmic design, which is to influence the clustering via weights on the instance points. In this paper, we treat clustering as a constrained minimization of a Bregman divergence. Theoretical results show benefits resembling those of boosting algorithms and yield new weighted versions of clustering algorithms such as k-means, expectation-maximization (EM), and k-harmonic means. Experiments demonstrate the quality of the results obtained and corroborate the advantages that subtle data reweightings may bring to clustering.
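To show where instance weights enter a clustering algorithm, here is a minimal Python sketch of weighted k-means with squared Euclidean distance, which is one Bregman divergence. For any Bregman divergence, the weighted mean minimizes the weighted sum of divergences from the points to a centroid, so the weights change only the centroid update. The weights here are fixed inputs chosen by the caller; the boosting-like reweighting scheme derived in the paper is not reproduced.

```python
import math


def weighted_kmeans(points, weights, centroids, iters=10):
    """Weighted Lloyd iterations with squared Euclidean distance.

    Assignment uses Euclidean distance (same ordering as its square);
    each centroid becomes the weighted mean of its members.
    """
    centroids = [list(c) for c in centroids]
    k, dim = len(centroids), len(points[0])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            wsum = sum(weights[i] for i in members)
            if wsum == 0:
                continue  # keep the old centroid for an empty or zero-weight cluster
            centroids[c] = [
                sum(weights[i] * points[i][d] for i in members) / wsum
                for d in range(dim)
            ]
    return centroids, labels
```

Raising the weight of a point pulls its cluster's centroid toward it; with weights `[1, 3, 1, 1]` on the 1-D points `0, 1, 10, 11`, the first centroid settles at 0.75 rather than the unweighted 0.5.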
ISBN: 9780769533575 (print)
Clustering, an unsupervised machine learning method, is introduced to summarize the characteristics of vessel traffic flow data, offering a new way to perform data analysis in the vessel traffic field using artificial intelligence techniques. K-Means, a similarity-based algorithm, is selected for the clustering process because of its simplicity and efficiency, and WEKA, a popular data mining tool, is chosen to run the experiment. The data mining experiment, which uses real data from a waterway of the Yangzi River, lists the most relevant cluster centroids together with explanations, revealing facts that are often neglected. Finally, based on the mining results and their interpretation, we conclude that clustering is a suitable method for generalizing multi-factor regularities.
ISBN: 9781509018895 (print)
Although many algorithms have been proposed for the camera-based detection of road features (such as road markings, curbstones, and road borders), truly contextual or relational information between the detections is rarely used. This is all the more surprising since much potential remains untapped for outlier rejection and for compensating detection failures, multiple detections, misclassification, or fragmentation. The aim of this paper is to present an approach suited to this task in both online and offline applications, as a post-processing step after the actual detection and classification. This is achieved by adapting a perception-based line-clustering algorithm that groups the pre-classified road features based on their relations and assigns them a final class. The grouped features are then fused to form continuous lines instead of individual dashes or fragmented lines. An evaluation on a 10 km drive in both rural and urban environments, as well as an online test on a short highway driving sequence, shows that this approach is well capable of increasing the performance of road feature detection at a low computational cost.
ISBN: 9780898716535 (print)
Consensus clustering is the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. Cast as an optimization problem, consensus clustering is known as median partition and has been shown to be NP-complete. A number of heuristics have been proposed as approximate solutions, some with performance guarantees. In practice, the problem is apparently easy to approximate, but guidance is needed as to which heuristic to use depending on the number of elements and clusterings given. We have implemented a number of heuristics for the consensus clustering problem, and here we compare their performance, independent of data size, in terms of efficacy and efficiency on both simulated and real data sets. We find that, based on the underlying algorithms and their behavior in practice, the heuristics can be categorized into two distinct groups, with ramifications as to which one to use in a given situation, and that a hybrid solution is the best bet in general. We have also developed a refined consensus clustering heuristic for occasions when the given clusterings may be too disparate and their consensus may not be representative of any one of them, and we show that in practice the refined consensus clusterings can be far superior to the general consensus clustering.
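As an illustration of the median partition objective, the Python sketch below counts the element pairs on which two clusterings disagree (a Mirkin-type, symmetric-difference distance counted in pairs) and applies one of the simplest consensus heuristics: return the input clustering with the smallest total distance to all inputs. This best-of-the-inputs rule is chosen here for brevity and is not claimed to be one of the heuristics the authors implemented; encoding clusterings as label lists is an assumption.

```python
from itertools import combinations


def pair_disagreements(a, b):
    """Number of element pairs co-clustered in one labeling but not in the other."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def best_of_inputs(clusterings):
    """Return the input clustering minimizing total pair disagreement to all inputs."""
    return min(
        clusterings,
        key=lambda c: sum(pair_disagreements(c, other) for other in clusterings),
    )
```

The median partition itself may be a clustering that appears nowhere among the inputs, so this rule only yields an upper bound on the optimal objective; it is a common starting point that local-search heuristics then refine.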