ISBN: 9780769542447 (print)
There has been much progress on efficient algorithms for clustering data points generated by a mixture of k probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between the means of any two distributions is at least Omega(k) standard deviations. These results generally make heavy use of the generative model and particular properties of the distributions. In this paper, we show that a simple clustering algorithm works without assuming any generative (probabilistic) model. Our only assumption is what we call a "proximity condition": the projection of any data point onto the line joining its cluster center to any other cluster center is Omega(k) standard deviations closer to its own center than the other center. Here the notion of standard deviations is based on the spectral norm of the matrix whose rows represent the difference between a point and the mean of the cluster to which it belongs. We show that in the generative models studied, our proximity condition is satisfied and so we are able to derive most known results for generative models as corollaries of our main result. We also prove some new results for generative models - e.g., we can cluster all but a small fraction of points only assuming a bound on the variance. Our algorithm relies on the well-known k-means algorithm, and along the way, we prove a result of independent interest - that the k-means algorithm converges to the "true centers" even in the presence of spurious points provided the initial (estimated) centers are close enough to the corresponding actual centers and all but a small fraction of the points satisfy the proximity condition. Finally, we present a new technique for boosting the ratio of inter-center separation to standard deviation. This allows us to prove results for learning certain mixtures of distributions under weaker separation conditions.
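The geometric part of the proximity condition can be illustrated with a minimal sketch. It projects a point onto the line through its own center and another center, then reports how much closer the projection is to the own center. The function names and the `threshold` parameter are hypothetical; the abstract only states that the margin must be Omega(k) standard deviations, so the exact threshold used in the paper is not reproduced here.

```python
import math


def projection_margin(point, own_center, other_center):
    """Distance from the projected point to the other center minus its
    distance to the own center (positive means closer to the own center)."""
    direction = [b - a for a, b in zip(own_center, other_center)]
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0:
        raise ValueError("centers must be distinct")
    # Signed position of the projection along the line, measured from own_center.
    t = sum((p - a) * d for p, a, d in zip(point, own_center, direction)) / length
    return abs(length - t) - abs(t)


def satisfies_proximity(point, own_center, other_centers, threshold):
    """Check the margin against every other center.

    `threshold` is a caller-supplied stand-in (an assumption) for the
    Omega(k)-standard-deviations bound mentioned in the abstract.
    """
    return all(
        projection_margin(point, own_center, other) >= threshold
        for other in other_centers
    )
```

For example, `satisfies_proximity([1.0, 5.0], [0.0, 0.0], [[10.0, 0.0]], threshold=5.0)` holds because the component of the point orthogonal to the line between the centers does not affect the margin.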
ISBN: 9781457706530 (print)
In this paper, we propose a similarity-based image retrieval system that considers artifacts, using a self-organizing map with refractoriness. In the self-organizing map with refractoriness, multiple neurons in the Map Layer corresponding to the input can fire sequentially because of the refractoriness. The proposed image retrieval system makes use of this property to retrieve multiple similar images. In this system, not only color information but also spectrum and keywords are employed as image features. Moreover, the original image is divided into areas by the k-means algorithm so that no area contains two or more objects. We carried out a series of computer experiments and confirmed the effectiveness of the proposed system.
Electricity is one of the most important needs for human life in many sectors. Demand for electricity will increase in line with population and economic growth. Adjusting the amount of electricity produced in a given period is important because storing electricity is expensive. To handle this problem, we need knowledge of clients' electricity usage patterns. These patterns can be obtained using clustering techniques. In this paper, clustering is used to find similarities in electricity usage patterns over a given period. We apply the k-means algorithm to a dataset of electricity consumption from 370 clients collected over a year. We found an interesting pattern: one large group of clients consumes the lowest electric load in spring, while in another group the lowest electricity consumption occurs in winter. Based on this result, an electricity provider can plan production for a given season according to the electricity usage profiles.
ISBN: 9783319196206; 9783319196190 (print)
To study the k-means algorithm for evaluating soil fertility and to address the algorithm's heavy computation and high time complexity, this paper proposes a k-means algorithm based on the Hadoop platform. First, the k-means algorithm is used to cluster nine consecutive years of soil nutrient data from Nongan town; the clustering results show that the accuracy rate increased year by year and is consistent with the actual situation. Then, because the k-means clustering algorithm has high time complexity when processing large amounts of data, this paper uses a k-means algorithm based on the Hadoop platform to cluster large amounts of soil fertility data; the results show that it runs faster than the traditional serial k-means algorithm. The analysis shows that the k-means algorithm is an effective method for evaluating soil fertility, and that the Hadoop-based parallel k-means algorithm is of practical value for analyzing large amounts of data on soil fertility factors.
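One common way to parallelize k-means in a MapReduce setting can be sketched as follows: the map step emits each point under the index of its nearest center, and the reduce step averages the points per index. This is a single-process illustration in plain Python, not Hadoop code and not necessarily the scheme used in the paper; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(points, centers):
    """Emit (nearest center index, (point, 1)) for every point."""
    for point in points:
        nearest = min(
            range(len(centers)),
            key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
        )
        yield nearest, (point, 1)


def reduce_phase(pairs, centers):
    """Sum points and counts per center index, then average.

    A center that received no points keeps its previous position.
    """
    sums = {}
    counts = defaultdict(int)
    for index, (point, count) in pairs:
        if index in sums:
            sums[index] = [s + p for s, p in zip(sums[index], point)]
        else:
            sums[index] = list(point)
        counts[index] += count
    return [
        [s / counts[i] for s in sums[i]] if i in sums else list(centers[i])
        for i in range(len(centers))
    ]


def kmeans_iteration(points, centers):
    """One k-means iteration expressed as a map step followed by a reduce step."""
    return reduce_phase(map_phase(points, centers), centers)
```

In a real cluster the map outputs would be partitioned across workers and only the per-center sums and counts would be shuffled, which is what makes the step cheap to distribute.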
ISBN: 9781612843957 (print)
This paper introduces a 2-stage k-means algorithm that is faster than the standard 1-stage k-means algorithm. The main idea is to use the first (fast) stage to move the cluster centers closer to their final locations. This is done by using a small part of the data to achieve faster calculation. The second (slow) stage starts from the centers found during the first stage. Different initial locations of the clusters were used while testing the algorithms. The 2-stage clustering method is shown to achieve a better speed-up on bigger datasets.
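The two stages described above can be sketched as follows, assuming plain Lloyd iterations in both stages and a uniformly random subsample in the first. The `sample_fraction` and `seed` parameters and all function names are illustrative assumptions, not details taken from the paper.

```python
import random


def nearest_index(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def lloyd(points, centers, max_iter=100):
    """Standard k-means iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = [nearest_index(point, centers) for point in points]
        updated = []
        for j, old in enumerate(centers):
            members = [pt for pt, label in zip(points, labels) if label == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(old)  # keep an empty cluster's center in place
        if updated == centers:
            break
        centers = updated
    return centers


def two_stage_kmeans(points, init_centers, sample_fraction=0.1, seed=0):
    """Stage 1 (fast): run on a small sample. Stage 2 (slow): refine on all data."""
    rng = random.Random(seed)
    size = min(len(points), max(len(init_centers), int(len(points) * sample_fraction)))
    sample = rng.sample(points, size)
    rough_centers = lloyd(sample, init_centers)
    return lloyd(points, rough_centers)
```

The second stage typically needs fewer passes over the full dataset because it starts near the final centers, which is where the speed-up reported in the abstract would come from.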
ISBN: 9781665442886 (print)
Ornamental plants are a commodity with high production in Indonesia, with an increase of 17.61 million stalks (9.55%) recorded in 2018. Ornamental plants also have business potential in Indonesia. The increase and decrease in ornamental plant turnover can be attributed to a variety of factors such as beauty awareness, the development of the tourism industry, ornamental plant trends, and the construction of housing and hotel complexes. Some of these factors can have an indirect impact on the sustainability of the ornamental plant business. To address these concerns, k-means clustering was used to identify similarities in ornamental plant turnover data based on plant commodities and monthly turnover values. Grouping the data with the k-means clustering algorithm in the WEKA application produced two clusters, containing 11% (8 records) and 89% (66 records) of a total of 74 records; the two clusters appeared after three iterations.
ISBN: 9781479976751 (print)
Visualization has become one of the solutions for showing attacks on a network. Visualizing an attack makes it easier to recognize and infer the pattern from a complex visual image. DoS attacks can target various parts of the network, such as routing, web, electronic mail, or DNS (Domain Name System) servers. The purpose of a DoS attack is to make a server shut down, reboot, crash, or stop responding. The DoS attacks in the ISCX dataset form a pattern in which many host IPs target a single server. Snort detected DoS attacks on the ISCX testbed dataset, raising 42 HttpDoS alerts. The k-means clustering program achieved an accuracy of 97.83%, a detection rate of 98.63%, and a false alarm rate of 0.02%. Meanwhile, k-means clustering with the WEKA tool achieved an accuracy of 99.69%, a detection rate of 99.01%, and a false alarm rate of 3.70%. The difference in accuracy between the program and the WEKA tool arises because the centroid values used to cluster the data packets are selected randomly from the packet data.
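The abstract does not define its three metrics. A minimal sketch under the usual intrusion-detection definitions (an assumption) computes them from confusion-matrix counts; the function name and argument names are hypothetical, and the numbers in the usage note are made up for illustration rather than taken from the paper.

```python
def detection_metrics(true_pos, false_neg, false_pos, true_neg):
    """Return (accuracy, detection rate, false alarm rate) as fractions.

    Assumed definitions, not confirmed by the source:
      accuracy         = (TP + TN) / all samples
      detection rate   = TP / (TP + FN)
      false alarm rate = FP / (FP + TN)
    """
    total = true_pos + false_neg + false_pos + true_neg
    if total == 0 or true_pos + false_neg == 0 or false_pos + true_neg == 0:
        raise ValueError("need at least one attack sample and one normal sample")
    accuracy = (true_pos + true_neg) / total
    detection_rate = true_pos / (true_pos + false_neg)
    false_alarm_rate = false_pos / (false_pos + true_neg)
    return accuracy, detection_rate, false_alarm_rate
```

With illustrative counts of 90 detected attacks, 10 missed attacks, 5 false alarms, and 95 correctly passed normal packets, the three values are 0.925, 0.9, and 0.05.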
ISBN: 9783030715939; 9783030715922 (print)
k-means is a standard algorithm for clustering data. It generally constitutes the final step in a more complex chain of high-quality spectral clustering. However, this chain suffers from a lack of scalability when addressing large datasets. This can be overcome by also applying the k-means algorithm as a pre-processing task to reduce the number of input data instances. We describe parallel optimization techniques for the k-means algorithm on CPU and GPU. Experimental results on a synthetic dataset illustrate the numerical accuracy and performance of our implementations.
ISBN: 9783030306045; 9783030306038 (print)
In this paper Mercer kernels with certain invariance properties are briefly introduced and an apparently not well-known construction using certain cohomology groups is described. As a consequence some kernels arising from this are given. Hence a kernel version of an iterative k-means algorithm due to Duda et al. is exhibited. It resembles the usual k-means algorithm but relies on a different update procedure and allows an elegant computation of the target function.
ISBN: 9781538694909 (print)
To address the poor accuracy of agricultural robot navigation path recognition under uneven illumination and complex backgrounds, this paper uses a clustering algorithm for image segmentation. By introducing the Lab color space and the k-means algorithm, the k-means clustering process can perform large-scale segmentation of the region of interest in the image. After clustering twice, the path information of the farmland can be separated from the background. The navigation path can then be fitted using the linear least squares method. For illustration, an image of a medlar farmland row is used to show the feasibility of this method. Experimental results show that clustering and segmenting the region of interest with the k-means algorithm can effectively improve the accuracy of image segmentation and reduce the influence of uneven illumination and complex backgrounds on farmland navigation path accuracy.
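The final fitting step can be sketched with ordinary linear least squares over the pixel coordinates of the segmented path. The sketch assumes the path runs roughly along the image rows, so the column is modeled as a linear function of the row; the function name and this parameterization are assumptions, not details from the paper.

```python
def fit_path_line(pixels):
    """Fit col = slope * row + intercept to (row, col) pixels by least squares."""
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean_row = sum(row for row, _ in pixels) / n
    mean_col = sum(col for _, col in pixels) / n
    # Sum of squared row deviations; zero means all pixels share one row.
    row_spread = sum((row - mean_row) ** 2 for row, _ in pixels)
    if row_spread == 0:
        raise ValueError("pixels must span more than one row")
    covariance = sum((row - mean_row) * (col - mean_col) for row, col in pixels)
    slope = covariance / row_spread
    intercept = mean_col - slope * mean_row
    return slope, intercept
```

Modeling the column as a function of the row avoids an infinite slope when the path points straight ahead of the robot, which is the common case for a forward-facing camera.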