ISBN: 9781509035663 (print)
The main goal of clustering algorithms is to organize a given set of data patterns into groups (clusters), and their main strategy is to group patterns based on their similarity. However, some clustering algorithms also require, as an input parameter, either the number of clusters the induced clustering should have or a threshold value that limits the number of induced clusters. Both the number of clusters and the threshold value are often unknown; however, it is well known that the results of clustering tasks can be very sensitive to them. This work presents a method for empirically estimating both values. The method is based on multiple runs of sequential clustering algorithms using increasing threshold values. Results from experiments conducted on several data domains from two repositories, UCI and KEEL, as well as on a few artificially created datasets, are presented, and a comparative analysis is carried out as evidence of the good estimates of both values given by the method.
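The idea of running a sequential clustering algorithm repeatedly with increasing thresholds can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not name the sequential algorithm or the selection rule, so the sketch assumes a basic one-pass sequential scheme (`bsas`, a hypothetical name) and picks the cluster count that persists over the largest number of thresholds.

```python
import math
from collections import Counter

def bsas(points, theta):
    """One-pass sequential clustering: a point opens a new cluster
    when its nearest centroid is farther away than theta."""
    centroids, sizes = [], []
    for p in points:
        nearest, best = None, math.inf
        for j, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best:
                nearest, best = j, d
        if nearest is None or best > theta:
            centroids.append(list(p))
            sizes.append(1)
        else:
            # Incremental mean update of the winning centroid.
            sizes[nearest] += 1
            n = sizes[nearest]
            centroids[nearest] = [c + (x - c) / n
                                  for c, x in zip(centroids[nearest], p)]
    return centroids

def estimate_k_and_threshold(points, thetas):
    """Assumed selection rule: the cluster count seen for the most
    thresholds wins; the threshold is taken from the middle of that plateau."""
    counts = [(theta, len(bsas(points, theta))) for theta in thetas]
    k, _ = Counter(m for _, m in counts).most_common(1)[0]
    plateau = [theta for theta, m in counts if m == k]
    return k, plateau[len(plateau) // 2]
```

Note that a one-pass scheme of this kind depends on the order in which points are presented, so in practice one would likely repeat the runs over several shuffles of the data.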
ISBN: 9781509018895 (print)
Although many algorithms have been proposed for the camera-based detection of road features (such as road markings, curbstones and road borders), truly contextual or relational information between the detections is rarely used. This is all the more surprising since a lot of potential remains unused for rejecting outliers and for compensating for detection failures, multiple detections, misclassification or fragmentation. The aim of this paper is to present an approach that is suitable for such a task in both online and offline applications, as a post-processing step after the actual detection and classification step. This is achieved by adapting a perception-based line-clustering algorithm that groups the pre-classified road features based on their relations and assigns them a final class. The grouped features are then fused to form continuous lines instead of individual dashes or fragmented lines. The evaluation on a 10 km drive in both rural and urban environments, as well as an online test on a short highway driving sequence, shows that this approach can increase the performance of road feature detection at a low computational cost.
ISBN: 9783319463490; 9783319463483 (print)
Several privacy measures have been proposed in the privacy-preserving data mining literature. However, these measures assume either a centralized data source or that no insider will try to infer information. This paper presents distributed privacy measures that take into account collusion attacks and point-level breaches for distributed data clustering. An analysis of representative distributed data clustering algorithms shows that collusion is an important source of privacy issues and that the analyzed algorithms exhibit different vulnerabilities to collusion groups.
Authors:
Amel, Hebboul; Fella, Hachouf
Université Constantine 2 - Abdelhamid Mehri-Ali Mendjeli, Constantine 25000, Algeria; Laboratore d'Automatique and Robotique
Département d'Electronique, Faculté des sciences de la Technologie, Université Freres Mentouri, Route Ain Elbey, Constantine 25000, Algeria
In classification tasks, kernel functions make it possible to partition data that are not linearly separable. In this paper, Particle Swarm Optimization (PSO) is used to obtain optimal cluster centres, their...
Matrix clustering algorithms are among the oldest approaches to the vertical partitioning problem. They can be summarized as follows: (1) given a workload, construct an Attribute Usage Matrix (AUM); (2) apply some kind of row and column permutation algorithm; and (3) extract the resulting clusters, which define the required fragments. This naive approach holds some promise for a number of contemporary applications: (1) dynamization of vertical partitioning; (2) big data applications and other cases of resource constraints; (3) tuning of multistores. In this paper we examine a number of existing matrix clustering algorithms used for vertical partitioning. We study these algorithms and assess the quality of their solutions. The experiments are run on the TPC-H workload using the PostgreSQL DBMS.
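The three steps summarized in the abstract above can be illustrated with a toy sketch. It is not one of the algorithms examined in the paper: the permutation step here is a deliberately simple stand-in (columns are sorted by their usage pattern so that identical columns become adjacent), and the function names and the example workload are hypothetical.

```python
def build_aum(workload, attributes):
    """Step 1: one row per query, one column per attribute;
    a cell is 1 when the query touches the attribute."""
    return [[1 if a in query else 0 for a in attributes] for query in workload]

def cluster_attributes(aum, attributes):
    """Steps 2 and 3 (simplified): permute columns by sorting on their
    usage pattern, then cut a new fragment wherever the pattern changes."""
    column = {a: tuple(row[j] for row in aum) for j, a in enumerate(attributes)}
    ordered = sorted(attributes, key=lambda a: column[a], reverse=True)
    fragments = []
    for a in ordered:
        if fragments and column[fragments[-1][0]] == column[a]:
            fragments[-1].append(a)   # same usage pattern: same fragment
        else:
            fragments.append([a])     # pattern changed: start a new fragment
    return fragments

# Hypothetical workload: each query is the set of attributes it reads.
attributes = ["id", "name", "price", "stock"]
workload = [{"id", "name"}, {"price", "stock"}, {"id", "name"}]
fragments = cluster_attributes(build_aum(workload, attributes), attributes)
```

Grouping only exactly identical columns is far stricter than what a real matrix clustering algorithm would do; it is used here only to keep the permutation and extraction steps visible in a few lines.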
ISBN: 9789462821859 (print)
In this paper, we present a clustering-based method used to process 3D seismic data and automatically map seismic horizons in the presence of discontinuities. Our approach uses the cosine of instantaneous phase attributes and applies Principal Component Analysis (PCA) to the original datasets of trace shapes to improve the quality of the original samples. We also propose a measurement to infer the quality of the clusters used to map the seismic horizons. Based on this measurement, we show that using the cosine of instantaneous phase attributes and PCA greatly improves the mapping of seismic horizons.
Among the power system corrective controls, defensive islanding is considered the last resort to secure the system from severe cascading contingencies. The primary motive of defensive islanding is to limit the affe...
In data mining, clustering is the most popular, powerful and commonly used unsupervised learning technique. It is a way of grouping similar data objects into clusters based on some similarity measure. Clustering algorithms ca...
Big data is currently accumulating rapidly in various information environments, such as social, corporate, scientific and other domains. The intensive use of big data in various fields stimulates researchers' interest in developing methods and tools for processing and analyzing massive, highly varied data volumes. One of the promising areas in data-intensive analytics is cluster analysis, which makes it possible to solve problems such as reducing the dimension of the original dataset, identifying patterns, etc. In this article, the authors propose an ensemble of clustering algorithms built on the basic K-means algorithm, whose members are characterized by one parameter: the distance metric between objects. The open UCI data archive was used to evaluate the performance of the designed ensemble.
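An ensemble of K-means runs that differ only in the distance metric might look roughly like the sketch below. Several parts are assumptions not stated in the abstract: the choice of three metrics, the deterministic farthest-point initialization, and the consensus rule (two points end up together when they share a cluster in more than half of the runs). Using the mean as the centre under non-Euclidean metrics is also only a heuristic.

```python
import math

# Assumed metric set; the abstract does not list the metrics used.
METRICS = {
    "euclidean": lambda a, b: math.dist(a, b),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "chebyshev": lambda a, b: max(abs(x - y) for x, y in zip(a, b)),
}

def kmeans(points, k, dist, iters=50):
    """K-means-style loop with a pluggable distance for the assignment step."""
    centroids = [points[0]]
    while len(centroids) < k:  # farthest-point initialization (assumption)
        centroids.append(max(points,
                             key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def ensemble_labels(points, k):
    """Majority co-association consensus over one run per metric."""
    runs = [kmeans(points, k, d) for d in METRICS.values()]
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if 2 * sum(r[i] == r[j] for r in runs) > len(runs):
                parent[find(i)] = find(j)  # merge the two groups
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
```

Because the consensus step merges groups transitively, it can return fewer or more than `k` groups when the individual runs disagree.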
In this work, we develop a new method of setting the input-to-reservoir and reservoir-to-reservoir weights in echo state machines. We use a clustering technique which we have previously developed as a pre-processing s...