Dimensionality reduction through feature selection becomes necessary to overcome the curse of dimensionality. In this article, we propose a feature (gene) selection method for high dimensional gene exp...
ISBN: (print) 9781447123859
Data mining is a process of extracting valid, previously unknown, and ultimately comprehensible information from large datasets and using it for organizational decision-making. Clustering is one of the most useful tasks in the data mining process for discovering groups and identifying interesting distributions and patterns in the underlying data. In the clustering process, there are no predefined classes and no examples that would show what kind of desirable relations should be valid among the data, which is why it is perceived as an unsupervised process. Classification, on the other hand, is a procedure for assigning a data item to one of a predefined set of categories. Clustering produces the initial categories into which the values of a data set are classified during the classification process. The clustering process may result in different partitionings of a data set, depending on the specific criterion used for clustering. In general terms, clustering algorithms are based on a criterion for assessing the quality of a given partitioning. They take some parameters (e.g. number of clusters, density of clusters) as input and attempt to find the best partitioning of a data set for the given parameters. Thus, they define a partitioning of a data set based on certain assumptions, and not necessarily the "best" one that fits the data set. Since clustering algorithms discover clusters that are not known a priori, the final partition of a data set requires some sort of evaluation in most applications. Approaches to validating clustering results are discussed in the literature. They aim at the quantitative evaluation of the results of clustering algorithms and are known under the general term cluster validity methods. Many data mining algorithms consider outliers as noise that must be eliminated because they degrade their predictive accuracy. However, as has been pointed out, "one person's noise could be another person's signal". Outlier mining can be used in telecom or credit card frauds to detect the atypical usag
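The abstract above notes that clustering algorithms take parameters such as the number of clusters as input, and that the resulting partition then needs quantitative evaluation by a cluster validity method. A minimal sketch of that loop follows. The abstract does not name a particular algorithm or index, so both choices here are assumptions made for illustration: k-means (Lloyd's algorithm with a deterministic farthest-point start) as the clustering algorithm and the mean silhouette coefficient as the validity index. The toy `points` data and the function names are likewise hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns one cluster label per point.
    Assumption: deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)   # cohesion
        b = min(sum(dist(p, q) for q in other) / len(other)  # separation
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

# Run the algorithm for several parameter values and score each partition.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

The point of the sketch is only the division of labour: the clustering step returns the best partition it can find for each given `k`, and the validity index is what compares those partitions against one another.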
Trust-based access control (TBAC) is a hot research issue in the area of network security for open networks. Clustering of the domains of recommendation entities is a prerequisite for trust quantification and evaluati...
Online clustering of streaming sensor data aims at providing summaries of the observed stream. This task is mostly done under limited processing and storage resources, which makes the speed of the sensed stream (data per time) a sensitive restriction when designing stream clustering algorithms. Additionally, varying stream speed is a natural characteristic of sensor data, e.g. when the sampling rate changes upon detecting an event or for a certain time. In such cases, most clustering algorithms have to heavily restrict their model size so that they can handle the minimal time allowance. Recently, the first anytime stream clustering algorithm was proposed, which flexibly uses all available time and dynamically adapts its model size. However, the method was not designed to precisely cluster sensor data, which are usually noisy and highly evolving. In this paper we detail the LiarTree algorithm, which provides precise stream summaries and effectively handles noise, drift and novelty. We prove that the runtime of the LiarTree is logarithmic in the size of the maintained model, as opposed to the linear time complexity often observed in previous approaches. We demonstrate in an extensive experimental evaluation using synthetic and real sensor datasets that the LiarTree outperforms competing approaches in terms of the quality of the resulting summaries and exhibits only a logarithmic time complexity.
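To make the idea of a bounded stream summary concrete, here is a minimal sketch of a flat list of micro-clusters, each stored as a cluster feature (weight, linear sum, squared sum) with optional exponential decay. This is not the LiarTree: the abstract does not give its data structure, so every name and parameter below (`MicroCluster`, `StreamSummary`, `radius`, `max_clusters`, `lam`) is a hypothetical choice for illustration. The sketch also scans all micro-clusters on each insert, which is the linear cost in model size that the abstract contrasts with a logarithmic one.

```python
import math

class MicroCluster:
    """Cluster feature: decayed weight n, linear sum ls, squared sum ss."""
    def __init__(self, point, t):
        self.n = 1.0
        self.ls = list(point)
        self.ss = [x * x for x in point]
        self.t = t  # time of last update

    def decay(self, t, lam):
        # Exponentially fade old data; lam = 0 disables decay.
        f = 2.0 ** (-lam * (t - self.t))
        self.n *= f
        self.ls = [x * f for x in self.ls]
        self.ss = [x * f for x in self.ss]
        self.t = t

    def add(self, point):
        self.n += 1.0
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss = [s + x * x for s, x in zip(self.ss, point)]

    def center(self):
        return [s / self.n for s in self.ls]

class StreamSummary:
    """Keeps at most max_clusters micro-clusters (bounded model size)."""
    def __init__(self, radius, max_clusters, lam=0.0):
        self.radius = radius
        self.max_clusters = max_clusters
        self.lam = lam
        self.clusters = []

    def insert(self, point, t):
        for mc in self.clusters:
            mc.decay(t, self.lam)
        nearest = min(self.clusters,
                      key=lambda mc: math.dist(mc.center(), point),
                      default=None)
        if nearest is not None and math.dist(nearest.center(), point) <= self.radius:
            nearest.add(point)  # absorb the point into an existing summary
            return
        if len(self.clusters) >= self.max_clusters:
            # Model is full: drop the micro-cluster with the least weight.
            self.clusters.remove(min(self.clusters, key=lambda mc: mc.n))
        self.clusters.append(MicroCluster(point, t))

# Usage: two groups of readings arriving interleaved over time.
summary = StreamSummary(radius=1.0, max_clusters=3)
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
for t, reading in enumerate(stream):
    summary.insert(reading, t)
for mc in summary.clusters:
    print(mc.n, mc.center())
```

Because only the three sums are stored per micro-cluster, memory stays bounded by `max_clusters` regardless of how many readings arrive; handling noise, drift and novelty as the paper describes would need more than this sketch provides.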
The conventional method of lithofacies identification uses well logs and core samples. Manual effort is required for this identification. We hypothesize that we can achieve similar results using the unsupervised data ...
Looking back on the past decade of research on clustering algorithms, we witness two major and apparent trends: 1) the already vast number of existing clustering algorithms is continuously growing, and 2) clustering algorithms in general are becoming more and more adapted to specific application domains with very particular assumptions. As a result, algorithms have grown complicated and/or very scenario-dependent, which has made clustering a domain that is hardly accessible to non-expert users. This is an especially critical development since, due to increasing data gathering, the need for analysis techniques like clustering emerges in many application domains. In this paper, we oppose the current focus on specialization by proposing our vision of a usable, guided and universally applicable clustering process. In detail, we describe the work we have already conducted and present our future research directions.
Globalization has changed management concepts and practices in many management fields through human resource management. Human resource management is a new trend in modern management education, challenging traditional...
Clustering is one of the most important tasks in data mining and can be defined as the process of partitioning objects into groups or clusters, such that objects in the same group are more similar to one another than ...
Because the DBSCAN clustering algorithm can identify clusters of arbitrary shape and the one-pass clustering algorithm is fast and efficient, this paper proposes a two-stage hybrid clustering ...
This paper proposes a novel hierarchical clustering algorithm based on high order dissimilarities. These dissimilarity increments are measures computed over triplets of nearest neighbor points. Recently, the distribut...