Data analysis has become an integral part of many economic fields. In this paper, we present several real-world applications from the fields of automobile development and manufacturing, finance, and online communities. The given examples have one aspect in common: time. The task is not only to find patterns in large volumes of data but also to identify them based on their temporal behaviour. We give examples of different models for incorporating temporal aspects. Furthermore, some new results in the area of Visual Data Analysis are presented. These methods offer intuitive ways of guiding the user through the process of data and model inspection and assist in drawing conclusions with the help of meaningful graphical representations.
While in standard fuzzy clustering one optimizes a set of prototypes, one for each cluster, we study fuzzy clustering without prototypes. We define an objective function that depends only on the distances between data points and the membership degrees of the data points to the clusters, and derive an iterative membership update rule. The properties of the resulting algorithm are then examined, especially with respect to an additional parameter of the objective function (compared to the one proposed in [7]) that can be seen as a more flexible alternative to the fuzzifier. Corresponding experimental results are reported that demonstrate the merits of our approach.
Recently, several papers have studied resampling approaches to determine the number of clusters in prototype-based clustering. The core idea underlying these approaches is that, with the right choice for the number of clusters, basically the same cluster structures should be obtained from subsamples of the given data set, while a wrong choice should produce considerably varying cluster structures. In this paper we investigate whether these approaches can be transferred to fuzzy clustering. It turns out that they are applicable to fuzzy clustering as well, but that not all relative cluster evaluation measures that work for crisp clustering can also be used for fuzzy clustering.
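The resampling idea described in this abstract can be illustrated with a minimal Python sketch. It is not the method of the paper: it handles only crisp labels, uses the Rand index as one possible relative evaluation measure, and the names `rand_index`, `stability`, and the user-supplied `cluster(points, k)` function are assumptions made for illustration.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two crisp partitions agree.

    A pair counts as agreement if both partitions put the two points in
    the same cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def stability(points, k, cluster, n_rounds=20, fraction=0.8, seed=0):
    """Average Rand index between clusterings of random subsample pairs.

    `cluster(points, k)` is a hypothetical user-supplied function that
    returns one crisp label per point. A value close to 1 suggests that
    k yields stable cluster structures.
    """
    rng = random.Random(seed)
    n = len(points)
    size = int(fraction * n)
    scores = []
    for _ in range(n_rounds):
        # Draw two subsamples and cluster each one independently.
        idx_a = rng.sample(range(n), size)
        idx_b = rng.sample(range(n), size)
        lab_a = dict(zip(idx_a, cluster([points[i] for i in idx_a], k)))
        lab_b = dict(zip(idx_b, cluster([points[i] for i in idx_b], k)))
        # Compare the two clusterings on the points both subsamples share.
        shared = sorted(set(idx_a) & set(idx_b))
        if len(shared) >= 2:
            scores.append(rand_index([lab_a[i] for i in shared],
                                     [lab_b[i] for i in shared]))
    return sum(scores) / len(scores) if scores else float("nan")
```

One would typically evaluate `stability` for a range of candidate values of `k` and prefer the value with the highest score. For fuzzy clustering the comparison step would need a measure that works on membership degrees, which is exactly the question the abstract raises.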
The detection of the types of local surface form deviations is a major step in the automated quality assessment of car body parts during the manufacturing process. In previous studies we compared the performance of different soft computing techniques for this purpose. We achieved promising results with regard to classification accuracy and interpretability of rule bases, even though the dataset was rather small, high-dimensional and unbalanced. In this paper we reconsider the collection of training examples and their assignment to defect types by the quality experts. We attempt to minimize the uncertainty of the quality experts' subjective and error-prone labelling in order to achieve a higher reliability of the defect detection. We show that refined and more accurate classification models can be built on the basis of a preprocessed training set that is more consistent. Using a partially supervised learning strategy, we can report improvements in classification accuracy.
We propose a method of acquiring knowledge about the possibility of deletion of adnominal verb phrases from a corpus. Our method acquires such items as the frequency of modification of the noun by adnominal verb phras...
ISBN: 1595932100 (print)
The FP-growth algorithm is currently one of the fastest approaches to frequent item set mining. In this paper I describe a C implementation of this algorithm, which contains two variants of the core operation of computing a projection of an FP-tree (the fundamental data structure of the FP-growth algorithm). In addition, projected FP-trees are (optionally) pruned by removing items that have become infrequent due to the projection (an approach that has been called FP-Bonsai). I report experimental results comparing this implementation of the FP-growth algorithm with three other frequent item set mining algorithms I implemented (Apriori, Eclat, and Relim). Copyright 2005 ACM.
ISBN: 1595932100 (print)
Recursive elimination is an algorithm for finding frequent item sets, which is strongly inspired by the FP-growth algorithm and very similar to the H-mine algorithm. It does its work without prefix trees or any other complicated data structures, processing the transactions directly. Its main strength is not its speed (although it is not slow and even outperforms Apriori and Eclat on some data sets), but the simplicity of its structure. Basically all the work is done in one simple recursive function, which can be written with relatively few lines of code. Copyright 2005 ACM.
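To make the "one simple recursive function" claim concrete, here is a minimal Python sketch in the spirit of recursive elimination. It is an illustrative reconstruction, not the author's implementation: the function names `relim` and `_recurse`, the use of `Counter` objects as transaction lists, and the tie-breaking rule for the item order are all assumptions.

```python
from collections import Counter, defaultdict


def _recurse(db, prefix, min_support, result):
    """Mine all frequent item sets that extend `prefix`.

    `db` maps a transaction (tuple of item ranks, sorted ascending) to
    the number of times it occurs.
    """
    # One list per leading item; each list stores the remaining suffixes.
    lists = defaultdict(Counter)
    for trans, n in db.items():
        lists[trans[0]][trans[1:]] += n
    while lists:
        item = min(lists)          # process items in ascending rank order
        group = lists.pop(item)
        support = sum(group.values())
        if support >= min_support:
            result[prefix + (item,)] = support
            # Conditional database: non-empty suffixes of this list.
            cond = Counter({s: n for s, n in group.items() if s})
            if cond:
                _recurse(cond, prefix + (item,), min_support, result)
        # Eliminate the item: hand each suffix to its next leading item.
        for suffix, n in group.items():
            if suffix:
                lists[suffix[0]][suffix[1:]] += n


def relim(transactions, min_support):
    """Return {frozenset(items): support} for all frequent item sets."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda i: (counts[i], str(i)))  # rarest items first
    rank = {item: r for r, item in enumerate(frequent)}
    # Recode transactions as sorted rank tuples, dropping infrequent items.
    db = Counter(
        tuple(sorted(rank[i] for i in set(t) if i in rank))
        for t in transactions
    )
    db.pop((), None)
    result = {}
    _recurse(db, (), min_support, result)
    return {frozenset(frequent[r] for r in ranks): s
            for ranks, s in result.items()}
```

A call such as `relim([["a", "b"], ["a", "b", "c"], ["a", "c"]], 2)` returns the item sets that occur in at least two transactions together with their supports. The sketch favours brevity over speed; a C implementation would use arrays and linked lists rather than hash-based counters.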
Real life transaction data often miss some occurrences of items that are actually present. As a consequence some potentially interesting frequent patterns cannot be discovered, since with exact matching the number of ...