Search results - Inner Mongolia University Library

Conference on Object Detection, Classification, and Tracking Technologies

Authors: Hu, Y; Yang, J; Yu, ZQ. Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200030, Peoples R China

ISBN: (print) 0819442828

Association rule clustering is one of the most important topics in data mining. This paper proposes a generalization of the distance-based clustering algorithm for association rules over various types of attributes. First, considering complex databases with varied data, we present a numeralization procedure to handle rules on many kinds of attributes. Second, instead of computing directly with the values of the numeralized attributes, we propose an approach to normalize these attributes of association rules. Finally, applying the numeralization and normalization methods, we present a generalized clustering algorithm based on different definitions of the distances and diameters of rules. The algorithm can handle rules with attributes of different types and different scales, which extends the clustering method of Ref. 1. Two simple examples are provided at the end of the paper to demonstrate the improved results of the clustering algorithm.

Keywords: association rules; rules clustering; numeric attribute; various types attribute; clustering algorithm


Data fusion, ensemble and clustering to improve the classification accuracy for the severity of road traffic accidents in Korea


SAFETY SCIENCE, 2003, Vol. 41, No. 1, pp. 1-14

Authors: Sohn, SY; Lee, SH. Yonsei Univ, Dept Comp Sci & Ind Syst Engn, Sudaemon Ku, Seoul, South Korea

The increasing amount of road traffic in the 1990s has drawn much attention in Korea due to its influence on safety. Various types of data analysis are carried out to examine the relationship between the severity of road traffic accidents and driving environmental factors, based on traffic accident records. Accurate results of such accident data analysis can provide crucial information for road accident prevention policy. In this paper, we use various algorithms to improve the accuracy of individual classifiers for two categories of road traffic accident severity. The individual classifiers used are a neural network and a decision tree. Three main approaches are applied: classifier fusion based on the Dempster-Shafer algorithm, the Bayesian procedure and a logistic model; data ensemble fusion based on arcing and bagging; and clustering based on the k-means algorithm. Our empirical results indicate that a clustering-based classification algorithm works best for road traffic accident classification in Korea. (C) 2002 Elsevier Science Ltd. All rights reserved.

Keywords: neural network; decision tree; Dempster-Shafer; Bayesian fusion; bagging; arcing; clustering algorithm


Threshold selection by clustering gray levels of boundary


PATTERN RECOGNITION LETTERS, 2003, Vol. 24, No. 12, pp. 1983-1999

Authors: Wang, LS; Bai, J. Tsinghua Univ, Dept Biomed Engn, Inst Biomed Engn, Beijing 100084, Peoples R China

In this paper, threshold selection is considered in the continuous image rather than in the digital image. We prove that, for each given object within a 2D image, its optimal threshold is determined by the mean of the gray values of the points lying on its continuous boundary. Thus, we try to deduce the threshold from the gray values of the boundary rather than from the gray values of the given discrete sampling points (pixels or edge pixels). With this scheme, we overcome some disadvantages of threshold methods based on the histogram of edge pixels. In addition, the proposed method can handle images whose histograms have very unequal peaks and a broad valley. (C) 2003 Elsevier Science B.V. All rights reserved.
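
The central claim of this abstract, that the threshold equals the mean gray value along the continuous boundary, can be illustrated with a small sketch. This is not the authors' method: it assumes the boundary is already available as real-valued (sub-pixel) points and that bilinear interpolation is an acceptable stand-in for the continuous image. The names `bilinear` and `boundary_mean_threshold` are hypothetical.

```python
import math

def bilinear(img, x, y):
    """Sample a 2D gray image (list of rows) at a real-valued point (x, y).

    Assumption: bilinear interpolation approximates the continuous image.
    """
    h, w = len(img), len(img[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # Clamp the upper neighbours so points on the last row/column stay valid.
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom

def boundary_mean_threshold(img, boundary_pts):
    """Mean interpolated gray value over sub-pixel boundary points (x, y)."""
    values = [bilinear(img, x, y) for x, y in boundary_pts]
    return sum(values) / len(values)

# Toy image whose gray value equals the column index.
image = [[float(c) for c in range(5)] for _ in range(5)]
# Hypothetical boundary: a vertical line at x = 2.5.
boundary = [(2.5, 1.0), (2.5, 2.0), (2.5, 3.5)]
threshold = boundary_mean_threshold(image, boundary)
```

In this toy case every boundary point lies halfway between columns 2 and 3, so the estimated threshold sits halfway between those two gray levels.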

Keywords: threshold selection; clustering algorithm; image segmentation


Bayesian clustering and product partition models


JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2003, Vol. 65, No. 2, pp. 557-574

Authors: Quintana, FA; Iglesias, PL. Pontificia Univ Catolica Chile, Dept Estadist, Santiago 22, Chile

We present a decision theoretic formulation of product partition models (PPMs) that allows a formal treatment of different decision problems such as estimation or hypothesis testing and clustering methods simultaneously. A key observation in our construction is the fact that PPMs can be formulated in the context of model selection. The underlying partition structure in these models is closely related to that arising in connection with Dirichlet processes. This allows a straightforward adaptation of some computational strategies, originally devised for nonparametric Bayesian problems, to our framework. The resulting algorithms are more flexible than other competing alternatives that are used for problems involving PPMs. We propose an algorithm that yields Bayes estimates of the quantities of interest and the groups of experimental units. We explore the application of our methods to the detection of outliers in normal and Student t regression models, with clustering structure equivalent to that induced by a Dirichlet process prior. We also discuss the sensitivity of the results considering different prior distributions for the partitions.

Keywords: clustering algorithm; Dirichlet process prior; k-means algorithm; outlier detection; sharp model


Knn density-based clustering for high dimensional multispectral images


2nd GRS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas

Authors: Tran, TN; Wehrens, R; Buydens, LMC. Univ Nijmegen, Analyt Chem Lab, NL-6525 ED Nijmegen, Netherlands

ISBN: (print) 0780377192

High-resolution, high-dimensional satellite images cause problems for clustering methods because they contain clusters of different sizes, shapes and densities. The most common clustering methods, e.g. K-means and ISODATA, do not work well for such datasets. In this work, density estimation techniques and density-based clustering methods are exploited. Density-based clustering is well known in data mining as a way to classify a data set based on its density parameters, where high-density areas are separated by lower-density areas, although it only works with simple data sets in which cluster densities are not very different. Our contribution is to propose the k nearest neighbor (knn) density-based rule for high-dimensional datasets and to develop a new knn density-based clustering method (KNNCLUST) for such complex datasets. KNNCLUST is stable, clear and easy to understand and implement. The number of clusters is determined automatically. These properties are illustrated by the segmentation of a multispectral image of a floodplain in The Netherlands.
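
As background for the density estimation mentioned in this abstract, the sketch below shows the textbook k-nearest-neighbor density estimate, p(x) ≈ k / (n · V_d · r_k^d), where r_k is the distance from x to its k-th nearest neighbor and V_d is the volume of the unit ball in d dimensions. It is not KNNCLUST itself, whose decision rule the abstract does not spell out; the function name `knn_density` is hypothetical.

```python
import math

def knn_density(points, k):
    """Textbook kNN density estimate at each sample point.

    points: list of equal-length tuples of floats; k: neighbour count (k < n).
    Assumption: no point has k exact duplicates, otherwise r_k is zero.
    """
    n = len(points)
    d = len(points[0])
    # Volume of the unit ball in d dimensions.
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    densities = []
    for i, p in enumerate(points):
        # Distances to all other points, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k = dists[k - 1]  # distance to the k-th nearest neighbour
        densities.append(k / (n * unit_ball * r_k ** d))
    return densities

# Three tightly packed 1D samples and two isolated ones.
samples = [(0.0,), (0.1,), (0.2,), (5.0,), (7.0,)]
density = knn_density(samples, k=2)
```

The packed samples near zero receive a much higher estimate than the isolated ones, which is the kind of contrast a density-based method uses to separate clusters from sparse regions.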

Keywords: clustering algorithm; density-estimation; high dimension; multispectral images


Upper bounds on empirically optimal quantizers


IEEE TRANSACTIONS ON INFORMATION THEORY, 2003, Vol. 49, No. 4, pp. 1037-1046

Authors: Kim, DS; Bell, MR. Hankuk Univ Foreign Studies, Sch Elect & Informat Engn, Yongin 449791, Kyonggi Do, South Korea; Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907, USA

In designing a vector quantizer using a training sequence (TS), the training algorithm tries to find an empirically optimal quantizer that minimizes the selected distortion criterion over the sequence. To evaluate the performance of the trained quantizer, we can use the empirically minimized distortion obtained when designing the quantizer. In this correspondence, several upper bounds on the empirically minimized distortion are proposed, with numerical results. The bounds hold pointwise, i.e., for each distribution with finite second moment in a class. From the pointwise bounds, it is possible to derive the worst-case bound, which is better than the current bounds for practical values of the training ratio beta, the ratio of the TS size to the codebook size. It is shown that the empirically minimized distortion underestimates the true minimum distortion by more than a factor of (1 - 1/m), where m is the sequence size. Furthermore, through an asymptotic analysis in the codebook size, a multiplication factor [1 - (1 - e^(-beta))/beta] ≈ (1 - 1/beta) for an asymptotic bound is shown. Several asymptotic bounds in terms of the vector dimension and the type of source are also introduced.
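
The multiplication factor quoted in this abstract can be typeset as follows. The symbol N for the codebook size is an assumption introduced here for illustration; the abstract names only beta and m.

```latex
\[
  \beta = \frac{m}{N}, \qquad
  1 - \frac{1 - e^{-\beta}}{\beta} \;\approx\; 1 - \frac{1}{\beta}
  \quad \text{when } e^{-\beta} \ll 1 .
\]
% Numerical check for beta = 10:
%   e^{-10} \approx 4.54 \times 10^{-5}, so the exact factor is about 0.9000045,
%   against the approximation 1 - 1/10 = 0.9.
```

The approximation simply drops the exponential term, so it tightens quickly as the training ratio grows.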

Keywords: clustering algorithm; empirically optimal quantizer; training sequence (TS); vector quantizer


Sampling and subsampling for cluster analysis in data mining: With applications to sky survey data


DATA MINING AND KNOWLEDGE DISCOVERY, 2003, Vol. 7, No. 2, pp. 215-232

Authors: Rocke, DM; Dai, J. Univ Calif Davis, Ctr Image Proc & Integrated Computing, Davis, CA 95616, USA

This paper describes a clustering method for unsupervised classification of objects in large data sets. The new methodology combines the mixture likelihood approach with a sampling and subsampling strategy in order to cluster large data sets efficiently. This sampling strategy can be applied to a large variety of data mining methods to allow them to be used on very large data sets. The method is applied to the problem of automated star/galaxy classification for digital sky data and is tested using a sample from the Digitized Palomar Sky Survey (DPOSS) data. The method is quick and reliable and produces classifications comparable to previous work on these data using supervised clustering.

Keywords: clustering algorithm; mixture likelihood; sampling; star/galaxy classification


Cluster-based and brute-correcting grammatical rules learning


International Conference on Natural Language Processing and Knowledge Engineering

Authors: Hu, W; Zhang, DM. Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China

ISBN: (print) 0780379020

In this paper, we propose a cluster-based and brute-correcting method for learning grammatical rules, which is based on some conclusions of cognitive linguistics. First, instances of a grammatical category are mapped to graphic vectors and a distance between two vectors is defined. The set of vectors and the defined distance are proved to form a distance space. Next, this space is mapped to Euclidean space and a simple clustering algorithm is applied to obtain clusters. Then, grammatical rules are learned to describe each cluster. Finally, the brute-correcting progress helps to refine the rules. After describing the method, we informally compare the brute-correcting progress with Eric Brill's transformation-based learning approach [E. Brill, 1995] and present an application in Chinese named entity recognition.

Keywords: clustering algorithm; brute-correcting progress; grammatical rules learning


Conference on Signal Processing, Sensor Fusion, and Target Recognition XII

Authors: Yeh, CH; Sung, PY; Chang, HT; Kuo, CJ. Univ So Calif, Dept Elect Engn, Los Angeles, CA 90007, USA

ISBN: (print) 0819449563

Deoxyribonucleic acid (DNA) sequences are difficult to analyze for similarity due to their length and complexity. The challenge lies in using digital signal processing (DSP) to solve highly relevant problems in DNA sequences. Here, we transform a one-dimensional (1D) DNA sequence into a two-dimensional (2D) pattern by using the Peano scan algorithm. Four complex values are assigned to the characters "A", "C", "T", and "G", respectively. Then, the Fourier transform is employed to obtain the far-field amplitude distribution of the 2D pattern. In this way, a 1D DNA sequence becomes a 2D image pattern. Features are extracted from the 2D image pattern with the Principal Component Analysis (PCA) method, so that a DNA sequence database can be established. Unfortunately, comparing features may take a long time when the database is large, since the features are often multi-dimensional. This problem is solved by building an indexing structure that acts as a filter to filter out non-relevant items and select a subset of candidate DNA sequences. Clustering algorithms can organize the multi-dimensional feature data into the indexing structure for effective retrieval. Accordingly, the query sequence need only be compared against the candidate sequences rather than all sequences in the database. In effect, our algorithm provides a pre-processing method to accelerate the DNA sequence search process. Finally, experimental results demonstrate the efficiency of the proposed algorithm for DNA sequence similarity retrieval.

Keywords: DNA; Peano scan; Fourier transformation; Principal Component Analysis; indexing structure; clustering algorithm; similarity retrieval


Unified method of knowledge representation in the evolutionary artificial intelligence systems


5th Conference on Data Mining and Knowledge Discovery

Authors: Bykov, NM; Bykova, KN. Vinnitsa State Tech Univ, UA-21021 Vinnitsa, Ukraine

ISBN: (print) 081944958X

The evolution of artificial intelligence systems, driven by the growing complexity of their application domains and by scientific progress, has resulted in a diversification of the methods and algorithms of knowledge representation and usage in these systems. For this reason it is often very difficult to design effective methods of knowledge discovery and operation for such systems. In this work the authors offer a method for the unified representation of a system's knowledge about objects of the external world by rank transformation of their descriptions, made in different feature spaces: deterministic, probabilistic, fuzzy and others. A proof is presented that the information about the rank configuration of the object states in the feature space is sufficient for decision making. It is shown that the geometrical and combinatorial model of the set of rank configurations represents them by a group of some system of incidence, which allows information about them to be stored in a convolved form. A method of describing the rank configuration by the DRP-code (distance rank preserving code) is offered. The problems of its completeness, information capacity, noise immunity and privacy are reviewed. It is shown that the capacity of a transmission channel for such a representation of the information is greater than unity, as the code words contain information both about the object states and about the distance ranks between them. An effective data clustering algorithm for identifying the object states, founded on the use of this code, is described. Knowledge representation with the help of rank configurations makes it possible to unify and simplify decision-making algorithms by performing logic operations on the DRP-code words. Examples of the operation of the proposed clustering techniques on a given set of samples, the rank configuration of the resulting clusters and its DRP-codes are presented.

Keywords: knowledge representation; clustering algorithm; rank configuration; making decision

