ISBN: 9781581137378 (print)
Support vector machines (SVMs) have been promising methods for classification and regression analysis because of their solid mathematical foundations, which convey several salient properties that other methods hardly provide. However, despite these prominent properties, SVMs are not as favored for large-scale data mining as for pattern recognition or machine learning, because the training complexity of SVMs is highly dependent on the size of the data set. Many real-world data mining applications involve millions or billions of data records, where even multiple scans of the entire data are too expensive to perform. This paper presents a new method, Clustering-Based SVM (CB-SVM), which is specifically designed for handling very large data sets. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to provide the SVM with high-quality samples that carry statistical summaries of the data, such that the summaries maximize the benefit of learning the SVM. CB-SVM tries to generate the best SVM boundary for very large data sets given a limited amount of resources. Our experiments on synthetic and real data sets show that CB-SVM is highly scalable for very large data sets while also generating high classification accuracy. Copyright 2003 ACM.
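As a rough illustration of the summarize-then-train idea (not the paper's CB-SVM algorithm itself, which uses a hierarchical micro-clustering tree and selective de-clustering near the boundary), the following Python sketch builds per-class micro-cluster summaries in a single clustering pass and trains a linear SVM on the size-weighted centroids; MiniBatchKMeans and the synthetic data are stand-ins chosen here for brevity.

```python
# Minimal sketch: train the SVM on compact cluster summaries (centroids
# weighted by cluster population) instead of on every record.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200_000, n_features=20, random_state=0)

summaries, labels, weights = [], [], []
for cls in np.unique(y):
    Xc = X[y == cls]
    # one clustering pass per class: centroids approximate the micro-cluster summaries
    mbk = MiniBatchKMeans(n_clusters=200, batch_size=10_000, random_state=0).fit(Xc)
    counts = np.bincount(mbk.labels_, minlength=200)
    summaries.append(mbk.cluster_centers_)
    labels.append(np.full(200, cls))
    weights.append(counts)

X_sum = np.vstack(summaries)
y_sum = np.concatenate(labels)
w_sum = np.concatenate(weights)

# the SVM now sees only 400 weighted summary points rather than 200,000 records
svm = LinearSVC(C=1.0).fit(X_sum, y_sum, sample_weight=w_sum)
print("boundary learned from", len(X_sum), "summaries")
```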
ISBN: 0819449989 (print)
Dimensionality reduction methods for visualization map the original high-dimensional data, typically into two dimensions. The mapping preserves the important information of the data and, in order to be useful, fulfils the needs of a human observer. We have proposed a self-organizing map (SOM)-based approach for visual surface inspection. The method provides the advantages of unsupervised learning and an intuitive user interface that allows one to easily set and tune the class boundaries based on observations made on the visualization, for example, to adapt to changing conditions or material. There are, however, some problems with a SOM. It does not preserve the true distances between data points, and it has a tendency to ignore rare samples in the training set in favor of a more accurate representation of common samples. In this paper, some alternative methods to a SOM are evaluated. These methods, PCA, MDS, LLE, ISOMAP, and GTM, are used to reduce dimensionality in order to visualize the data. Their principal differences are discussed and their performances quantitatively evaluated in a few special classification cases, such as wood inspection using centile features. For the test material experimented with, SOM and GTM outperform the others when classification performance is considered. For data mining kinds of applications, ISOMAP and LLE appear to be more promising methods.
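For reference, the sketch below maps a generic dataset into two dimensions with scikit-learn implementations of four of the compared methods (PCA, MDS, LLE, ISOMAP); SOM and GTM have no scikit-learn counterpart and are omitted, and the digits data merely stands in for the wood-inspection centile features.

```python
# Compare four 2-D embeddings side by side for visual inspection.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, LocallyLinearEmbedding, Isomap

X, y = load_digits(return_X_y=True)

methods = {
    "PCA": PCA(n_components=2),
    "MDS": MDS(n_components=2, random_state=0),
    "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=0),
    "ISOMAP": Isomap(n_components=2, n_neighbors=10),
}

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, (name, reducer) in zip(axes, methods.items()):
    Z = reducer.fit_transform(X)            # map to two dimensions
    ax.scatter(Z[:, 0], Z[:, 1], c=y, s=5, cmap="tab10")
    ax.set_title(name)
plt.tight_layout()
plt.show()
```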
We investigate the following data mining problem from computer-aided drug design: from a large collection of compounds, find those that bind to a target molecule in as few iterations of biochemical testing as possible. In each iteration a comparatively small batch of compounds is screened for binding activity toward this target. We employed the so-called "active learning paradigm" from machine learning for selecting the successive batches. Our main selection strategy is based on the maximum-margin hyperplane generated by Support Vector Machines. This hyperplane separates the current set of active compounds from the inactive ones and has the largest possible distance from any labeled compound. We perform a thorough comparative study of various other selection strategies on data sets provided by DuPont Pharmaceuticals and show that the strategies based on the maximum-margin hyperplane clearly outperform the simpler ones.
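A minimal sketch of the batch-selection step, assuming generic descriptor vectors and a linear-kernel SVM: unlabeled compounds are ranked by their signed distance to the maximum-margin hyperplane, either preferring the most uncertain ones (closest to the hyperplane) or the most confidently active ones. The data, batch size, and function name are illustrative, not the paper's exact strategies or the DuPont Pharmaceuticals setup.

```python
import numpy as np
from sklearn.svm import SVC

def select_next_batch(X_labeled, y_labeled, X_unlabeled, batch_size=50,
                      strategy="closest"):
    svm = SVC(kernel="linear", C=1.0).fit(X_labeled, y_labeled)
    dist = svm.decision_function(X_unlabeled)   # signed distance to the hyperplane
    if strategy == "closest":
        order = np.argsort(np.abs(dist))        # most uncertain compounds first
    else:                                       # exploit the current model
        order = np.argsort(-dist)               # most confidently active first
    return order[:batch_size]

# usage with random stand-in descriptors
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(100, 64)), rng.integers(0, 2, 100)
X_pool = rng.normal(size=(5000, 64))
batch_idx = select_next_batch(X_lab, y_lab, X_pool, batch_size=50, strategy="closest")
```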
The following topics are discussed: bioinformatics; software engineering with computational intelligence; data mining; evolutionary computing; planning and scheduling; knowledge management and sharing; machine learning; agents; vision and imaging; artificial intelligence in medicine; fuzzy logic; intelligent information retrieval; knowledge representation; satisfiability; computer vision and pattern recognition.
Recent times have seen an explosive growth in the availability of various kinds of data. This has resulted in an unprecedented opportunity to develop automated data-driven techniques for extracting useful knowledge. Data...
Support vector machines (SVM) are currently one of the classification systems most used in pattern recognition and data mining because of their accuracy and generalization capability. However, when dealing with very complex classification tasks where different errors bring different penalties, one should take into account the overall classification cost produced by the classifier rather than only its accuracy. It is thus necessary to provide methods for tuning the SVM to the costs of the particular application. Depending on the characteristics of the cost matrix, this can be done during or after the learning phase of the classifier. In this paper we introduce two optimization schemes based on these two possible approaches and compare their performance on various data sets and kernels. The first experimental results show that both the proposed schemes are suitable for tuning SVMs in cost-sensitive applications.
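To make the during-versus-after distinction concrete, here is a hedged sketch using standard stand-ins rather than the paper's optimization schemes: per-class error weights applied during training, and a cost-minimizing decision-threshold shift applied after training, both driven by an illustrative 2x2 cost matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# cost[i][j] = cost of predicting class j when the true class is i
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])   # missing a positive is five times worse

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# (1) during learning: per-class error weights taken from the cost matrix
svm_w = SVC(kernel="rbf", class_weight={0: cost[0, 1], 1: cost[1, 0]}).fit(X_tr, y_tr)

# (2) after learning: train a plain SVM, then pick the score threshold
# that minimizes the total cost on held-out data
svm = SVC(kernel="rbf").fit(X_tr, y_tr)
scores = svm.decision_function(X_val)

def total_cost(threshold):
    pred = (scores > threshold).astype(int)
    return sum(cost[t, p] for t, p in zip(y_val, pred))

best_t = min(np.linspace(scores.min(), scores.max(), 200), key=total_cost)
print("chosen threshold:", best_t, "validation cost:", total_cost(best_t))
```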
Greiner and Zhou (1988) presented ELR, a discriminative parameter-learning algorithm that maximizes conditional likelihood (CL) for a fixed Bayesian belief network (BN) structure, and demonstrated that it often produces classifiers that are more accurate than those produced by the generative approach (OFE), which finds maximum-likelihood parameters. This is especially true when learning parameters for incorrect structures, such as naive Bayes (NB). In searching for algorithms to learn better BN classifiers, this paper uses ELR to learn the parameters of more nearly correct BN structures, e.g., of a general Bayesian network (GBN) learned from a structure-learning algorithm by Greiner and Zhou (2002). While OFE typically produces more accurate classifiers with a GBN (vs. NB), we show that ELR does not, when the training data is not sufficient for the GBN structure learner to produce a good model. Our empirical studies also suggest that the better the BN structure is, the less advantage ELR has over OFE for classification purposes. ELR learning on NB (i.e., with little structural knowledge) still performs about the same as OFE on GBN in classification accuracy, over a large number of standard benchmark datasets.
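As a rough, simplified illustration of the generative-versus-discriminative contrast (not an implementation of ELR's conditional-likelihood gradient ascent or of any BN structure learning): for a fixed naive-Bayes structure, OFE-style maximum-likelihood fitting corresponds to an ordinary naive Bayes classifier, while conditional-likelihood maximization is closely related to logistic regression, so the sketch compares those two stand-ins on a benchmark dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

generative = GaussianNB()                                          # OFE-style: ML parameters
discriminative = make_pipeline(StandardScaler(),
                               LogisticRegression(max_iter=1000))  # CL-style stand-in

for name, clf in [("generative NB", generative), ("discriminative LR", discriminative)]:
    acc = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: 10-fold accuracy = {acc:.3f}")
```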
ISBN: 0780374886 (print)
A novel classification algorithm, OCEC, based on evolutionary computation for data mining is proposed. It is compared to GA-based and non-GA-based algorithms on 8 datasets from the UCI machine learning repository. Results show that OCEC achieves higher prediction accuracy, a smaller number of rules, and more stable performance.
The original k-means clustering algorithm is designed to work primarily on numeric data sets. This prohibits the algorithm from being directly applied to categorical data clustering in many data mining applications. The k-modes algorithm [Z. Huang, Clustering large data sets with mixed numeric and categorical values, in: Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining, World Scientific, Singapore, 1997, pp. 21-34] extended the k-means paradigm to cluster categorical data by using a frequency-based method to update the cluster modes, versus the k-means fashion of minimizing a numerically valued cost. However, as is the case with most data clustering algorithms, the algorithm requires a pre-setting or random selection of the initial points (modes) of the clusters. Differences in the initial points often lead to considerably different clustering results. In this paper we present an experimental study on applying Bradley and Fayyad's iterative initial-point refinement algorithm to k-modes clustering to improve the accuracy and repeatability of the clustering results [cf. P. Bradley, U. Fayyad, Refining initial points for k-means clustering, in: Proceedings of the 15th International Conference on Machine Learning, Morgan Kaufmann, Los Altos, CA, 1998]. Experiments show that the k-modes clustering algorithm using refined initial points produces higher-precision results much more reliably than random selection without refinement, thus making the refinement process applicable to many data mining applications with categorical data. (C) 2002 Elsevier Science B.V. All rights reserved.
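The sketch below illustrates the refinement idea in simplified form, assuming integer-coded categorical data: a minimal k-modes (matching dissimilarity, frequency-based mode updates) is run on several small subsamples, and the candidate mode set that clusters the pooled candidates at the lowest cost is kept as the refined initial modes. This is an illustrative adaptation of Bradley and Fayyad's procedure, not the exact algorithm evaluated in the paper.

```python
import numpy as np

def kmodes(X, k, init_modes=None, n_iter=20, rng=None):
    """Minimal k-modes: matching dissimilarity + frequency-based mode updates."""
    rng = np.random.default_rng(rng)
    modes = X[rng.choice(len(X), k, replace=False)] if init_modes is None else init_modes.copy()
    for _ in range(n_iter):
        # assign each record to the mode it mismatches least
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # update each mode attribute-wise to the most frequent category
        for j in range(k):
            members = X[labels == j]
            if len(members):
                modes[j] = [np.bincount(col).argmax() for col in members.T]
    cost = (X != modes[labels]).sum()
    return modes, labels, cost

def refined_initial_modes(X, k, n_subsamples=10, subsample_size=500, seed=0):
    """Cluster several small subsamples, then keep the candidate mode set
    that clusters the pooled candidates at the lowest cost (simplified
    adaptation of the Bradley-Fayyad refinement to k-modes)."""
    rng = np.random.default_rng(seed)
    candidates = []
    for _ in range(n_subsamples):
        idx = rng.choice(len(X), min(subsample_size, len(X)), replace=False)
        modes, _, _ = kmodes(X[idx], k, rng=rng)
        candidates.append(modes)
    pool = np.vstack(candidates)
    return min(candidates, key=lambda m: kmodes(pool, k, init_modes=m, rng=rng)[2])

# usage on small synthetic categorical data (integer-coded categories)
rng = np.random.default_rng(0)
X_cat = rng.integers(0, 4, size=(2000, 8))
init = refined_initial_modes(X_cat, k=3)
modes, labels, cost = kmodes(X_cat, k=3, init_modes=init)
print("final cost:", cost)
```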