We investigate the following data mining problem from computer-aided drug design: from a large collection of compounds, find those that bind to a target molecule in as few iterations of biochemical testing as possible. In each iteration a comparatively small batch of compounds is screened for binding activity toward this target. We employ the so-called "active learning" paradigm from machine learning for selecting the successive batches. Our main selection strategy is based on the maximum margin hyperplane generated by support vector machines. This hyperplane separates the current set of active compounds from the inactive ones and has the largest possible distance from any labeled compound. We perform a thorough comparative study of various other selection strategies on data sets provided by DuPont Pharmaceuticals and show that the strategies based on the maximum margin hyperplane clearly outperform the simpler ones.
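As a rough illustration of the margin-based selection strategy, the following Python sketch (using scikit-learn; the function name, batch size, and feature/label arrays are placeholders rather than details from the paper) trains a linear SVM on the compounds labeled so far and selects the next batch from the unlabeled pool of compounds lying closest to the separating hyperplane.

import numpy as np
from sklearn.svm import SVC

def select_next_batch(X_labeled, y_labeled, X_pool, batch_size=50):
    """Pick the pool compounds closest to the maximum-margin hyperplane."""
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X_labeled, y_labeled)
    # Unsigned distance to the separating hyperplane (up to a constant scale).
    margins = np.abs(clf.decision_function(X_pool))
    # Compounds nearest the hyperplane are the least certain and hence the
    # most informative candidates for the next round of screening.
    return np.argsort(margins)[:batch_size]

Other margin-based strategies would only change the ranking criterion, for example ranking by the signed distance instead of its absolute value in order to favor compounds predicted to be active.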
ISBN: (Print) 3540140409
The Branch & Bound (B&B) algorithm is a globally optimal feature selection method. The high computational complexity of this algorithm is a well-known problem. The B&B algorithm constructs a search tree and then searches the tree for the optimal feature subset. Previous work on the B&B algorithm focused on simplifying the search tree in order to reduce the search complexity, and several improvements already exist. A detailed analysis of the basic B&B algorithm and the existing improvements is given under a common framework in which all the algorithms are compared. Based on this analysis, an improved B&B algorithm, BBPP+, is proposed. Experimental comparison shows that BBPP+ performs best.
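The pruning idea behind Branch & Bound feature selection can be sketched as follows. This is a minimal, illustrative Python implementation of the basic algorithm, not of BBPP+, and it assumes a criterion J that is monotonic (removing a feature never increases J), which is what makes pruning safe.

import numpy as np

def branch_and_bound(J, n_features, n_select):
    """Return the feature subset of size n_select that maximizes J."""
    best = {"value": -np.inf, "subset": None}
    n_remove = n_features - n_select

    def search(subset, start, removed):
        value = J(subset)
        if value <= best["value"]:
            return  # prune: by monotonicity no descendant can beat the bound
        if removed == n_remove:
            best["value"], best["subset"] = value, subset
            return
        # Branch: remove each remaining candidate feature in turn.
        for i in range(start, len(subset)):
            search(subset[:i] + subset[i + 1:], i, removed + 1)

    search(tuple(range(n_features)), 0, 0)
    return best["subset"], best["value"]

# Toy usage with a trivially monotonic criterion (sum of per-feature scores).
scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3])
subset, value = branch_and_bound(lambda s: scores[list(s)].sum(), 5, 3)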
The ever-increasing number of image modalities available to doctors for diagnosis purposes has established an important need to develop techniques that support work-load reduction and information maximization. To this...
ISBN: (Print) 9781581137378
Support vector machines (SVMs) have been promising methods for classification and regression analysis because of their solid mathematical foundations, which convey several salient properties that other methods hardly provide. However, despite these prominent properties, SVMs are not as favored for large-scale data mining as for pattern recognition or machine learning, because the training complexity of SVMs is highly dependent on the size of the data set. Many real-world data mining applications involve millions or billions of data records, where even multiple scans of the entire data are too expensive to perform. This paper presents a new method, Clustering-Based SVM (CB-SVM), which is specifically designed for handling very large data sets. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to provide the SVM with high-quality samples that carry the statistical summaries of the data, such that the summaries maximize the benefit of learning the SVM. CB-SVM tries to generate the best SVM boundary for very large data sets given a limited amount of resources. Our experiments on synthetic and real data sets show that CB-SVM is highly scalable for very large data sets while also generating high classification accuracy. Copyright 2003 ACM.
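A much-simplified sketch of the idea is shown below (Python with scikit-learn, using the library's BIRCH CF-tree as a stand-in for the single-scan micro-clustering step; the iterative declustering of clusters near the boundary, which is central to the full CB-SVM, is omitted here, so this is only a sketch of the general approach).

import numpy as np
from sklearn.cluster import Birch
from sklearn.svm import SVC

def cluster_based_svm(X, y, threshold=0.5):
    """Train an SVM on per-class micro-cluster centroids instead of raw data."""
    centroids, labels, weights = [], [], []
    for cls in np.unique(y):
        birch = Birch(threshold=threshold, n_clusters=None)
        birch.fit(X[y == cls])
        centers = birch.subcluster_centers_
        counts = np.bincount(birch.labels_, minlength=len(centers))
        centroids.append(centers)
        labels.append(np.full(len(centers), cls))
        weights.append(counts)
    # Cluster sizes are used as sample weights so that large clusters
    # influence the boundary more than small ones.
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(np.vstack(centroids), np.concatenate(labels),
            sample_weight=np.concatenate(weights))
    return clf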
ISBN: (Print) 0819449989
Dimensionality reduction methods for visualization map the original high-dimensional data typically into two dimensions. The mapping preserves the important information of the data and, in order to be useful, fulfils the needs of a human observer. We have proposed a self-organizing map (SOM)-based approach for visual surface inspection. The method provides the advantages of unsupervised learning and an intuitive user interface that allows one to very easily set and tune the class boundaries based on observations made on the visualization, for example, to adapt to changing conditions or material. There are, however, some problems with a SOM. It does not preserve the true distances between data points, and it has a tendency to ignore rare samples in the training set at the expense of a more accurate representation of common samples. In this paper, some alternative methods to a SOM are evaluated. These methods, PCA, MDS, LLE, ISOMAP, and GTM, are used to reduce dimensionality in order to visualize the data. Their principal differences are discussed and their performances quantitatively evaluated in a few special classification cases, such as wood inspection using centile features. For the test material experimented with, SOM and GTM outperform the others when classification performance is considered. For data mining kinds of applications, ISOMAP and LLE appear to be more promising methods.
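For readers who want to reproduce this kind of comparison, the short Python sketch below (scikit-learn only; SOM and GTM are left out because scikit-learn does not provide them, and the neighborhood size is an arbitrary choice) computes two-dimensional embeddings of the same feature matrix with the remaining methods.

from sklearn.decomposition import PCA
from sklearn.manifold import MDS, LocallyLinearEmbedding, Isomap

def embed_2d(X, n_neighbors=10):
    """Map the feature matrix X to 2-D with several reduction methods."""
    methods = {
        "PCA": PCA(n_components=2),
        "MDS": MDS(n_components=2),
        "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=n_neighbors),
        "ISOMAP": Isomap(n_components=2, n_neighbors=n_neighbors),
    }
    # Each estimator maps the high-dimensional data to two dimensions.
    return {name: est.fit_transform(X) for name, est in methods.items()}

The resulting embeddings can then be compared, for example, by evaluating a simple classifier in the two-dimensional space.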
The following topics are discussed: bioinformatics; software engineering with computational intelligence; data mining; evolutionary computing; planning and scheduling; knowledge management and sharing; machine learning; agents; vision and imaging; artificial intelligence in medicine; fuzzy logic; intelligent information retrieval; knowledge representation; satisfiability; computer vision and pattern recognition.
Recent times have seen an explosive growth in the availability of various kinds of data. It has resulted in an unprecedented opportunity to develop automated data-driven techniques for extracting useful knowledge. Data...
Support vector machines (SVMs) are currently one of the classification systems most used in pattern recognition and data mining because of their accuracy and generalization capability. However, when dealing with very complex classification tasks where different errors bring different penalties, one should take into account the overall classification cost produced by the classifier more than its accuracy. It is thus necessary to provide methods for tuning the SVM on the costs of the particular application. Depending on the characteristics of the cost matrix, this can be done during or after the learning phase of the classifier. In this paper we introduce two optimization schemes based on these two possible approaches and compare their performance on various data sets and kernels. The first experimental results show that both of the proposed schemes are suitable for tuning SVMs in cost-sensitive applications.
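The two generic approaches mentioned in the abstract can be illustrated with the hedged Python sketch below (scikit-learn; the paper's specific optimization schemes are not reproduced, and the cost values, class labels 0/1, and function names are assumptions made for the example): cost-aware class weighting during training, and shifting the decision threshold on a validation set after training.

import numpy as np
from sklearn.svm import SVC

# Assumed costs: a false negative is ten times as costly as a false positive.
cost_fp, cost_fn = 1.0, 10.0

def train_cost_weighted(X, y):
    """During learning: weight each class in proportion to its error cost."""
    return SVC(kernel="rbf", gamma="scale",
               class_weight={0: cost_fp, 1: cost_fn}).fit(X, y)

def tune_threshold(clf, X_val, y_val):
    """After learning: choose the decision threshold with the lowest total cost."""
    scores = clf.decision_function(X_val)
    thresholds = np.unique(scores)
    costs = [cost_fp * np.sum((scores > t) & (y_val == 0)) +
             cost_fn * np.sum((scores <= t) & (y_val == 1))
             for t in thresholds]
    return thresholds[int(np.argmin(costs))]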