We investigate the following data mining problem from computer-aided drug design: from a large collection of compounds, find those that bind to a target molecule in as few iterations of biochemical testing as possible. In each iteration a comparatively small batch of compounds is screened for binding activity toward this target. We employ the so-called "active learning" paradigm from machine learning for selecting the successive batches. Our main selection strategy is based on the maximum margin hyperplane generated by support vector machines. This hyperplane separates the current set of active compounds from the inactive ones and has the largest possible distance from any labeled compound. We perform a thorough comparative study of various other selection strategies on data sets provided by DuPont Pharmaceuticals and show that the strategies based on the maximum margin hyperplane clearly outperform the simpler ones.
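As a rough illustration of the margin-based selection strategy, the following Python sketch (using scikit-learn; the function name, batch size, and feature/label arrays are placeholders rather than details from the paper) trains a linear SVM on the compounds labeled so far and selects the next batch from the unlabeled pool of compounds lying closest to the separating hyperplane.

import numpy as np
from sklearn.svm import SVC

def select_next_batch(X_labeled, y_labeled, X_pool, batch_size=50):
    """Pick the pool compounds closest to the maximum-margin hyperplane."""
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X_labeled, y_labeled)
    # Unsigned distance to the separating hyperplane (up to a constant scale).
    margins = np.abs(clf.decision_function(X_pool))
    # Compounds nearest the hyperplane are the least certain and hence the
    # most informative candidates for the next round of screening.
    return np.argsort(margins)[:batch_size]

Other margin-based strategies would only change the ranking criterion, for example ranking by the signed distance instead of its absolute value in order to favor compounds predicted to be active.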
ISBN: (Print) 3540140409
The Branch & Bound (B&B) algorithm is a globally optimal feature selection method. The high computational complexity of this algorithm is a well-known problem. The B&B algorithm constructs a search tree and then searches the tree for the optimal feature subset. Previous work on the B&B algorithm focused on simplifying the search tree in order to reduce the search complexity, and several improvements already exist. A detailed analysis of the basic B&B algorithm and the existing improvements is given under a common framework in which all the algorithms are compared. Based on this analysis, an improved B&B algorithm, BBPP+, is proposed. Experimental comparison shows that BBPP+ performs best.
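The pruning idea behind Branch & Bound feature selection can be sketched as follows. This is a minimal, illustrative Python implementation of the basic algorithm, not of BBPP+, and it assumes a criterion J that is monotonic (removing a feature never increases J), which is what makes pruning safe.

import numpy as np

def branch_and_bound(J, n_features, n_select):
    """Return the feature subset of size n_select that maximizes J."""
    best = {"value": -np.inf, "subset": None}
    n_remove = n_features - n_select

    def search(subset, start, removed):
        value = J(subset)
        if value <= best["value"]:
            return  # prune: by monotonicity no descendant can beat the bound
        if removed == n_remove:
            best["value"], best["subset"] = value, subset
            return
        # Branch: remove each remaining candidate feature in turn.
        for i in range(start, len(subset)):
            search(subset[:i] + subset[i + 1:], i, removed + 1)

    search(tuple(range(n_features)), 0, 0)
    return best["subset"], best["value"]

# Toy usage with a trivially monotonic criterion (sum of per-feature scores).
scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3])
subset, value = branch_and_bound(lambda s: scores[list(s)].sum(), 5, 3)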
The ever-increasing number of image modalities available to doctors for diagnosis purposes has established an important need to develop techniques that support work-load reduction and information maximization. To this...
ISBN: (Print) 9781581137378
Support vector machines (SVMs) have been promising methods for classification and regression analysis because of their solid mathematical foundations, which convey several salient properties that other methods hardly provide. However, despite these prominent properties, SVMs are not as favored for large-scale data mining as for pattern recognition or machine learning, because the training complexity of SVMs is highly dependent on the size of the data set. Many real-world data mining applications involve millions or billions of data records, where even multiple scans of the entire data are too expensive to perform. This paper presents a new method, Clustering-Based SVM (CB-SVM), which is specifically designed for handling very large data sets. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to provide the SVM with high-quality samples that carry the statistical summaries of the data, such that the summaries maximize the benefit of learning the SVM. CB-SVM tries to generate the best SVM boundary for very large data sets given a limited amount of resources. Our experiments on synthetic and real data sets show that CB-SVM is highly scalable for very large data sets while also generating high classification accuracy. Copyright 2003 ACM.
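A much-simplified sketch of the idea is shown below (Python with scikit-learn, using the library's BIRCH CF-tree as a stand-in for the single-scan micro-clustering step; the iterative declustering of clusters near the boundary, which is central to the full CB-SVM, is omitted here, so this is only a sketch of the general approach).

import numpy as np
from sklearn.cluster import Birch
from sklearn.svm import SVC

def cluster_based_svm(X, y, threshold=0.5):
    """Train an SVM on per-class micro-cluster centroids instead of raw data."""
    centroids, labels, weights = [], [], []
    for cls in np.unique(y):
        birch = Birch(threshold=threshold, n_clusters=None)
        birch.fit(X[y == cls])
        centers = birch.subcluster_centers_
        counts = np.bincount(birch.labels_, minlength=len(centers))
        centroids.append(centers)
        labels.append(np.full(len(centers), cls))
        weights.append(counts)
    # Cluster sizes are used as sample weights so that large clusters
    # influence the boundary more than small ones.
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(np.vstack(centroids), np.concatenate(labels),
            sample_weight=np.concatenate(weights))
    return clf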
ISBN: (Print) 0819449989
Dimensionality reduction methods for visualization map the original high-dimensional data typically into two dimensions. The mapping preserves the important information of the data and, in order to be useful, fulfils the needs of a human observer. We have proposed a self-organizing map (SOM)-based approach for visual surface inspection. The method provides the advantages of unsupervised learning and an intuitive user interface that allows one to very easily set and tune the class boundaries based on observations made on the visualization, for example, to adapt to changing conditions or material. There are, however, some problems with a SOM. It does not preserve the true distances between data points, and it has a tendency to ignore rare samples in the training set at the expense of a more accurate representation of common samples. In this paper, some alternative methods to a SOM are evaluated. These methods, PCA, MDS, LLE, ISOMAP, and GTM, are used to reduce dimensionality in order to visualize the data. Their principal differences are discussed and their performances quantitatively evaluated in a few special classification cases, such as wood inspection using centile features. For the test material experimented with, SOM and GTM outperform the others when classification performance is considered. For data mining kinds of applications, ISOMAP and LLE appear to be more promising methods.
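For readers who want to reproduce this kind of comparison, the short Python sketch below (scikit-learn only; SOM and GTM are left out because scikit-learn does not provide them, and the neighborhood size is an arbitrary choice) computes two-dimensional embeddings of the same feature matrix with the remaining methods.

from sklearn.decomposition import PCA
from sklearn.manifold import MDS, LocallyLinearEmbedding, Isomap

def embed_2d(X, n_neighbors=10):
    """Map the feature matrix X to 2-D with several reduction methods."""
    methods = {
        "PCA": PCA(n_components=2),
        "MDS": MDS(n_components=2),
        "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=n_neighbors),
        "ISOMAP": Isomap(n_components=2, n_neighbors=n_neighbors),
    }
    # Each estimator maps the high-dimensional data to two dimensions.
    return {name: est.fit_transform(X) for name, est in methods.items()}

The resulting embeddings can then be compared, for example, by evaluating a simple classifier in the two-dimensional space.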
The following topics are discussed: bioinformatics; software engineering with computational intelligence; data mining; evolutionary computing; planning and scheduling; knowledge management and sharing; machine learning; agents; vision and imaging; artificial intelligence in medicine; fuzzy logic; intelligent information retrieval; knowledge representation; satisfiability; computer vision and pattern recognition.
Recent times have seen an explosive growth in the availability of various kinds of data. It has resulted in an unprecedented opportunity to develop automated data-driven techniques for extracting useful knowledge. Data...
Support vector machines (SVMs) are currently one of the classification systems most used in pattern recognition and data mining because of their accuracy and generalization capability. However, when dealing with very complex classification tasks where different errors bring different penalties, one should take into account the overall classification cost produced by the classifier more than its accuracy. It is thus necessary to provide methods for tuning the SVM on the costs of the particular application. Depending on the characteristics of the cost matrix, this can be done during or after the learning phase of the classifier. In this paper we introduce two optimization schemes based on these two possible approaches and compare their performance on various data sets and kernels. The first experimental results show that both of the proposed schemes are suitable for tuning SVMs in cost-sensitive applications.
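The two generic approaches mentioned in the abstract can be illustrated with the hedged Python sketch below (scikit-learn; the paper's specific optimization schemes are not reproduced, and the cost values, class labels 0/1, and function names are assumptions made for the example): cost-aware class weighting during training, and shifting the decision threshold on a validation set after training.

import numpy as np
from sklearn.svm import SVC

# Assumed costs: a false negative is ten times as costly as a false positive.
cost_fp, cost_fn = 1.0, 10.0

def train_cost_weighted(X, y):
    """During learning: weight each class in proportion to its error cost."""
    return SVC(kernel="rbf", gamma="scale",
               class_weight={0: cost_fp, 1: cost_fn}).fit(X, y)

def tune_threshold(clf, X_val, y_val):
    """After learning: choose the decision threshold with the lowest total cost."""
    scores = clf.decision_function(X_val)
    thresholds = np.unique(scores)
    costs = [cost_fp * np.sum((scores > t) & (y_val == 0)) +
             cost_fn * np.sum((scores <= t) & (y_val == 1))
             for t in thresholds]
    return thresholds[int(np.argmin(costs))]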