ISBN (digital): 9781615209125
ISBN (print): 9781615209118
Chemoinformatics is a scientific field that uses computational techniques and methods to study and solve complex chemical problems. Chemoinformatics and Advanced Machine Learning Perspectives: Complex Computational Methods and Collaborative Techniques provides an overview of current research in machine learning and its applications to chemoinformatics tasks. As a timely compendium of research, this book offers perspectives on key elements that are crucial for complex study and investigation.
Chemoinformatics is a research field concerned with the study of physical or biological molecular properties using computer science fields such as machine learning and graph theory. From this point of view, graph kernels provide a convenient framework that naturally combines machine learning and graph theory techniques. Graph kernels based on bags of patterns have proven efficient on several problems, both in accuracy and in computational time. The treelet kernel is a graph kernel based on a bag of small subtrees. In this paper we propose several extensions of this kernel devoted to chemoinformatics problems. These extensions aim to weight each pattern according to its influence, to include the comparison of non-isomorphic patterns, to include stereo information, and finally to explicitly encode cyclic information in the kernel computation. (C) 2014 Elsevier Ltd. All rights reserved.
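To make the bag-of-patterns idea concrete, here is a minimal Python sketch. It is not the treelet kernel from the abstract: the patterns are limited to single atoms and bonded atom pairs, the graph encoding is a hypothetical one chosen for illustration, and the optional `weights` argument only gestures at the pattern weighting the authors describe.

```python
from collections import Counter


def bag_of_patterns(atoms, bonds):
    """Count small labelled patterns in a molecular graph.

    atoms: dict mapping node id -> element label.
    bonds: iterable of (u, v) node-id pairs.
    Only single atoms and bonded atom pairs are counted, a much smaller
    pattern family than the treelets used in the paper (assumption).
    """
    bag = Counter(atoms.values())
    for u, v in bonds:
        # Sort the labels so that C-O and O-C count as the same pattern.
        bag["-".join(sorted((atoms[u], atoms[v])))] += 1
    return bag


def pattern_kernel(bag_a, bag_b, weights=None):
    """Weighted dot product of pattern counts over the shared patterns."""
    weights = weights or {}
    return sum(
        weights.get(p, 1.0) * bag_a[p] * bag_b[p]
        for p in bag_a.keys() & bag_b.keys()
    )


# Hydrogen-suppressed toy graphs (hypothetical encoding).
ethanol = bag_of_patterns({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
methanol = bag_of_patterns({0: "C", 1: "O"}, [(0, 1)])

print(pattern_kernel(ethanol, methanol))                # unweighted
print(pattern_kernel(ethanol, methanol, {"C-O": 2.0}))  # C-O counted double
```

Raising the weight of a pattern increases its contribution to the similarity, which is the simplest way to read "weighting each pattern according to its influence"; how the weights are actually learned is not specified in the abstract.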
Chemoinformatics is a well-established research field concerned with the discovery of molecular properties through informational techniques. The computer science fields most relevant to chemoinformatics are machine learning and graph theory. From this point of view, graph kernels provide a convenient framework combining machine learning and graph theory techniques. Such kernels have proven efficient on several chemoinformatics problems, and this paper presents two new graph kernels applied to regression and classification problems. The first kernel is based on the notion of edit distance, while the second is based on subtree enumeration. The design of the latter relies on a variable selection step in order to obtain kernels defined on parsimonious sets of patterns. The performance of both kernels is investigated through experiments. (C) 2012 Elsevier B.V. All rights reserved.
We consider Bayesian methodology for comparing two or more unlabeled point sets. Application of the technique to a set of steroid molecules illustrates its potential utility for the comparison of molecules in chemoinformatics and bioinformatics. We initially match a pair of molecules, where one molecule is regarded as random and the other fixed. A type of mixture model is proposed for the point set coordinates, and the parameters of the distribution are a labeling matrix (indicating which pairs of points match) and a concentration parameter. An important property of the likelihood is that it is invariant under rotations and translations of the data. Bayesian inference for the parameters is carried out using Markov chain Monte Carlo simulation, and it is demonstrated that the procedure works well on the steroid data. The posterior distribution is difficult to simulate from, due to multiple local modes, and we also use additional data (partial charges on atoms) to help with this task. An approximation is considered for speeding up the simulation algorithm, and the approximating fast algorithm leads to essentially identical inference to that under the exact method for our data. Extensions to multiple molecule alignment are also introduced, and an algorithm is described which also works well on the steroid data set. After all the steroid molecules have been matched, exploratory data analysis is carried out to examine which molecules are similar. Also, further Bayesian inference for the multiple alignment problem is considered.
Author: Bajorath, Juergen (Univ Bonn, Bonn Aachen Int Ctr Informat Technol, Dept Life Sci Informat, D-53113 Bonn, Germany)
The activity cliff concept attracts considerable interest in medicinal chemistry and chemoinformatics. Activity cliffs are defined as pairs or groups of structurally similar or analogous active compounds having large differences in potency. Views of activity cliffs differ in part depending on the research field. While the interpretability and utility of activity cliff information are considered to be of critical importance in medicinal chemistry, large-scale exploration and prediction of activity cliffs are of special interest in chemoinformatics. Much emphasis has recently been put on making activity cliff information accessible for medicinal chemistry applications. Herein, different approaches to the analysis and prediction of activity cliffs are discussed that are of particular relevance from a chemoinformatics viewpoint.
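The pairwise definition above can be turned into a simple screen. The sketch below is only an illustration: fingerprints are modelled as plain Python sets of on-bit indices, potencies are assumed to be on a logarithmic scale (for example pIC50), and the two thresholds (`min_similarity`, `min_potency_gap`) are assumed defaults, not values taken from the abstract.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def activity_cliffs(compounds, min_similarity=0.8, min_potency_gap=2.0):
    """Return the name pairs that are similar in structure but far apart in potency.

    compounds: dict mapping name -> (fingerprint set, log-scale potency).
    Both thresholds are illustrative assumptions.
    """
    cliffs = []
    for (n1, (fp1, p1)), (n2, (fp2, p2)) in combinations(compounds.items(), 2):
        similar = tanimoto(fp1, fp2) >= min_similarity
        if similar and abs(p1 - p2) >= min_potency_gap:
            cliffs.append((n1, n2))
    return cliffs


# Hypothetical toy data: A and B are near-identical but differ in potency.
compounds = {
    "A": ({1, 2, 3, 4, 5}, 8.5),
    "B": ({1, 2, 3, 4, 5, 6}, 5.9),
    "C": ({7, 8, 9}, 6.0),
}
print(activity_cliffs(compounds))
```

The exhaustive pairwise loop is quadratic in the number of compounds, which is one reason large-scale cliff exploration is treated as a chemoinformatics problem in its own right.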
Catalytic functions of proteins are generally described by the EC numbers assigned to the catalyzed chemical *** EC number is simultaneously employed as an identifier of reactions, enzymes, and enzyme genes, thus linking metabolic and *** chemically meaningful and well established, the EC numbers are based on rules that are often ambiguous and heterogeneous among the different types of *** use for diversity analysis of metabolic reactions (the reactome) is *** the context of genome-scale reconstruction of metabolic pathways, the definition of reaction similarity, and the automatic classification of enzymatic reactions in terms of EC numbers, are crucial for the similarity-based proposal of enzyme sequences from their functions.
ISBN (digital): 9781118742785
ISBN (print): 9781118139103
Chemoinformatics strategies to improve drug discovery results. With contributions from leading researchers in academia and the pharmaceutical industry as well as experts from the software industry, this book explains how chemoinformatics enhances drug discovery and pharmaceutical research efforts, describing what works and what doesn't. Strong emphasis is put on tested and proven practical applications, with plenty of case studies detailing the development and implementation of chemoinformatics methods to support successful drug discovery efforts. Many of these case studies depict groundbreaking collaborations between academia and the pharmaceutical industry. Chemoinformatics for Drug Discovery is logically organized, offering readers a solid base in methods and models and advancing to drug discovery applications and the design of chemoinformatics infrastructures. The book features 15 chapters, including: "What are our models really telling us? A practical tutorial on avoiding common mistakes when building predictive models"; "Exploration of structure-activity relationships and transfer of key elements in lead optimization"; "Collaborations between academia and pharma"; "Applications of chemoinformatics in pharmaceutical research: experiences at large international pharmaceutical companies"; and "Lessons learned from 30 years of developing successful integrated chemoinformatic systems". Throughout the book, the authors present chemoinformatics strategies and methods that have been proven to work in pharmaceutical research, offering insights culled from their own investigations. Each chapter is extensively referenced with citations to original research reports and reviews. Integrating chemistry, computer science, and drug discovery, Chemoinformatics for Drug Discovery encapsulates the field as it stands today and opens the door to further advances.
Knowledge Discovery in Databases (KDD) refers to the use of methodologies from machine learning, pattern recognition, statistics, and other fields to extract knowledge from large collections of data, where the knowledge is not explicitly available as part of the database structure. In this paper, we describe four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), and we have attempted to give an exhaustive list of their chemoinformatics applications. One of the main strengths of these methods is their descriptive ability. When used to derive rules, for example, in structure-activity relationships, the rules have clear physical meaning. This review has shown that there are close relationships between the methods. Often apparent differences lie in the way in which the problem under investigation has been formulated, which can lead to the natural adoption of one or another method. ***, the idea of a structural alert, as a structure which is present in toxic and absent in nontoxic compounds, leads to the natural formulation of an Emerging Pattern search. Despite the similarities between the methods, each has its strengths. RST is useful for dealing with uncertain and noisy data. Its main chemoinformatics applications so far have been in feature extraction and feature reduction, the latter often as input to another data mining method, such as a Support Vector Machine (SVM). ARM has mostly been used for frequent subgraph mining. EP and FCA have both been used to mine structural and nonstructural patterns for classification of both active and inactive molecules. Since their introduction in the 1980s and 1990s, RST, ARM, EP, and FCA have found wide-ranging applications, with many thousands of citations in Web of Science, but their adoption by the chemoinformatics community has been relatively slow. Advances, both in computer power and in algorithm development, mean
In ligand-based screening, retrosynthesis, and other chemoinformatics applications, one often seeks to search large databases of molecules in order to retrieve molecules that are similar to a given query. With the expanding size of molecular databases, the efficiency and scalability of data structures and algorithms for chemical searches are becoming increasingly important. Remarkably, both the chemoinformatics and information retrieval communities have converged on similar solutions whereby molecules or documents are represented by binary vectors, or fingerprints, indexing their substructures such as labeled paths for molecules and n-grams for text, with the same Jaccard-Tanimoto similarity measure. As a result, similarity search methods from one field can be adapted to the other. Here we adapt recent, state-of-the-art, inverted index methods from information retrieval to speed up similarity searches in chemoinformatics. Our results show a several-fold speed-up improvement over previous methods for both threshold searches and top-K searches. We also provide a mathematical analysis that allows one to predict the level of pruning achieved by the inverted index approach and validate the quality of these predictions through simulation experiments. All results can be replicated using data freely downloadable from http://***/.
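As a rough illustration of how an inverted index speeds up a threshold search under the Jaccard-Tanimoto measure, consider the Python sketch below. It is a basic textbook-style index with a simple size bound, not the specific method evaluated in the abstract. Fingerprints are modelled as sets of on-bit indices, and all names (`build_index`, `threshold_search`, the toy molecule ids) are hypothetical.

```python
from collections import defaultdict


def build_index(fingerprints):
    """Map each on-bit to the ids of the molecules that set it."""
    index = defaultdict(list)
    for mol_id, bits in fingerprints.items():
        for bit in bits:
            index[bit].append(mol_id)
    return index


def threshold_search(query, fingerprints, index, threshold):
    """Return {mol_id: similarity} for Tanimoto similarity >= threshold.

    Assumes threshold > 0, so molecules sharing no bit with the query
    can be skipped without ever being visited.
    """
    # Count shared bits by walking only the posting lists of the query bits.
    shared = defaultdict(int)
    for bit in query:
        for mol_id in index.get(bit, ()):
            shared[mol_id] += 1

    hits = {}
    for mol_id, common in shared.items():
        size = len(fingerprints[mol_id])
        # Size bound: the similarity can never exceed min/max of the bit counts.
        if min(len(query), size) < threshold * max(len(query), size):
            continue
        similarity = common / (len(query) + size - common)
        if similarity >= threshold:
            hits[mol_id] = similarity
    return hits


# Hypothetical toy database.
fingerprints = {"m1": {1, 2, 3, 4}, "m2": {1, 2, 3, 5}, "m3": {9, 10}}
index = build_index(fingerprints)
hits = threshold_search({1, 2, 3, 4}, fingerprints, index, threshold=0.5)
print(sorted(hits))
```

The gain comes from never touching molecules that share no bits with the query, here `m3`, and from discarding candidates whose bit counts alone rule out reaching the threshold.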