Background: Simplified representations of compound databases have several applications in cheminformatics. Herein, we introduce an alternative and general method to build single-fingerprint representations of compound databases. The approach is inspired by the previously published modal fingerprints, which aim to capture the most significant bits of a fingerprint representation for a compound data set. The novelty of the proposed statistical-based database fingerprint (SB-DFP) is that it is generated from binomial proportion comparisons, taking as reference the distribution of 1 bits in a large representative set of the chemical space. To illustrate the method, SB-DFPs were constructed for 28 epigenetic target data sets retrieved from a recently published epigenomics database of interest in probe and drug discovery. For each target data set, the SB-DFPs were built from two representative fingerprints of different design, using as reference a data set with more than 15 million compounds from ZINC. The application of SB-DFP was illustrated and compared to other methods through association relationships among the 28 epigenetic data sets and through similarity searching. Overall, SB-DFPs captured both the common features between data sets and the distinct features of each set. In similarity searching, SB-DFP equaled or outperformed other approaches for at least 20 out of the 28 data sets. SB-DFP is a general approach based on binomial proportion comparisons to represent a compound data set with a single fingerprint. SB-DFP can be developed, at least in principle, from any fingerprint and reference data set. SB-DFP is a good alternative for exploring relationships between targets through their associated compound data sets and for performing similarity searching.
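The idea of setting a bit by comparing binomial proportions can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-sided z-test in which the reference frequency is treated as a known proportion (reasonable only because the reference set is very large), and the function name `sb_dfp`, the parameter `z_crit`, and the toy data are hypothetical.

```python
import math


def sb_dfp(dataset_fps, ref_bit_freqs, z_crit=2.326):
    """Build one fingerprint for a whole data set (illustrative sketch).

    dataset_fps   : list of equal-length 0/1 lists, one per compound
    ref_bit_freqs : frequency of 1 bits at each position in the reference set
    z_crit        : assumed cutoff; 2.326 is roughly a one-sided alpha of 0.01

    Bit i is set to 1 when it is significantly more frequent in the data
    set than in the reference.
    """
    n = len(dataset_fps)
    result = []
    for i, p0 in enumerate(ref_bit_freqs):
        # Observed proportion of compounds in the data set with bit i on.
        p_hat = sum(fp[i] for fp in dataset_fps) / n
        # Standard error under the null hypothesis p = p0.
        se = math.sqrt(p0 * (1.0 - p0) / n)
        if se == 0.0:
            # Degenerate reference (bit never or always on): fall back to
            # a plain comparison, since the z statistic is undefined.
            result.append(1 if p_hat > p0 else 0)
            continue
        z = (p_hat - p0) / se
        result.append(1 if z > z_crit else 0)
    return result


# Toy data: four compounds, four-bit fingerprints (hypothetical values).
toy_fps = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
]
toy_reference = [0.2, 0.3, 0.7, 0.0]
toy_dfp = sb_dfp(toy_fps, toy_reference)
```

In this toy case only the first bit is enriched relative to the reference, so it is the only bit set in `toy_dfp`. The third bit is common in the data set but equally common in the reference, which is the distinction a frequency-only fingerprint would miss.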
This perspective discusses the current progress of a chemoinformatics group in a major university in Latin America. Three major aspects are discussed critically: research, education, and collaboration with industry and other public research networks. An overview of the progress in applied research and in the development of research concepts is also presented. Efforts to teach chemoinformatics at the undergraduate and graduate levels are discussed. We address how partnerships with industry and other not-for-profit research institutions not only bring additional sources of funding but, more importantly, increase the impact of the multidisciplinary work and expose students to other research environments. We also discuss the main perspectives and challenges that remain to be addressed in these settings.
Introduction: Activity landscapes are valuable tools for systematically exploring the structure-activity relationships (SAR) of chemical databases. Applying them to analyze the SAR of DNA methyltransferase (DNMT) inhibitors, which are attractive compounds as potential epi-drugs or epi-probes, provides useful information to identify pharmacophoric regions and to plan the development of predictive models and virtual screening. Areas covered: This paper highlights different approaches for conducting SAR analysis of datasets, with a particular focus on the activity landscape methodology. SAR information of DNMT inhibitors (DNMTi), stored in a public database, is surveyed to further illustrate concepts and generalities of activity landscape modeling, with a special emphasis on structure-activity similarity (SAS) maps. Expert opinion: The increasing SAR information reported for DNMTi opens up avenues to implement activity landscape methods. Although several activity landscape methods, such as SAS maps, are well established, they need further refinement. For instance, novel combinations of multiple representations, such as the addition of Z-values of similarity (fusion-Z), lead to more robust representations of consensus SAS maps. Density SAS maps improve the visualization of the SAR. A survey of activity cliffs (i.e., pairs of compounds with high structural similarity but large differences in potency) of DNMTi available in a public database suggests that it is feasible to develop predictive models for non-nucleoside DNMTi using approaches such as quantitative structure-activity relationships, and that non-nucleoside DNMTi in ChEMBL can be used as query molecules in similarity-based virtual screening.
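The notion of an activity cliff on a SAS map can be made concrete with a small sketch: each pair of compounds becomes one point with structural similarity on one axis and potency difference on the other, and cliffs are the pairs that are high on both. The code below is an illustration only. It assumes Tanimoto similarity over sets of on-bits and potencies expressed as pIC50; the function names, the cutoffs `sim_cut` and `diff_cut`, and the toy compounds are hypothetical and not taken from the paper.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sas_points(compounds):
    """Return one (id_1, id_2, similarity, potency_difference) per pair.

    compounds maps a compound id to a tuple (on_bits, pIC50).
    """
    points = []
    # Sort ids so that the pair order is deterministic.
    for id_1, id_2 in combinations(sorted(compounds), 2):
        bits_1, potency_1 = compounds[id_1]
        bits_2, potency_2 = compounds[id_2]
        similarity = tanimoto(bits_1, bits_2)
        difference = abs(potency_1 - potency_2)
        points.append((id_1, id_2, similarity, difference))
    return points


def activity_cliffs(compounds, sim_cut=0.7, diff_cut=2.0):
    """Pairs that are structurally similar yet differ strongly in potency.

    Both cutoffs are assumed values chosen for illustration.
    """
    return [
        (id_1, id_2)
        for id_1, id_2, similarity, difference in sas_points(compounds)
        if similarity >= sim_cut and difference >= diff_cut
    ]


# Toy data: on-bit indices and pIC50 values are made up for the example.
toy_compounds = {
    "A": ({1, 2, 3, 4}, 8.0),
    "B": ({1, 2, 3, 4, 5}, 5.5),
    "C": ({7, 8, 9}, 5.0),
}
toy_cliffs = activity_cliffs(toy_compounds)
```

Here compounds A and B share four of five bits but differ by 2.5 log units, so they form the only cliff in the toy set. Plotting the similarity and difference columns of `sas_points` against each other would give the SAS map itself.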
This chapter presents a tutorial overview of selected chemoinformatics methods useful for assembling, curating, and preparing a chemical database, and for assessing its diversity and chemical space. Methods for evaluating the structure–activity relationships (SAR) and polypharmacology are also included. The use of open source tools is emphasized. Step-by-step KNIME workflows illustrate the methods. The methods described in this chapter are applied to a chemical database especially relevant for epi-polypharmacology, an emerging area in drug discovery. However, they could be extended to other therapeutic areas and potentially to other areas of chemistry.