Background: Computational discovery of microRNAs (miRNAs) relies on predetermined sets of features extracted from miRNA precursors (pre-miRNAs). Some feature sets consist of sequence-structure patterns commonly found in pre-miRNAs, while others combine more sophisticated RNA features. In this work, we analyze the discriminant power of seven feature sets used in six pre-miRNA prediction tools. The analysis is based on the classification performance these feature sets achieve with the training algorithms used in those tools. We also evaluate feature discrimination through the F-score and through feature importance in the induction of random forests. Results: Only small or non-significant differences were found among the estimated classification performances of classifiers induced from sets with diversified features, despite the large differences in the sets' dimensions. Inspired by these results, we obtained a lower-dimensional feature set that achieved a sensitivity of 90% and a specificity of 95%. These estimates are within 0.1% of the maximal values obtained with any feature set (SELECT, see Section "Results and discussion"), while the new set is 34 times faster to compute. Even compared with FS2 (see Section "Results and discussion"), the computationally least expensive of the literature feature sets that perform within 0.1% of the maximal values, it is 34 times faster to compute. Five of the six tools used as references in our experiments showed lower sensitivity or specificity. Conclusion: In miRNA discovery, the number of putative miRNA loci is in the order of millions. Analyzing putative pre-miRNAs with a computationally expensive feature set would be wasteful, or even unfeasible, for large genomes. In this work, we propose a relatively inexpensive feature set and explore most of the learning aspects implemented in current ab initio pre-miRNA prediction tools.
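The abstract does not define the F-score it uses. Assuming it is the common Chen and Lin definition for two-class feature ranking, a minimal sketch looks like the code below. That definition divides the separation of the class means from the overall mean by the sum of the within-class sample variances. `f_score` and `rank_features` are hypothetical names. Labels are assumed to be 1 for pre-miRNAs and 0 for negative examples.

```python
from statistics import mean
from typing import List, Sequence


def f_score(pos: Sequence[float], neg: Sequence[float]) -> float:
    """F-score of one feature (assumed Chen-Lin definition)."""
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples per class")
    m_all = mean(list(pos) + list(neg))
    m_pos, m_neg = mean(pos), mean(neg)
    # Numerator: how far each class mean lies from the overall mean.
    between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    # Denominator: sum of the two within-class sample variances.
    within = (sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1))
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_features(samples: List[List[float]], labels: List[int]) -> List[int]:
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        pos = [s[j] for s, y in zip(samples, labels) if y == 1]
        neg = [s[j] for s, y in zip(samples, labels) if y == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

A higher score marks a more discriminative feature. Because the score looks at each feature in isolation and ignores interactions between features, it is reasonable to complement it with random-forest importance, as the abstract does.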
The IUPAC International Chemical Identifier (InChI) provides a method to generate a unique text descriptor of molecular structures. Building on this work, we report a process to generate a unique text descriptor for reactions, RInChI. By carefully selecting the information that is included and by ordering the data in a well-defined way, different scientists studying the same reaction should produce the same RInChI. If differences arise, they most likely lie in the minor layers of the InChI and can therefore be readily handled. RInChI provides a concise description of the key data in a chemical reaction and will help enable rapid searching and analysis of reaction databases.
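The uniqueness argument rests on canonical ordering. Once every component identifier is placed in a fixed order, the order in which a chemist lists the components no longer matters. The toy function below illustrates only that idea. Its name and its `!`, `<>` and `|fwd`/`|rev` delimiters are hypothetical and are not meant to reproduce the actual RInChI string layout or its layers.

```python
from typing import Iterable


def toy_reaction_key(reactants: Iterable[str], products: Iterable[str]) -> str:
    """Order-independent reaction key; a toy, NOT the real RInChI layout."""
    # Sort components so the order in which they are listed is irrelevant.
    left = "!".join(sorted(reactants))
    right = "!".join(sorted(products))
    # Sort the two sides as well and record the original direction in a flag,
    # so writing the reaction "backwards" changes only the flag.
    if left <= right:
        return f"{left}<>{right}|fwd"
    return f"{right}<>{left}|rev"
```

A real identifier must also handle stoichiometry, agents such as solvents and catalysts, and the InChI layers mentioned above. This sketch ignores all of them.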
Background: The recent availability of high-resolution microscopy, together with advances in the development of biomarkers as reporters of biomolecular interactions, has increased the importance of imaging methods in molecular cell biology. These techniques enable spatially resolved investigation of cellular characteristics such as volume, size and geometry, the volume and geometry of intracellular compartments, and the amounts of existing proteins. Such detailed investigations have opened up many new areas of research into spatial, complex and dynamic cellular systems. One of the crucial challenges in studying such systems is the design of a well-structured, optimized workflow for systematic and efficient hypothesis verification. Computer science can address this task by providing software that facilitates the handling, analysis and evaluation of biological data for the benefit of experimenters and modelers. Results: The Spatio-Temporal Simulation Environment (STSE) is a set of open-source tools for conducting spatio-temporal simulations in discrete structures based on microscopy images. The framework contains modules to digitize, represent, analyze and mathematically model spatial distributions of biochemical species. Graphical user interface (GUI) tools provided with the software enable meshing of the simulation space based on the Voronoi concept. In addition, the software can automatically assign spatial information from the images to the mesh based on pixel luminosity (e.g., corresponding to molecular levels in microscopy images). STSE is freely available either as a stand-alone version or included in the Linux live distribution Systems Biology Operational Software (***), and can be downloaded from http://***/. The Python source code, a comprehensive user manual and video tutorials are also offered to the research community. We discuss the main concepts of the STSE design and w…
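The abstract does not say how STSE builds its mesh or maps luminosity onto it. As a rough illustration of the two ideas, the sketch below makes three assumptions: the image is grayscale and stored as nested lists, the seed points are chosen by the user, and the average luminosity of a cell stands in for its molecular level. It partitions the pixels into discrete Voronoi cells by nearest seed and then averages luminosity per cell. The function names are hypothetical and are not STSE's API.

```python
from typing import List, Sequence, Tuple


def voronoi_labels(width: int, height: int,
                   seeds: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Assign every pixel (x, y) to its nearest seed: a discrete Voronoi partition."""
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Squared Euclidean distance is enough to find the nearest seed.
            dists = [(x - sx) ** 2 + (y - sy) ** 2 for sx, sy in seeds]
            row.append(dists.index(min(dists)))  # ties go to the lower seed index
        labels.append(row)
    return labels


def mean_luminosity_per_cell(image: Sequence[Sequence[float]],
                             labels: List[List[int]], n_cells: int) -> List[float]:
    """Average pixel luminosity inside each Voronoi cell."""
    totals = [0.0] * n_cells
    counts = [0] * n_cells
    for img_row, lab_row in zip(image, labels):
        for value, cell in zip(img_row, lab_row):
            totals[cell] += value
            counts[cell] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```

The brute-force nearest-seed search is quadratic in practice. It keeps the sketch short, but a real meshing tool would more likely compute the Voronoi diagram geometrically.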
ISBN: (print) 9788392875604
In this paper, we present a FinFET compact model and its associated parameter extraction methodology. This explicit model accounts for all major small-geometry effects and allows accurate simulation of both n- and p-type FinFETs. The model core is physics-based (a long-channel model), and some semi-empirical corrections are introduced to accurately simulate the behavior of ultrashort (L = 25 nm) and ultrathin (W_Si = 3 nm) FinFETs. Parameter extraction relies on a software suite that performs it automatically. In this work, the parameter extraction procedure is developed from 3-D simulation results. The optimization of parameters related to quantum effects, short-channel effects and channel length modulation illustrates the extraction methodology. Finally, we compare the FinFET characteristics (drain current and small-signal parameters) obtained with our explicit compact model against 3-D numerical simulations for different fin widths and channel lengths.
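The paper's model equations are not given here, so the following is only a toy instance of the extraction step for one parameter group it names, channel length modulation. It assumes the simple textbook saturation form I_D = I_D0(1 + λV_DS). Under that form, the drain current is linear in V_DS, so ordinary least squares over simulated (V_DS, I_D) points yields I_D0 as the intercept and λ as the slope divided by the intercept. The function name `fit_clm` and the model form are assumptions, not the authors' methodology.

```python
from typing import Sequence, Tuple


def fit_clm(vds: Sequence[float], ids: Sequence[float]) -> Tuple[float, float]:
    """Fit I_D = I_D0 * (1 + lam * V_DS) by ordinary least squares.

    Returns (I_D0, lam). Assumes every bias point lies in saturation.
    """
    n = len(vds)
    if n < 2:
        raise ValueError("need at least two bias points")
    mx = sum(vds) / n
    my = sum(ids) / n
    sxx = sum((x - mx) ** 2 for x in vds)
    if sxx == 0:
        raise ValueError("V_DS values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(vds, ids))
    slope = sxy / sxx             # equals I_D0 * lam
    intercept = my - slope * mx   # equals I_D0
    return intercept, slope / intercept
```

The closed-form fit works only because the assumed model is linear in V_DS. Extracting a full compact model typically means fitting many coupled, nonlinear parameters with an iterative optimizer.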
Background: Open-source cheminformatics toolkits such as OpenBabel, the CDK and the RDKit share the same core functionality but support different sets of file formats and force fields, and calculate different fingerprints and descriptors. Despite their complementary features, using these toolkits in the same program is difficult because they are implemented in different languages (C++ versus Java), have different underlying chemical models and have different application programming interfaces (APIs). Results: We describe Cinfony, a Python module that presents a common interface to all three toolkits, allowing the user to easily combine methods and results from any of them. In general, the Cinfony modules run almost as fast as accessing the underlying toolkits directly from C++ or Java, but Cinfony makes it much easier to carry out common cheminformatics tasks such as reading file formats and calculating descriptors. Conclusion: By providing a simplified interface and improving interoperability, Cinfony makes it easy to combine the complementary features of OpenBabel, the CDK and the RDKit.
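Cinfony's real modules wrap OpenBabel, the CDK and the RDKit. The sketch below does not use them and does not reproduce Cinfony's API. It only illustrates the underlying adapter idea: each backend is wrapped in a class with the same hypothetical `descriptors` method, so calling code can merge results without knowing which toolkit produced them. `ToyBackendA` and `ToyBackendB` are placeholders that compute trivial string statistics, not chemical descriptors.

```python
from typing import Dict, Iterable, Protocol


class Toolkit(Protocol):
    """Hypothetical common interface that every backend adapter satisfies."""
    name: str

    def descriptors(self, smiles: str) -> Dict[str, float]: ...


class ToyBackendA:
    """Stand-in for one toolkit; its 'descriptor' is just the SMILES length."""
    name = "toyA"

    def descriptors(self, smiles: str) -> Dict[str, float]:
        return {"n_chars": float(len(smiles))}


class ToyBackendB:
    """Stand-in for another toolkit; counts opening branch parentheses."""
    name = "toyB"

    def descriptors(self, smiles: str) -> Dict[str, float]:
        return {"n_branches": float(smiles.count("("))}


def combined_descriptors(smiles: str, toolkits: Iterable[Toolkit]) -> Dict[str, float]:
    """Merge descriptors from several backends, prefixing keys with the backend name."""
    merged: Dict[str, float] = {}
    for tk in toolkits:
        for key, value in tk.descriptors(smiles).items():
            merged[f"{tk.name}.{key}"] = value
    return merged
```

Prefixing each key with the backend name keeps same-named descriptors from different toolkits from overwriting each other.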
Background: Phylogenetic footprinting is the identification of functional regions of DNA by their evolutionary conservation. It is achieved by comparing orthologous regions from multiple species and identifying the DNA regions that have diverged less than neutral DNA. Vestige is a phylogenetic footprinting package built on the PyEvolve toolkit that uses probabilistic molecular evolutionary modelling to represent aspects of sequence evolution, including the conventional divergence measure employed by other footprinting approaches. In addition to measuring divergence, Vestige allows the definition of a phylogenetic footprint to be expanded to include variation in the distribution of any molecular evolutionary process. This is achieved by displaying the distribution of model parameters that represent partitions of molecular evolutionary substitutions. Examining the spatial incidence of these effects across regions of the genome can identify DNA segments that differ in the nature of the evolutionary process. Results: Vestige was applied to a reference dataset of the SCL locus from four species and clearly identified the known conserved regions in this dataset. To demonstrate its flexibility in using diverse models of molecular evolution and in dissecting the nature of the evolutionary process, Vestige was used to footprint the Ka/Ks ratio in primate BRCA1 with a codon model of evolution. Two regions of putative adaptive evolution were identified, illustrating Vestige's ability to represent the spatial distribution of distinct molecular evolutionary processes. Conclusion: Vestige provides a flexible, open platform for phylogenetic footprinting. Underpinned by the PyEvolve toolkit, it provides a framework for visualising the signatures of evolutionary processes across the genomes of numerous organisms simultaneously. By exploiting the maximum-likelihood statistical framework, the complex interplay between mutational processes, DNA repair and se…
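Vestige works within a maximum-likelihood modelling framework, which is beyond a short example. The sketch below swaps in a much cruder measure to show the basic footprinting logic of flagging windows that diverge less than neutral DNA. That measure is the mean pairwise p-distance (the fraction of mismatched sites) in sliding windows over an alignment of orthologous sequences. The neutral divergence and the 0.5 ratio threshold are assumed inputs. The function names are hypothetical and are not part of Vestige or PyEvolve.

```python
from itertools import combinations
from typing import List, Sequence


def window_divergence(alignment: Sequence[str], window: int) -> List[float]:
    """Mean pairwise p-distance in each sliding window of an alignment.

    Gap characters ('-') are skipped; windows with no comparable sites give NaN.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(alignment, 2))
    result = []
    for start in range(length - window + 1):
        diffs = compared = 0
        for a, b in pairs:
            for i in range(start, start + window):
                if a[i] != "-" and b[i] != "-":
                    compared += 1
                    diffs += int(a[i] != b[i])
        result.append(diffs / compared if compared else float("nan"))
    return result


def footprint_windows(divergence: Sequence[float], neutral: float,
                      ratio: float = 0.5) -> List[int]:
    """Start positions of windows diverging less than ratio * neutral divergence."""
    return [i for i, d in enumerate(divergence) if d < ratio * neutral]
```

Raw p-distance ignores phylogeny, multiple substitutions and substitution-type differences. Those are precisely what a likelihood-based tool like Vestige models, so this sketch should be read only as an illustration of the windowing idea.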