ISBN: (print) 9781467352345
A single-nucleotide polymorphism (SNP) is a single base change in the DNA sequence and is the most common type of polymorphism. Since some SNPs have a major influence on disease susceptibility, detecting SNPs plays an important role in biomedical research. To take full advantage of next-generation sequencing (NGS) technology and detect SNPs more effectively, we propose a Bayesian approach that computes a posterior probability of hidden nucleotide variation at each covered genomic position. A position with a higher posterior probability of hidden nucleotide variation has a higher chance of being a SNP. We apply the proposed method to detect SNPs in two cell lines: the prostate cancer cell line PC3 and the embryonic stem cell line H1. A comparison of our results with the dbSNP database shows a high ratio of overlap (≥95%). The positions that are called only under our model but are not in dbSNP may serve as candidates for new SNPs.
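To make the idea of a per-position posterior concrete, here is a minimal Python sketch. It is not the model from the abstract, which is not specified here. It assumes a simple three-genotype binomial model, and the function name `snp_posterior`, the sequencing error rate, and the prior are all hypothetical choices made for illustration.

```python
from math import comb


def snp_posterior(n_reads, n_alt, error_rate=0.01, prior_variant=0.001):
    """Posterior probability that a position is NOT homozygous reference.

    Illustrative assumptions (not taken from the abstract):
    - each read shows the alternate base independently (binomial model);
    - three genotypes: ref/ref, ref/alt, alt/alt;
    - the prior variant mass is split evenly between ref/alt and alt/alt.
    """
    # Probability that one read shows the alternate base under each genotype.
    p_alt = {
        "ref/ref": error_rate,
        "ref/alt": 0.5,
        "alt/alt": 1.0 - error_rate,
    }
    prior = {
        "ref/ref": 1.0 - prior_variant,
        "ref/alt": prior_variant / 2.0,
        "alt/alt": prior_variant / 2.0,
    }
    # Joint probability = prior * binomial likelihood of the alt-base count.
    joint = {}
    for genotype, p in p_alt.items():
        likelihood = comb(n_reads, n_alt) * p**n_alt * (1.0 - p) ** (n_reads - n_alt)
        joint[genotype] = prior[genotype] * likelihood
    total = sum(joint.values())
    return 1.0 - joint["ref/ref"] / total


if __name__ == "__main__":
    # 10 of 20 reads disagree with the reference: strong evidence of variation.
    print(snp_posterior(20, 10))
    # No read disagrees: the posterior stays close to zero.
    print(snp_posterior(20, 0))
```

Positions could then be ranked by this value and called above a chosen threshold. The threshold, like the priors above, would need to be tuned to the data.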
Many commonly used models of molecular evolution assume homogeneous nucleotide frequencies. A deviation from this assumption has been shown to cause problems for phylogenetic inference. However, some claim that only extreme heterogeneity affects phylogenetic accuracy and suggest that violations of other model assumptions, such as variable rates among sites, are more problematic. In order to explore the interaction between compositional heterogeneity and variable rates among sites, I reanalyzed 3 real heterogeneous datasets using several models. My Bayesian inference recovers accurate topologies under variable rates-among-sites models, but fails under some models that account for compositional heterogeneity. I also ran simulations and found that accounting for rates among sites improves topology accuracy in compositionally heterogeneous data. This indicates that in some cases, models accounting for among-site rate variation can improve outcomes for data that violates the assumption of compositional homogeneity.
The atomic-level structural properties of proteins, such as bond lengths, bond angles, and torsion angles, have been well studied and understood based on either chemistry knowledge or statistical analysis. Similar properties at the residue level, such as the distances between two residues and the angles formed by short sequences of residues, can be equally important for structural analysis and modeling, but these have not been examined and documented on a similar scale. While these properties are difficult to measure experimentally, they can be statistically estimated in meaningful ways based on their distributions in known protein structures. Residue-level structural properties including various types of residue distances and angles are estimated statistically. A software package is built to provide direct access to the statistical data for the properties, including some important correlations not previously investigated. The distributions of residue distances and angles may vary with varying sequences, but in most cases are concentrated in some high-probability ranges, corresponding to their frequent occurrences in either α-helices or β-sheets. Strong correlations among neighboring residue angles, similar to those between neighboring torsion angles at the atomic level, are revealed based on their statistical measures. Residue-level statistical potentials can be defined using the statistical distributions and correlations of the residue distances and angles. Ramachandran-like plots for strongly correlated residue angles are plotted and analyzed. Their applications to structural evaluation and refinement are demonstrated. With the increase in both number and quality of known protein structures, many structural properties can be derived from sets of protein structures by statistical analysis and data mining, and these can even be used as a supplement to the experimental data for structure determinations. Indeed, the statistical measures on various types of residue d
Microbial evolution is complex and is influenced by many sources of variation. Experimental evolution is no exception, although it is more controlled, easily replicated, and typically devoid of interactions between species. Mathematical modeling of the evolutionary process can help in understanding the underlying mechanisms that drive the outcome of such experiments. These models can be complex and parameter rich, limiting their feasibility for statistical inference. In this paper, we introduce the use of Approximate Bayesian Computation (ABC) as a tool for statistical inference in the study of experimental evolution. ABC is a fast and simple method for fitting complex models to data. We utilize this method, coupled with a mechanistic model of experimental evolution, to study the evolutionary process of bacteriophage ϕX174 under benign selection pressure. Our results highlight three mutation-selection scenarios that could explain this process: high mutation/low selection pressure, low mutation/high selection pressure, and low mutation/low selection pressure, with posterior support of 19%, 9.5%, and 71.5%, respectively. Sequence data support the first candidate. Though surprising, this scenario was not improbable based on our analysis.
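The core loop of ABC can be sketched in Python as plain rejection sampling. This is a toy under stated assumptions, not the mechanistic model of the paper: the simulator is a simple binomial count, the prior on the rate is uniform, and the function name `abc_rejection` and its parameters are hypothetical.

```python
import random


def abc_rejection(observed, n_trials, n_draws=20000, tolerance=1, seed=0):
    """Approximate posterior samples of a rate by ABC rejection sampling.

    Toy setting (an assumption for illustration): the data are a count of
    events in `n_trials` independent trials, and the unknown rate has a
    Uniform(0, 1) prior.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # 1. Draw a candidate parameter from the prior.
        rate = rng.random()
        # 2. Simulate a data set under that candidate.
        simulated = sum(rng.random() < rate for _ in range(n_trials))
        # 3. Keep the candidate only if the simulation is close to the data.
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)
    return accepted


if __name__ == "__main__":
    samples = abc_rejection(observed=30, n_trials=100)
    print(len(samples), sum(samples) / len(samples))
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is what makes the approach usable with complex simulators. In a model-choice setting, a scenario label could be drawn alongside the parameters, and the share of accepted draws per label would estimate each scenario's posterior support.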
Many computational approaches have been developed and used for sampling protein conformations near the native state. However, it has been difficult to evaluate the quality of the conformations sampled or to compare th...
Ensembles have been increasingly used to represent the heterogeneity of protein native states, and a number of exciting recent studies determine such ensembles using experimental *** what extent these ensembles represent the native states is debatable, since the ensemble, which may contain over a hundred conformations, may be underconstrained by the experimental *** this work, we introduce a new feature, the
We consider the problem of reconstructing a maximally parsimonious history of network evolution under models that support gene duplication and loss and independent interaction gain and loss. We introduce a combinatori...
P.R.E.S.S. is an R package developed to allow researchers to access and manipulate a large set of statistical data on protein residue-level structural properties such as residue-level virtual bond lengths, virtual bond angles, and virtual torsion angles. A large set of high-resolution protein structures is downloaded and surveyed. Their residue-level structural properties are calculated and documented. The statistical distributions and correlations of these properties can be queried and displayed. Tools are also provided for modeling and analyzing a given structure in terms of its residue-level structural properties. In particular, new tools for computing residue-level statistical potentials and displaying residue-level Ramachandran-like plots are developed for structural analysis and refinement. P.R.E.S.S. will be released in R as an open source software package, with a user-friendly GUI, accessible and executable by a public user in any R environment.
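The three residue-level quantities named above can be illustrated with a short Python sketch. It does not use or reproduce the P.R.E.S.S. API. It assumes each residue is represented by one 3D point (for example its Cα atom, which is an assumption here), and all function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _norm(a):
    return math.sqrt(_dot(a, a))


def virtual_bond_length(p1, p2):
    """Distance between two consecutive residue points."""
    return _norm(_sub(p2, p1))


def virtual_bond_angle(p1, p2, p3):
    """Angle in degrees at p2 formed by three consecutive residue points."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_angle = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def virtual_torsion_angle(p1, p2, p3, p4):
    """Signed dihedral in degrees defined by four consecutive residue points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # atan2 form of the dihedral: numerically stable near 0 and 180 degrees.
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    pts = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
    print(virtual_bond_length(pts[0], pts[1]))
    print(virtual_bond_angle(*pts[:3]))
    print(virtual_torsion_angle(*pts))
```

Computing these values over many known structures and binning them would give the kind of empirical distributions from which statistical potentials and Ramachandran-like plots can be built.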