Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling (GMRCs). These models capture significant conservation and coupling constraints observable in a multiply aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studying both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that GMRCs provide a powerful tool for uncovering, representing, and utilizing significant sequence-structure-function relationships in protein families.
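As a rough illustration of what "coupled" means here, the sketch below scores a pair of alignment columns with mutual information, one of the simpler statistical measures of covariation. It is not the GMRC method described in the abstract; the function name `mutual_information` and the toy alignment are assumptions made only for this example.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length aligned sequences (hypothetical input
    format). Higher values mean some residue pairs co-occur more often than
    expected if the two columns were independent.
    """
    n = len(alignment)
    if n == 0:
        raise ValueError("alignment must contain at least one sequence")
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 1 always change together (A with K, G with R),
# column 2 is fully conserved, and column 3 varies independently of column 0.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

coupled = mutual_information(toy_alignment, 0, 1)
conserved = mutual_information(toy_alignment, 0, 2)
independent = mutual_information(toy_alignment, 0, 3)
```

In this toy case the co-varying pair scores well above the other two, while a fully conserved column carries no coupling signal at all. That illustrates why conservation and coupling are treated as separate sources of constraint.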
Due to the low complexity associated with their sequences, uncovering the evolutionary and functional relationships in highly repetitive proteins such as elastin, spider silks, resilin and abductin represents a significant challenge. Using the polymeric extracellular protein elastin as a model system, we present a novel computational approach to the study of sequence, function and evolutionary relationships in repetitive proteins. To address the absence of accurate sequence annotation for repetitive proteins such as elastin, we have constructed a new database repository, ElastoDB (http://***/elastin), dedicated to the storage and retrieval of elastin sequence- and meta-data. To analyse their sequence relationships we have devised a new method, based on the identification of overrepresented 'fuzzy' motifs. Applying this method to elastin sequences derived from mammals, chicken, Xenopus and zebrafish resulted in the identification of both highly conserved motifs and taxon- and species-specific motifs that likely represent important functional and/or structural elements. The relative spacing and organization of these elements suggest that exon duplication events have played an important role in the evolution of elastin. Clustering of similarity profiles generated for sets of exons and introns revealed a pattern of putative duplication events involving exons 15-30 in mammalian and chicken elastins, exons 20-31 in both zebrafish elastins, exons 15-20 in fugu elastin and exons 35-50 in Xenopus elastin 1. The success of this approach for elastin offers a promising route to the elucidation of sequence, structure, function and evolutionary relationships for many other proteins with sequences of low complexity. (c) 2007 Elsevier B.V./International Society of Matrix Biology. All rights reserved.
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundation for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy, and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion, domain rearrangements, and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate, and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data, as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry, are presented, with the aim of ultimately generating better models for biological inference and prediction.
The alpha/beta-hydrolase fold family is highly diverse in sequence, structure and biochemical function. To investigate the sequence-structure-function relationships, the Lipase Engineering Database was updated. Overall, 280 638 protein sequences and 1557 protein structures were analysed. All alpha/beta-hydrolases consist of the catalytically active core domain, but they might also contain additional structural modules, resulting in 12 different architectures: core domain only, additional lids at three different positions, three different caps, additional N- or C-terminal domains and combinations of N- and C-terminal domains with caps and lids respectively. In addition, the alpha/beta-hydrolases were distinguished by their oxyanion hole signature (GX-, GGGX- and Y-types). The N-terminal domains show two different folds, the Rossmann fold or the beta-propeller fold. The C-terminal domains show a beta-sandwich fold. The N-terminal beta-propeller domain and the C-terminal beta-sandwich domain are structurally similar to carbohydrate-binding proteins such as lectins. The classification was applied to the newly discovered polyethylene terephthalate (PET)-degrading PETases and MHETases, which are core domain alpha/beta-hydrolases of the GX- and the GGGX-type respectively. To investigate evolutionary relationships, sequence networks were analysed. The degree distribution followed a power law with a scaling exponent gamma = 1.4, indicating a highly inhomogeneous network which consists of a few hubs and a large number of less connected sequences. The hub sequences have many functional neighbours and therefore are expected to be robust toward possible deleterious effects of mutations. The cluster size distribution followed a power law with an extrapolated scaling exponent tau = 2.6, which strongly supports the connectedness of the sequence space of alpha/beta-hydrolases. Database Supporting data about domains from other proteins with structural similarity to the N- or C-ter
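To make the power-law statement concrete, here is a minimal sketch of one common way to estimate a scaling exponent from node degrees: a least-squares line through the degree distribution on log-log axes. The abstract does not say how gamma was fitted, so this is an assumed, simplified procedure; `degree_exponent` and the toy degree list are hypothetical.

```python
from collections import Counter
from math import log


def degree_exponent(degrees):
    """Estimate gamma in P(k) ~ k^(-gamma) by a least-squares log-log fit.

    `degrees` is a list of node degrees; zero-degree nodes are ignored because
    log(0) is undefined. This is an illustrative estimator, not necessarily
    the one used in the study.
    """
    counts = Counter(d for d in degrees if d > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    total = sum(counts.values())

    xs = [log(k) for k in counts]                    # log degree
    ys = [log(c / total) for c in counts.values()]   # log relative frequency

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope  # the fitted slope is -gamma


# Toy network: node counts fall off as k^-2 (64, 16, 4, 1 nodes at k = 1, 2, 4, 8).
toy_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
gamma_estimate = degree_exponent(toy_degrees)
```

On real, noisy data a simple log-log regression can be biased by sparse counts in the tail, so maximum-likelihood estimators are often preferred. The sketch is only meant to show what the exponent measures.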
For high-throughput structural genomic and evolutionary bioinformatics approaches, there is a clear need for fast methods to evaluate substitutions structurally. Coarse-grained methods are both powerful and fast, and a coarse-grained approach to position the substituted side chains is presented. Through the application of a coarse-grained method, a speed-up of at least sevenfold on single-residue replacement is achieved compared with modern all-atom approaches. At the same time, this approach maintains a small median RMSD from the leading all-atom approach (as measured in coarse-grained space), predicts the conformation of point mutants with similar accuracy, and generates biologically realistic side chain angles. This method is also substantially more predictable in its run time, making it useful for high-throughput studies of protein structural evolution. To demonstrate the utility of this method, it has been implemented in a forward simulation of sequences threaded through the SH2 domains, with selective pressures to fold and bind specifically. The relative substitution rates across the protein structure and at the binding interface are reflective of those observed in SH2 domain evolution. The algorithm has been implemented in C++, with the source code and binaries (currently supported for Linux systems) freely available as SARA at http://***/LiberlesGroup/SARA.
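The accuracy figure quoted above is a median RMSD between two sets of predicted positions. A minimal sketch of that kind of comparison follows, assuming matched 3D points that are already in the same coordinate frame (no superposition step). The function `rmsd` and the coordinates are illustrative and are not taken from SARA.

```python
from math import sqrt
from statistics import median


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes both lists describe the same pseudo-atoms in the same order and
    frame of reference; no structural alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


# Hypothetical side-chain centroid positions from two placement methods,
# one pair of coordinate lists per substituted residue.
predictions = [
    ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]),
    ([(0.0, 0.0, 0.0)], [(0.0, 3.0, 4.0)]),
    ([(1.0, 1.0, 1.0)], [(1.0, 1.0, 1.0)]),
]

per_case_rmsd = [rmsd(a, b) for a, b in predictions]
median_rmsd = median(per_case_rmsd)
```

Reporting the median rather than the mean keeps a few badly placed side chains from dominating the summary.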
Bio3D-web is an online application for the interactive analysis of sequence-structure-dynamics relationships in user-defined protein structure sets. Major functionality includes structure database searching, sequence and structure conservation assessment, inter-conformer relationship mapping and clustering with principal component analysis (PCA), and flexibility prediction and comparison with ensemble normal mode analysis (eNMA). Collectively these methods allow users to start with a single sequence or structure and characterize the structural, conformational, and internal dynamic properties of homologous proteins for which there are high-resolution structures available. Functionality is also provided for the generation of custom PDF, Word, and HTML analysis reports detailing all user-specified analysis settings and corresponding results. Bio3D-web is available at http://***/bio3d/webapps, as a Docker image at https://***/r/bio3d/bio3d-web/, or as downloadable source code at https://***/Grantlab/bio3d-web.