Identifying the characteristics and interaction networks of biological macromolecules is one of the central tasks of current biology. Now that solved structures are available, researchers are chiefly concerned with three aspects of biological macromolecules: fundamental principles, structures, and functions. The main tasks today are to reveal how fundamental physical principles drive macromolecules to fold into their correct structures, how those structures acquire the characteristics needed to bind ligands and interact with one another, and how structures and fundamental principles together guide macromolecules to function correctly. Among biological macromolecules, proteins participate in virtually every process within cells, so research on these topics is in great demand. In this work, we investigate the physical and structural basis underlying interactions between proteins and macromolecules or other ligands.
In summary, on the technical side we have developed two complementary structural-alphabet-based methods for structural motif discovery: (i) a fully automated strategy that finds structural motifs across protein families without requiring a query motif or essential residues, and (ii) a strategy that uses descriptors of key components, defined from known motifs, to find structurally and functionally equivalent regions in other protein families. We also combined the first strategy with a method based on electrostatic stabilization and evolutionary conservation to illustrate its usefulness for detecting binding sites. On the biological side, we showed that a local structural unit that is stabilized by conserved internal interactions and serves as the core region for a specific function in a known motif can also be found, serving the same purpose, in other proteins with different folds. We define such units as key components; examples are the ‘corner’ architecture in the helix-turn-helix motif and the ‘βα’ components in Rossmann fold domains. The results suggest that
Scientific progress in recent years has generated huge amounts of biological data, most of which remains unanalyzed. Mining these data may provide insights into various areas of biology, for example by finding co-occurring biosequences, which are essential for biological data mining and analysis. Pattern mining for biological sequences is an important problem in bioinformatics and computational biology: techniques such as sequential pattern mining can reveal implicit motifs of any length in DNA or protein sequences, and these motifs usually have functional significance and specific structures. For biologists to realize the potential of sequential pattern mining in their field, it is necessary to move away from traditional sequential pattern mining algorithms, which have difficulty handling the small alphabets and long sequences typical of biological data such as gene and protein sequences. To tackle this problem, this dissertation proposes the Depth-First SPelling (DFSP) algorithm, a general model for mining sequential patterns in biological sequences. The algorithm runs faster than PrefixSpan, its leading competitor, and DFSP is superior to other sequential pattern mining algorithms for biological ***, gap constraints are important in computational biology because they accommodate irrelevant regions that are not conserved in evolution. This dissertation therefore also devises an approach for efficiently mining sequential patterns (motifs) of biological sequences under gap constraints, called the Depth-First Spelling algorithm for mining sequential patterns with Gap constraints in biological sequences (DFSG). DFSG is
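As a rough illustration of the general idea described in the abstract above (depth-first growth of patterns over a small alphabet, with an optional gap constraint), the following minimal Python sketch may help. It is not the DFSP or DFSG algorithm from the dissertation, whose details are not given here; the function name `mine_patterns`, its parameters, and the definition of support as "number of sequences containing the pattern" are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def mine_patterns(sequences: List[str], alphabet: str, min_support: int,
                  max_gap: int = 0, max_len: int = 6) -> Dict[str, int]:
    """Illustrative depth-first enumeration of frequent patterns.

    Assumption: support is the number of sequences that contain the pattern,
    where consecutive pattern symbols may be separated by at most `max_gap`
    skipped positions (max_gap=0 means contiguous matches only).
    """
    results: Dict[str, int] = {}

    def extend(pattern: str, ends: List[Set[int]]) -> None:
        # ends[i] holds the positions in sequences[i] where `pattern` can end.
        if len(pattern) >= max_len:
            return
        for symbol in alphabet:
            new_ends: List[Set[int]] = []
            for seq, positions in zip(sequences, ends):
                # Positions where the extended pattern can end in this sequence.
                hits = {
                    j
                    for p in positions
                    for j in range(p + 1, min(p + max_gap + 2, len(seq)))
                    if seq[j] == symbol
                }
                new_ends.append(hits)
            support = sum(1 for hits in new_ends if hits)
            # Any extension of an infrequent pattern is also infrequent,
            # so the search only descends into frequent patterns.
            if support >= min_support:
                results[pattern + symbol] = support
                extend(pattern + symbol, new_ends)

    for symbol in alphabet:
        ends = [{j for j, ch in enumerate(seq) if ch == symbol}
                for seq in sequences]
        support = sum(1 for hits in ends if hits)
        if support >= min_support:
            results[symbol] = support
            extend(symbol, ends)
    return results


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TACG"]  # hypothetical toy DNA sequences
    print(mine_patterns(toy, "ACGT", min_support=3))
    print(mine_patterns(toy, "ACGT", min_support=3, max_gap=1))
```

Tracking only the end positions of each pattern per sequence keeps every extension step cheap, which matters when the alphabet is small and the sequences are long; a production implementation would likely need more careful data structures.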