Transcriptional regulation of gene expression is enacted mainly through binding of transcription factors (TFs) to specific, short DNA sites in cis-regulatory regions of genes. Most TFs are members of protein families that share a common DNA-binding domain and thus recognize similar DNA-binding sequences. It is not well understood why paralogous TFs often bind different genomic target sites in vivo to effect different regulatory programs, despite apparently recognizing the same sequence motifs. Here, we designed custom protein-binding microarrays (PBMs) to analyze the DNA-binding specificities of two Saccharomyces cerevisiae basic helix-loop-helix (bHLH) proteins, Tye7 and Cbf1, as a model system. Our data reveal that E-box DNA-binding sequences (CAnnTG), when tested in the context of their native genomic flanking sequences, are bound differently by Cbf1 and Tye7. Computational models of the PBM data indicate that DNA sequence features located in the genomic sequences outside the E-box contribute to DNA-binding specificity in vitro. Our analyses suggest that these flanking regions affect DNA-binding specificity indirectly by influencing the three-dimensional structure of the E-box binding sites. Finally, we show that these subtle differences in the intrinsic sequence preferences of Cbf1 and Tye7 in vitro help to explain their differential DNA-binding preferences in vivo. Our results provide further evidence that the local shape of DNA-binding sites may be an important feature in distinguishing the DNA-binding preferences among paralogous TFs, and thus may play a widespread role in determining how transcriptional regulatory specificity within TF families is achieved.
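You can scan a sequence directly for the E-box consensus CAnnTG. The sketch below reports each E-box core together with its native genomic flanks. It is a hypothetical helper (`ebox_sites_with_flanks`), not the probe-design procedure used for these PBMs, and the default flank length is an arbitrary assumption. The reverse complement of CAnnTG again has the form CAnnTG, so scanning one strand finds every site.

```python
import re
from typing import List, Tuple

# E-box consensus CAnnTG; the lookahead lets overlapping sites be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def ebox_sites_with_flanks(seq: str, flank: int = 10) -> List[Tuple[int, str, str, str]]:
    """Return (start, left_flank, core, right_flank) for every E-box in seq.

    Hypothetical helper: flanks are truncated at the sequence ends.
    """
    seq = seq.upper()
    sites = []
    for m in EBOX.finditer(seq):
        start = m.start()
        core = m.group(1)
        end = start + len(core)
        left = seq[max(0, start - flank):start]
        right = seq[end:end + flank]
        sites.append((start, left, core, right))
    return sites
```

For example, `ebox_sites_with_flanks("GGTCACGTGACCAAGCAGCTGTT", flank=3)` finds CACGTG at position 3 and CAGCTG at position 15. Each comes with its three-base flanks, or fewer at the sequence end.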
ISBN (print): 9781450316705
The tremendous demand on computer memory and computing time for predicting complex secondary structures restricts most available RNA secondary structure prediction programs to short RNA sequences. We propose to approach this problem by segmenting a long RNA sequence into shorter non-overlapping chunks, predicting the secondary structure of each chunk individually, and then assembling the results into a structure for the original sequence. The selection of cutting points is a crucial component of the approach. Noting that stem-loops and pseudoknots always contain an inversion, we developed two cutting methods, the centered and optimized methods, for segmenting long RNA sequences based on inversion excursions. For the majority of the sequences in a dataset of 50 RNAs from the RFAM database, the prediction algorithm PKnotsRG used with these cutting methods produces more accurate secondary structures than those predicted for the whole sequence without segmentation. Both the centered and optimized cutting methods outperform naïve regular segmentation. These results support our claim that cutting is a promising approach for predicting the structure of long RNA sequences, and that choosing the cutting points intelligently, by considering sequence features such as inversion excursions, can further enhance prediction accuracy.
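The general idea of "smart" cutting can be pictured with a small sketch. It finds simple stem-loops (perfect Watson-Crick inverted repeats) and moves regularly spaced cut points so that no detected stem-loop is split. This is a hypothetical stand-in written for illustration only:

- It does not implement the inversion-excursion statistic, nor the centered or optimized methods described above.
- The stem and loop length limits are assumed values.
- G-U wobble pairs are ignored.

Each resulting chunk would then be folded separately by a predictor such as PKnotsRG (not shown).

```python
from typing import List, Tuple

# Watson-Crick complements only; G-U wobble pairs are ignored in this sketch.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(s: str) -> str:
    """Reverse complement of an RNA string."""
    return "".join(PAIR[b] for b in reversed(s))


def stem_loop_spans(seq: str, stem: int = 4, min_loop: int = 3,
                    max_loop: int = 8) -> List[Tuple[int, int]]:
    """Half-open spans [i, j + stem) where seq[i:i+stem] can pair with seq[j:j+stem]."""
    n = len(seq)
    spans = []
    for i in range(n - 2 * stem - min_loop + 1):
        target = revcomp(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > n:
                break
            if seq[j:j + stem] == target:
                spans.append((i, j + stem))
    return spans


def is_safe(cut: int, spans: List[Tuple[int, int]]) -> bool:
    """A cut between seq[cut-1] and seq[cut] is unsafe if it lies strictly inside a span."""
    return all(not (a < cut < b) for a, b in spans)


def choose_cuts(n: int, chunk: int, spans: List[Tuple[int, int]]) -> List[int]:
    """Move each regular cut point (chunk, 2*chunk, ...) to the nearest safe position."""
    cuts: List[int] = []
    prev = 0
    for target in range(chunk, n, chunk):
        safe = [c for c in range(prev + 1, n) if is_safe(c, spans)]
        if not safe:
            break
        # nearest safe position; ties resolved toward the smaller index
        best = min(safe, key=lambda c: (abs(c - target), c))
        cuts.append(best)
        prev = best
    return cuts


def segment(seq: str, cuts: List[int]) -> List[str]:
    """Split seq into non-overlapping chunks at the given cut positions."""
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

Take the 24-nt string `"A" * 6 + "GGGCAAAAGCCC" + "A" * 6` with `chunk=12`. A naive regular cut at position 12 would split the hairpin occupying positions 6 to 17. `choose_cuts` moves the cut to position 6 instead, which keeps the hairpin whole in one chunk.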
Numerous different algorithmic approaches have been developed to map the short-reads produced by next-generation sequencing technologies onto reference genome sequences. When sufficiently close reference genomes do no...
Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model, we assume access to one-versus-all queries that, given a point s ∈ S, return the distances between s and all other points. We show that, under a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. Our algorithm uses an active selection strategy to choose a small set of points, which we call landmarks, and considers only the distances between landmarks and other points to produce a clustering. We apply our procedure to cluster proteins by sequence similarity. This setting fits our model well because a fast sequence database search program can query a sequence against an entire data set. An empirical study shows that, even though we query only a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.
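A simplified landmark scheme can illustrate the one-versus-all query model. The sketch below picks landmarks by farthest-first traversal, a standard k-center heuristic swapped in here rather than the authors' active selection strategy. It then assigns every point to its nearest landmark, so it issues exactly one query per landmark. The `query` callback is an assumed interface. In the protein setting it might wrap a sequence database search that returns one distance per sequence.

```python
from typing import Callable, List, Sequence

# Assumed interface: query(i) returns the distances from point i to every point.
Query = Callable[[int], Sequence[float]]


def landmark_clustering(n: int, k: int, query: Query, first: int = 0) -> List[int]:
    """Cluster n points into at most k groups using at most k one-versus-all queries.

    Landmarks are chosen by farthest-first traversal (a stand-in heuristic);
    each point is labeled with the index of its nearest landmark.
    """
    landmarks = [first]
    rows = [list(query(first))]
    # nearest[i] = distance from point i to its closest landmark so far
    nearest = rows[0][:]
    while len(landmarks) < min(k, n):
        nxt = max(range(n), key=lambda i: nearest[i])
        if nearest[nxt] == 0:
            break  # every point already coincides with a landmark
        landmarks.append(nxt)
        row = list(query(nxt))
        rows.append(row)
        nearest = [min(a, b) for a, b in zip(nearest, row)]
    # label = index of the closest landmark (ties go to the earlier landmark)
    return [min(range(len(rows)), key=lambda j: rows[j][i]) for i in range(n)]
```

With one-dimensional points `[0.0, 0.5, 1.0, 10.0, 10.5, 20.0, 20.2]` and `k=3`, the sketch queries points 0, 6 and 3 in that order. It recovers the three visually obvious groups.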
Many computational approaches have been developed and used for sampling protein conformations near the native state. However, it has been difficult to evaluate the quality of the conformations sampled or to compare th...
Background: Many recent studies have investigated modularity in biological networks and its role in the functional and structural characterization of constituent biomolecules. A technique that has shown considerable promise for modularity detection is the Newman and Girvan (NG) algorithm. It relies on the number of shortest paths between pairs of vertices in the network that traverse a given edge, referred to as the betweenness of that edge. The edge with the highest betweenness is iteratively eliminated from the network, and the betweenness of the remaining edges is recalculated in every iteration. This generates a complete dendrogram, from which modules are extracted by applying a quality metric called modularity, denoted by Q. This exhaustive computation can be prohibitively expensive for large networks such as protein-protein interaction networks. In this paper, we present a novel optimization of the modularity detection algorithm: an efficient termination criterion, based on a target edge betweenness value, at which the iterative edge removal can be stopped. Results: We validate the robustness of our approach by applying our algorithm to real-world protein-protein interaction networks of Yeast, *** and Drosophila, and demonstrate that our algorithm consistently achieves significant computational gains in the form of reduced runtime compared with the NG algorithm. Furthermore, our algorithm produces modules comparable to those from the NG algorithm, both qualitatively and quantitatively. We illustrate this using comparison metrics such as module distribution, module membership cardinality, modularity Q, and the Jaccard similarity coefficient. Conclusions: We have presented an optimized approach for efficient modularity detection in networks. The intuition driving our approach is the extraction of holistic measures of centrality from graphs, which are representative of inherent modular structure of the underlying network, and the applic
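The NG edge-removal loop with an early stop can be sketched in pure Python:

- Edge betweenness is computed with Brandes' algorithm for unweighted, undirected graphs.
- The loop repeatedly deletes the top edge.
- It stops once the highest remaining betweenness is at or below a caller-supplied `target`.

The abstract does not say how the target value is derived, so `target` here is an assumption. The names `edge_betweenness`, `components` and `girvan_newman_until` are illustrative. The modularity Q evaluation of a full dendrogram is omitted.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]
Edge = FrozenSet[int]


def edge_betweenness(g: Graph) -> Dict[Edge, float]:
    """Brandes' algorithm for edge betweenness on an unweighted, undirected graph."""
    eb: Dict[Edge, float] = {frozenset((u, v)): 0.0 for u in g for v in g[u]}
    for s in g:
        # BFS from s: count shortest paths (sigma) and record predecessors
        stack: List[int] = []
        pred: Dict[int, List[int]] = {v: [] for v in g}
        sigma = dict.fromkeys(g, 0.0)
        dist = dict.fromkeys(g, -1)
        sigma[s], dist[s] = 1.0, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # accumulate dependencies in reverse BFS order
        delta = dict.fromkeys(g, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # every unordered pair was counted once from each endpoint
    return {e: b / 2.0 for e, b in eb.items()}


def components(g: Graph) -> List[Set[int]]:
    """Connected components, in order of first vertex encountered."""
    seen: Set[int] = set()
    comps: List[Set[int]] = []
    for s in g:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            for w in g[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman_until(g: Graph, target: float) -> List[Set[int]]:
    """Remove top-betweenness edges until the maximum betweenness is <= target."""
    h = {v: set(nbrs) for v, nbrs in g.items()}  # work on a copy
    while True:
        eb = edge_betweenness(h)
        if not eb:
            break
        edge, top = max(eb.items(), key=lambda kv: kv[1])
        if top <= target:
            break  # early termination instead of building the full dendrogram
        u, v = tuple(edge)
        h[u].discard(v)
        h[v].discard(u)
    return components(h)
```

Consider two triangles joined by a single bridge. The bridge has betweenness 9, and every other edge has 1 or 4. With `target=5.0`, only the bridge is removed, and the two triangles come back as modules.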
The rate of human death and morbidity due to malaria is increasing in many parts of the developing world. Thus, there is a great need to understand the critical pathways in the malaria parasite in order to develop eff...
Computational protein-protein docking is a valuable tool for determining the conformation of complexes formed by interacting proteins. Selecting near-native conformations from the large number of possible models gener...
Protein-RNA interactions play important roles in cellular processes like protein synthesis, RNA processing, and gene expression regulation. Reliable identification of the interfaces involved in RNA-protein interaction...