ISBN:
(digital) 9781848161092
ISBN:
(print) 9781848161085
High-throughput sequencing and functional genomics technologies have given us the human genome sequence as well as those of other experimentally, medically, and agriculturally important species, thus enabling large-scale genotyping and gene expression profiling of human populations. Databases containing large numbers of sequences, polymorphisms, structures, metabolic pathways, and gene expression profiles of normal and diseased tissues are rapidly being generated for human and model organisms. Bioinformatics is therefore gaining importance in the annotation of genomic sequences; the understanding of the interplay among and between genes and proteins; the analysis of the genetic variability of species; the identification of pharmacological targets; and the inference of evolutionary origins, mechanisms, and relationships. This proceedings volume contains an up-to-date exchange of knowledge, ideas, and solutions to conceptual and practical issues of bioinformatics by researchers, professionals, and industry practitioners at the 6th Asia-Pacific Bioinformatics Conference held in Kyoto, Japan, in January 2008.
ISBN:
(print) 1860946232
The analysis of time series data is of capital importance for pharmacogenomics, since experimental evaluations are usually based on observations of time-dependent reactions or behaviors of organisms. Thus, data mining in time series databases is an important instrument for understanding the effects of drugs on individuals. However, the complex nature of time series poses a big challenge for effective and efficient data mining. In this paper, we focus on the detection of temporal dependencies between different time series: we introduce the novel analysis concept of threshold queries and its semi-supervised extension, which supports parameter setting by means of training datasets. Basically, threshold queries report those time series exceeding a user-defined query threshold at certain time frames. For semi-supervised threshold queries, the corresponding threshold is automatically adjusted to the characteristics of the data set or of the training dataset, respectively. In order to support threshold queries efficiently, we present a new access method which exploits the fact that only partial information about the time series is required at query time. In an extensive experimental evaluation, we demonstrate the performance of our solution and show that semi-supervised threshold queries applied to gene expression data are very worthwhile.
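As a minimal sketch of the threshold-query idea in this abstract, assume each time series is a plain list of floats sampled at shared time steps. The function names `threshold_query` and `fit_threshold`, the "exceeds the threshold at every step of the time frame" semantics, and the midpoint rule for choosing a threshold from labeled training series are all illustrative assumptions; the abstract does not specify them, and this is not the paper's access method.

```python
from typing import Dict, List, Sequence


def threshold_query(series: Dict[str, Sequence[float]], tau: float,
                    start: int, end: int) -> List[str]:
    """Return names of series whose values exceed tau at every step in [start, end)."""
    hits = []
    for name, values in series.items():
        window = values[start:end]
        # Assumed semantics: the whole time frame must lie above the threshold.
        if window and all(v > tau for v in window):
            hits.append(name)
    return sorted(hits)


def fit_threshold(positives: Sequence[Sequence[float]],
                  negatives: Sequence[Sequence[float]],
                  start: int, end: int) -> float:
    """Pick tau from labeled training series (hypothetical midpoint rule).

    tau is placed halfway between the lowest window minimum among the
    positive series and the highest window minimum among the negative ones.
    """
    pos_floor = min(min(s[start:end]) for s in positives)
    neg_floor = max(min(s[start:end]) for s in negatives)
    return (pos_floor + neg_floor) / 2.0


# Made-up expression profiles, for illustration only.
expr = {
    "geneA": [0.1, 0.9, 1.2, 0.3],
    "geneB": [0.2, 0.4, 0.5, 0.1],
    "geneC": [1.0, 1.1, 1.3, 1.5],
}
tau = fit_threshold([expr["geneA"], expr["geneC"]], [expr["geneB"]], 1, 3)
selected = threshold_query(expr, tau, 1, 3)
```

With these toy profiles the fitted threshold separates geneA and geneC from geneB over time steps 1 and 2. A linear scan like this reads every series in full, which is exactly the cost an index over partial time series information would aim to avoid.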
ISBN:
(print) 1860946232
The central role phylogeny plays in biology and its pervasiveness in comparative genomics studies have led researchers to develop a plethora of methods for its accurate reconstruction. Most phylogeny reconstruction methods, though, assume a single tree underlying a given sequence alignment. While a good first approximation in many cases, a tree may not always model the evolutionary history of a set of organisms. When events such as interspecific recombination occur, different regions in the alignment may have different underlying trees. Accurate reconstruction of the evolutionary history of a set of sequences requires recombination detection, followed by separate analyses of the non-recombining regions. Besides aiding accurate phylogenetic analyses, detecting recombination helps in understanding one of the main mechanisms of bacterial genome diversification. In this paper, we introduce RECOMP, an accurate and fast method for detecting recombination events in a sequence alignment. The method slides a fixed-width window across the alignment and determines the presence of recombination events based on a combination of topology and parsimony score differences in neighboring windows. On several synthetic and biological datasets, our method performs much faster than existing tools with accuracy comparable to the best available method.
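The sliding-window idea can be illustrated with a deliberately reduced sketch. It is not RECOMP itself: it assumes exactly four aligned, gap-free sequences, uses the count of parsimony-informative sites supporting each of the three possible quartet topologies as the per-window score, and flags a candidate breakpoint wherever the best-supported topology changes between consecutive resolved windows. All function names and the window parameters are hypothetical.

```python
from typing import List, Optional, Tuple

TOPOLOGIES = ("ab|cd", "ac|bd", "ad|bc")


def quartet_support(a: str, b: str, c: str, d: str) -> Tuple[int, int, int]:
    """Count parsimony-informative sites supporting each quartet topology."""
    counts = [0, 0, 0]
    for w, x, y, z in zip(a, b, c, d):
        if w == x and y == z and w != y:
            counts[0] += 1  # site pattern xxyy supports ab|cd
        elif w == y and x == z and w != x:
            counts[1] += 1  # site pattern xyxy supports ac|bd
        elif w == z and x == y and w != x:
            counts[2] += 1  # site pattern xyyx supports ad|bc
    return (counts[0], counts[1], counts[2])


def best_topology(support: Tuple[int, int, int]) -> Optional[str]:
    """Return the uniquely best-supported topology, or None if unresolved."""
    top = max(support)
    if top == 0 or support.count(top) > 1:
        return None
    return TOPOLOGIES[support.index(top)]


def candidate_breakpoints(seqs: List[str], width: int, step: int) -> List[int]:
    """Start positions of windows whose topology differs from the previous resolved window."""
    breakpoints = []
    previous = None
    for start in range(0, len(seqs[0]) - width + 1, step):
        a, b, c, d = (s[start:start + width] for s in seqs)
        current = best_topology(quartet_support(a, b, c, d))
        if current is not None:
            if previous is not None and current != previous:
                breakpoints.append(start)
            previous = current
    return breakpoints


# Toy alignment: columns 0-7 support ab|cd, columns 8-15 support ac|bd.
alignment = [
    "AAAAAAAAGGGGGGGG",
    "AAAAAAAATTTTTTTT",
    "CCCCCCCCGGGGGGGG",
    "CCCCCCCCTTTTTTTT",
]
breaks = candidate_breakpoints(alignment, width=4, step=4)
```

On this toy alignment the only flagged window starts at column 8, where the supported topology switches. Real data would need more taxa, a proper tree search per window, and a significance criterion for the score differences.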
ISBN:
(print) 1860946232
In this paper, we give a complete characterization of the existence of a galled-tree network in the form of simple necessary and sufficient conditions. As a by-product, we obtain a simple algorithm for constructing galled-tree networks. We also introduce a new necessary condition for the existence of a galled-tree network similar to bi-convexity.
ISBN:
(print) 1860946232
The interaction between transcription factors and their DNA binding sites plays a key role in understanding gene regulation mechanisms. Recent studies revealed the presence of "functional polymorphism" in genes, defined as regulatory variation, measured in transcription levels, that is due to cis-acting sequence differences. These regulatory variants are assumed to contribute to modulating gene functions. However, computational identification of such functional cis-regulatory variants is a much greater challenge than just identifying consensus sequences, because cis-regulatory variants differ by only a few bases from the main consensus sequences, while they have important consequences for organismal phenotype. None of the previous studies have directly addressed this problem. We propose a novel discriminative detection method for precisely identifying transcription factor binding sites and their functional variants from both positive and negative samples (sets of upstream sequences of genes bound and not bound by a transcription factor) based on genome-wide location data. Our goal is to find discriminative substrings that best explain the location data, in the sense that the substrings precisely discriminate the positive samples from the negative ones, rather than substrings that are simply over-represented among the positive ones. Our method consists of two steps: First, we apply a decision tree learning method to discover discriminative substrings and a hierarchical relationship among them. Second, we extract a main motif and, further, a second motif as a cis-regulatory variant by utilizing functional annotations. Our genome-wide experimental results on the yeast Saccharomyces cerevisiae show that our method performs significantly better at detecting experimentally verified consensus sequences than current motif detection methods. In addition, our method has successfully discovered second motifs of putative functional cis-regulato
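To make the contrast between "discriminative" and "over-represented" substrings concrete, the following sketch scores every k-mer by the information gain of the split "sequence contains the k-mer" versus "does not", which is the criterion a decision tree learner would typically use to pick its root test. This covers only a single root split under assumed inputs. The function names, the fixed k-mer length, and the toy sequences are hypothetical; the paper's hierarchical tree and annotation-based motif extraction are not reproduced.

```python
import math
from typing import List, Tuple


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with pos positive and neg negative samples."""
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_discriminative_kmer(positives: List[str], negatives: List[str],
                             k: int) -> Tuple[str, float]:
    """Return the k-mer whose presence/absence split has the highest information gain."""
    sequences = positives + negatives
    kmers = {s[i:i + k] for s in sequences for i in range(len(s) - k + 1)}
    n = len(sequences)
    base = entropy(len(positives), len(negatives))
    best = ("", 0.0)
    for kmer in sorted(kmers):  # sorted for a deterministic tie-break
        pos_in = sum(kmer in s for s in positives)
        neg_in = sum(kmer in s for s in negatives)
        pos_out = len(positives) - pos_in
        neg_out = len(negatives) - neg_in
        remainder = ((pos_in + neg_in) / n) * entropy(pos_in, neg_in) \
            + ((pos_out + neg_out) / n) * entropy(pos_out, neg_out)
        gain = base - remainder
        if gain > best[1]:
            best = (kmer, gain)
    return best


# Made-up upstream sequences: bound (positive) and unbound (negative) genes.
bound = ["TTACGTGA", "CCACGTGG", "GACGTGTT"]
unbound = ["TTTTAAAA", "CCGGCCGG", "GATTACAG"]
kmer, gain = best_discriminative_kmer(bound, unbound, k=5)
```

In this toy set the 5-mer `ACGTG` occurs in every bound sequence and in no unbound one, so it separates the two classes perfectly. A k-mer that is merely frequent among the positives but also present in the negatives would receive a lower gain.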
ISBN:
(print) 1860946232
Clustering is widely used in gene expression analysis, where it helps to group genes with similar biological function together. Traditional clustering techniques are not suitable for direct application to gene expression time series data because of the inherent properties of local regulation and time shift. To cope with these two problems, local similarity and time shift, we have developed a new similarity measurement technique called Local Similarity Combination. Finally, we run our method on real gene expression data and show that it works well.
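The abstract does not define Local Similarity Combination, so the sketch below only illustrates the two properties it names. It is an assumed stand-in, not the paper's measure: it scans all fixed-length windows of one profile (local similarity), compares each against windows of the other profile offset by a bounded lag (time shift), and keeps the best Pearson correlation. The function names and parameters are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length windows (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_local_shifted_similarity(x: Sequence[float], y: Sequence[float],
                                  window: int, max_shift: int) -> Tuple[float, int, int]:
    """Return (correlation, window start in x, shift of y relative to x) for the best pair."""
    best = (-1.0, 0, 0)
    for start in range(len(x) - window + 1):
        for shift in range(-max_shift, max_shift + 1):
            y_start = start + shift
            if y_start < 0 or y_start + window > len(y):
                continue  # shifted window would fall outside y
            r = pearson(x[start:start + window], y[y_start:y_start + window])
            if r > best[0]:
                best = (r, start, shift)
    return best


# Toy profiles: gene_y repeats gene_x's expression pulse one time step later.
gene_x = [0, 0, 1, 3, 1, 0, 0, 0]
gene_y = [0, 0, 0, 1, 3, 1, 0, 0]
r, start, shift = best_local_shifted_similarity(gene_x, gene_y, window=3, max_shift=2)
```

For these toy profiles the best match is found at a shift of one time step, whereas a correlation over the full, unshifted profiles would understate how similar the two genes are.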
ISBN:
(print) 1860946232
Automated discovery and extraction of biological relations from online documents, particularly MEDLINE texts, has become essential and urgent because such literature data are accumulating at a tremendous rate. In this paper, we present an ontology-based framework for a biological relation extraction system. This framework is unified and able to extract several kinds of relations, such as gene-disease, gene-gene, and protein-protein interactions. The main contributions of this paper are that we propose a two-level pattern learning algorithm and organize patterns hierarchically.
ISBN:
(print) 1860946232
The factors governing codon and amino acid usages in the predicted protein-coding sequences of Tropheryma whipplei TW08/27 and Twist genomes have been analyzed. Multivariate analysis identifies the replicational-transcriptional selection coupled with DNA strand-specific asymmetric mutational bias as a major driving force behind the significant inter-strand variations in synonymous codon usage patterns in T. whipplei genes, while a residual intra-strand synonymous codon bias is imparted by a selection force operating at the level of translation. The strand-specific mutational pressure has little influence on the amino acid usage, for which the mean hydropathy level and aromaticity are the major sources of variation, both having nearly equal impact. In spite of the intracellular life-style, the amino acid usage in highly expressed gene products of T. whipplei follows the cost-minimization hypothesis. Both genomes under study are characterized by the presence of two distinct groups of membrane-associated genes, products of which exhibit significant differences in primary and potential secondary structures as well as in the propensity of protein disorder.
ISBN:
(print) 1860946232
The circadian regulatory network is one of the main topics of plant investigations. The intracellular interactions among genes in response to the environmental stimulus of light are related to the foundation of functional genomics in plants. However, the sensitivity of the circadian system has not yet been analyzed in plants with a perturbed stochastic dynamic model based on microarray data. In this study, the circadian network is constructed for Arabidopsis thaliana using a stochastic dynamic model that takes sigmoid interaction, activation delay, and regulation by input light into consideration. The describing function method from nonlinear control theory, which addresses nonlinear limit cycles (oscillations), is employed to interpret the oscillations of the circadian regulatory network from the viewpoint that a nonlinear network will sustain oscillation if its feedback loop gain equals 1. Based on the dynamic model derived from microarray data, a system sensitivity analysis is performed to assess the robustness of the circadian regulatory network under biological perturbations. We found that the circadian network is more sensitive to perturbations of the trans-expression threshold and of the steady-state activation level than to perturbations of the trans-sensitivity rate.
ISBN:
(print) 1860946232
Trends in synonymous codon usage in adenoviruses have been examined through multivariate statistical analysis of the annotated protein-coding regions of 22 adenoviral species for which complete genome sequences are available. One of the major determinants of such trends is the G + C content at third codon positions of the genes, the average value of which varies from one viral genome to another depending on the overall mutational bias of the species. G3S and C3S interact synergistically along the first principal axis of the correspondence analysis on the relative synonymous codon usage (RSCU) of adenoviral genes, but antagonistically along the second principal axis. Other major determinants of the trends are natural selection, putatively operative at the level of translation, and, quite interestingly, the hydropathy of the encoded proteins. The trends in codon usage, though characterized by distinct virus-specific mutational bias, do not exhibit any sign of host-specificity. Significant variations are observed in synonymous codon choice between structural and nonstructural genes of adenoviruses.
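For readers unfamiliar with the quantity analyzed here, relative synonymous codon usage (RSCU) is commonly computed as a codon's observed count divided by the mean count of all synonymous codons for the same amino acid, so a value of 1.0 means no bias. The sketch below is a minimal illustration under assumptions: it uses the standard genetic code, expects an in-frame coding sequence, and its function name and toy input are hypothetical rather than taken from the paper's pipeline.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> Dict[str, float]:
    """RSCU per codon for every amino acid that occurs in the coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    # Group sense codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence: four alanine codons (GCT twice, GCC, GCA) and two lysine codons.
usage = rscu("GCTGCTGCCGCAAAAAAG")
```

In the toy sequence GCT is used twice among four alanine codons, giving an RSCU of 2.0, while the unused GCG scores 0.0. Matrices of such values across genes are the typical input to a correspondence analysis.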