ISBN: 1860946232 (print)
The circadian regulatory network is one of the main topics of plant research. The intracellular interactions among genes in response to the environmental stimulus of light are fundamental to functional genomics in plants. However, the sensitivity of the circadian system has not yet been analyzed in plants using a perturbed stochastic dynamic model fitted to microarray data. In this study, the circadian network is constructed for Arabidopsis thaliana using a stochastic dynamic model that takes sigmoid interaction, activation delay, and regulation by input light into consideration. The describing function method from nonlinear control theory, which deals with nonlinear limit cycles (oscillations), is employed to interpret the oscillations of the circadian regulatory network: a nonlinear network continues to oscillate if its feedback loop gain is equal to 1. Based on the dynamic model identified from microarray data, a system sensitivity analysis is performed to assess the robustness of the circadian regulatory network to biological perturbations. We found that the circadian network is more sensitive to perturbations of the trans-expression threshold and of the steady-state activation level than to perturbations of the trans-sensitivity rate.
ISBN: 1860946232 (print)
Microarray gene expression data often contain missing values resulting from various causes. However, most gene expression data analysis algorithms, such as clustering, classification and network design, require complete information, that is, data without any missing values. It is therefore very important to impute the missing values accurately before applying the data analysis algorithms. In this paper, an Iterated Local Least Squares Imputation method (ILLsimpute) is proposed to estimate the missing values. In ILLsimpute, a similarity threshold is learned from known expression values, and at every iteration it is used to obtain a set of coherent genes for every target gene containing missing values. The target gene is then represented as a linear combination of the coherent genes using least squares. The algorithm terminates after a certain number of iterations or when the imputation converges. The experimental results on real microarray datasets show that ILLsimpute outperforms three of the most recent methods on several commonly tested datasets.
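As an illustration of the local least squares step described above, the following is a minimal sketch of a single imputation pass for one target gene. It is not the ILLsimpute implementation: the function name `lls_impute_once`, the use of Euclidean distance, the fixed `threshold` argument (the abstract says the threshold is learned), and the requirement that candidate genes be complete are all assumptions made for this example.

```python
import numpy as np


def lls_impute_once(target, candidates, threshold):
    """Fill the missing entries (NaN) of one target gene.

    Assumptions for this sketch: `candidates` holds complete expression
    profiles (one gene per row), and "coherent" genes are those whose
    Euclidean distance to the target, measured on the target's known
    columns, is at most `threshold`.
    """
    target = np.asarray(target, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    known = ~np.isnan(target)

    # Select the coherent genes using only the columns where the target is known.
    dists = np.linalg.norm(candidates[:, known] - target[known], axis=1)
    coherent = candidates[dists <= threshold]
    if coherent.shape[0] == 0:
        raise ValueError("no candidate gene lies within the threshold")

    # Express the target's known values as a linear combination of the
    # coherent genes (least squares), then reuse the weights on the
    # columns where the target is missing.
    weights, *_ = np.linalg.lstsq(coherent[:, known].T, target[known], rcond=None)
    filled = target.copy()
    filled[~known] = weights @ coherent[:, ~known]
    return filled


# Example: the target equals 2 * gene0 + 1 * gene1; gene2 is too distant to be used.
genes = [[1, 2, 3, 4], [2, 1, 0, 3], [100, -50, 30, 7]]
filled = lls_impute_once([4, 5, 6, float("nan")], genes, threshold=10.0)
```

An iterated variant would repeat this pass over all target genes, using the newly imputed values when selecting coherent genes, until the estimates stop changing or an iteration limit is reached.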
ISBN: 1860946232 (print)
Trends in synonymous codon usage in adenoviruses have been examined through multivariate statistical analysis of the annotated protein-coding regions of 22 adenoviral species for which complete genome sequences are available. One of the major determinants of such trends is the G + C content at third codon positions of the genes, the average value of which varies from one viral genome to another depending on the overall mutational bias of the species. G3S and C3S interact synergistically along the first principal axis of correspondence analysis on the Relative Synonymous Codon Usage of adenoviral genes, but antagonistically along the second principal axis. Other major determinants of the trends are natural selection, putatively operative at the level of translation, and, quite interestingly, the hydropathy of the encoded proteins. The trends in codon usage, though characterized by distinct virus-specific mutational bias, do not exhibit any sign of host specificity. Significant variations are observed in synonymous codon choice between structural and nonstructural genes of adenoviruses.
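For readers unfamiliar with Relative Synonymous Codon Usage (RSCU), the sketch below computes it for a single coding sequence under the usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu` and the choice of the standard genetic code are assumptions of this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (DNA, upper case)."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, one family per amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        # Skip stop codons, single-codon amino acids (Met, Trp) and unused families.
        if aa == "*" or len(family) == 1 or total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Alanine has four codons; here GCT is used three times and GCC once.
example = rscu("GCTGCTGCTGCC")
```

A matrix of such values, one row per gene, is the kind of input on which a correspondence analysis of codon usage is typically run.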
ISBN: 1860946232 (print)
We present a randomized algorithm for semi-supervised learning of Mahalanobis metrics over R^n. The inputs to the algorithm are a set U of unlabeled points in R^n, a set of pairs of points S = {(x, y)_i}, x, y ∈ U, that are known to be similar, and a set of pairs of points D = {(x, y)_i}, x, y ∈ U, that are known to be dissimilar. The algorithm randomly samples S, D, and m-dimensional subspaces of R^n and learns a metric for each subspace. The metric over R^n is a linear combination of the subspace metrics. The randomization addresses issues of efficiency and overfitting. Extensions of the algorithm to learning non-linear metrics via kernels, and as a pre-processing step for dimensionality reduction, are discussed. The new method is demonstrated on a regression problem (structure-based chemical shift prediction) and a classification problem (predicting clinical outcomes for immunomodulatory strategies for treating severe sepsis).
ISBN: 1860946232 (print)
The inference of evolutionary relationships is usually aided by a reconstruction method that is expected to produce a reasonably accurate estimate of the true evolutionary history. However, various factors are known to impede the reconstruction process and result in inaccurate estimates of the true evolutionary relationships. Detecting and removing errors (wrong branches) from tree estimates is of great significance for the results of phylogenetic analyses. Methods have been devised for assessing the support of (or confidence in) phylogenetic tree branches, which is one way of quantifying inaccuracies in trees. In this paper, we study, via simulations, the performance of the most commonly used methods for assessing branch support: bootstrap of maximum likelihood and maximum parsimony trees, consensus of maximum parsimony trees, and consensus of Bayesian inference trees. Under the conditions of our experiments, our findings indicate that the actual amount of change along a branch does not have a strong impact on the support of that branch. Further, we find that bootstrap and Bayesian estimates are generally comparable to each other, and superior to a consensus of maximum parsimony trees. In our opinion, the most significant finding of all is that there is no threshold value for any of the methods that would allow for the elimination of wrong branches while maintaining all correct ones: there are always weakly supported true positive branches.
ISBN: 1860946232 (print)
Finding motifs in DNA sequences plays an important role in deciphering transcriptional regulatory mechanisms and in drug target identification. In this paper, we propose an efficient algorithm, EDAM, for finding motifs based on frequency transformation and Minimum Bounding Rectangle (MBR) techniques. It works in three phases: frequency transformation, MBR-clique searching and motif discovery. In frequency transformation, EDAM divides the sample sequences into a set of substrings using sliding windows, then transforms them into frequency vectors, which are stored in MBRs. In MBR-clique searching, EDAM uses the frequency distance theorems to search for the MBR-cliques used in motif discovery. In motif discovery, EDAM discovers larger cliques by extending smaller cliques with their neighbors. To accelerate clique discovery, we propose a range query facility that avoids unnecessary computations during clique extension. The experimental results show that EDAM effectively addresses the running time bottleneck of the motif discovery problem in large DNA databases.
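The frequency transformation phase can be pictured with a short sketch. The code below slides a window over a sequence, maps each substring to a vector of base counts, and computes the minimum bounding rectangle of a group of such vectors. The function names, the A, C, G, T ordering, and the way vectors are grouped are assumptions of this example; the abstract does not specify how EDAM assigns vectors to MBRs.

```python
ALPHABET = "ACGT"


def frequency_vectors(sequence, window):
    """Map every length-`window` substring to its (A, C, G, T) count vector."""
    vectors = []
    for start in range(len(sequence) - window + 1):
        sub = sequence[start:start + window]
        vectors.append(tuple(sub.count(base) for base in ALPHABET))
    return vectors


def bounding_rectangle(vectors):
    """Return the per-dimension (low, high) corners enclosing all vectors."""
    low = tuple(min(column) for column in zip(*vectors))
    high = tuple(max(column) for column in zip(*vectors))
    return low, high


# Windows of "AACGT" with length 3 are "AAC", "ACG" and "CGT".
vectors = frequency_vectors("AACGT", 3)
low, high = bounding_rectangle(vectors)
```

Because substrings that are close in edit distance also have similar count vectors, rectangles over these vectors can be used to discard distant candidates cheaply before any exact comparison is made.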
ISBN: 1860946232 (print)
Genome-wide computational analysis for small nuclear RNA (snRNA) genes resulted in the identification of 76 and 73 putative snRNA genes from the indica and japonica rice genomes, respectively. To designate predicted sequences as snRNA genes, we used the following basic criteria: a minimum of 70% sequence identity to the plant snRNA gene used for the genome search; the presence of conserved promoter elements (TATA box, USE motif and monocot promoter specific elements (MSPs)); and extensive sequence alignment to rice/plant expressed sequence tags. Comparative sequence analysis with snRNA genes from other organisms and predicted secondary structures showed that there is overall conservation of snRNA sequence and structure, with plant-specific features (presence of a TATA box in both polymerase II and polymerase III transcribed genes, and location of the USE motif upstream of the TATA box at a fixed but different distance in polymerase II and polymerase III transcribed snRNA genes) and the presence of multiple monocot-specific MSPs upstream of the USE motif. Detailed analysis results, including all multiple sequence alignments, sequence logos, secondary structures, sequences, etc., are available at http://***
ISBN: 1860946232 (print)
Multiple sequence alignments can provide information for comparative analyses of proteins and protein populations. We present some statistical trend tests that can be used when an aligned data set can be divided into two or more populations based on phenotypic traits such as preferred temperature, pH, salt concentration or pressure. The approach is based on estimation and analysis of the variation between the values of physicochemical parameters at positions of the sequence alignment. Monotonic trends are detected by applying a cumulative Mann-Kendall test. The method is found to be useful for identifying significant physicochemical mechanisms behind adaptation to extreme environments and for uncovering molecular differences between mesophilic and extremophilic organisms. A filtering technique is also presented to visualize the underlying structure in the data. All the comparative statistical methods are available in the toolbox DeltaProt.
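As background for the trend test mentioned above, here is a minimal sketch of the ordinary Mann-Kendall test for a monotonic trend in a sequence of values. It is the textbook version without tie correction, not the cumulative variant used in the paper and not DeltaProt code; the function name `mann_kendall` is an assumption of this example.

```python
import math


def mann_kendall(values):
    """Return (S, Z) for the basic Mann-Kendall trend test.

    S sums the signs of all pairwise differences x[j] - x[i] with i < j.
    Z is the usual normal approximation with continuity correction,
    using the no-ties variance n(n - 1)(2n + 5) / 18.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    variance = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    return s, z


# A strictly increasing series gives the maximum S = n(n - 1) / 2.
s, z = mann_kendall([1, 2, 3, 4, 5])
```

A large positive Z suggests an increasing trend and a large negative Z a decreasing one. Applied to alignments, the values would be a physicochemical parameter ordered by the phenotypic trait of interest.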
ISBN: 1860946232 (print)
We address the issue of structured motif inference. The problem is stated as follows: given a set of n DNA sequences and a quorum q (%), find the optimal structured consensus motif, described as gaps alternating with specific regions, that is shared by at least q × n sequences. Our proposal is in the domain of metaheuristics: it runs solutions to convergence through cooperation between a sampling strategy over the search space and a quick detection of local similarities in small sequence samples. The contributions of this paper are: (1) the design of a stochastic method whose genuine novelty rests on driving the search with a threshold frequency f that discriminates between specific regions and gaps; (2) an original justification of the specially designed operations; (3) the implementation of a mining tool well adapted to biologists' requirements: few input parameters are needed (quorum q, minimal threshold frequency f, maximal gap length g). Our approach proves efficient on simulated data, promoter sites in dicot plants and transcription factor binding sites in the E. coli genome. Our algorithm, Kaos, compares favorably with MEME and STARS in terms of accuracy.
ISBN: 9781860947575 (digital)
ISBN: 9781860947001 (print)
This volume contains about 40 papers covering many of the latest developments in the fast-growing field of bioinformatics. The contributions span a wide range of topics, including computational genomics and genetics, protein function and computational proteomics, the transcriptome, structural bioinformatics, microarray data analysis, motif identification, biological pathways and systems, and biomedical applications. There are also abstracts from the keynote addresses and invited *** papers cover not only theoretical aspects of bioinformatics but also delve into the application of new methods, with input from computation, engineering and biology disciplines. This multidisciplinary approach to bioinformatics gives these proceedings a unique viewpoint of the field.