Objective: With the dramatic increase in microarray data, biclustering has become a promising tool for gene expression analysis. Biclustering has been proven to be superior to clustering in identifying multifunction...
ISBN (print): 3540464816
A pattern recognition approach, based on shape feature extraction, is proposed to infer genetic networks from time-course microarray data. The proposed algorithm learns patterns from known genetic interactions, such as RT-PCR-confirmed gene pairs, and tunes its parameters using a particle swarm optimization algorithm. This work also incorporates a score function to separate significant predictions from non-significant ones. Applied to the data sets of Spellman et al. (1998), the proposed method reaches a prediction accuracy as high as 91%, with a true-positive rate of about 61% and a false-negative rate of about 1%. Therefore, the proposed algorithm may be useful for inferring genetic interactions.
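The abstract says the parameters are tuned with particle swarm optimization (PSO). It does not give the swarm settings or the exact objective. As a rough illustration only, here is a generic global-best PSO. The inertia weight `w`, the coefficients `c1`/`c2`, the swarm size, the function name `pso_minimize`, and the toy loss in the usage line are all assumptions, not values from the paper. In a setting like this one, the objective would presumably be something like the prediction error on the known (e.g., RT-PCR-confirmed) gene pairs.

```python
import random
from typing import Callable, List, Sequence, Tuple

def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: List[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 100,
    w: float = 0.7,   # inertia weight (assumed value)
    c1: float = 1.5,  # pull toward each particle's personal best (assumed value)
    c2: float = 1.5,  # pull toward the swarm's global best (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Generic global-best PSO minimizing `objective` inside a box given by `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle, then clamp it back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:  # update personal best, and global best if needed
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy usage with a hypothetical two-parameter loss whose minimum is at (0.3, 2.0).
best, val = pso_minimize(lambda p: (p[0] - 0.3) ** 2 + (p[1] - 2.0) ** 2,
                         bounds=[(0.0, 1.0), (0.0, 5.0)])
```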
ISBN (print): 9783642245527; 9783642245534
Gene association plays an important role in the complex genetic pathology of cancer. However, methods for finding cancer-related gene associations are still in their infancy. Based on the biological concept of a gene association module (GAM), comprising a center gene and its expression-related genes, this paper proposes a gene association detection model called kernel GAM (kGAM). The model assumes that the expression of the center gene can be predicted from the expression of its related genes. After a cost function is defined, a kernel ridge regression algorithm is developed to solve the kGAM model. Finally, a heuristic search procedure is designed to identify a compact GAM for a given center gene. Experimental results on three publicly available gene expression data sets show the effectiveness and efficiency of the proposed kGAM model in identifying cancer-related gene association patterns.
ISBN (print): 9781479925780
Insecticide resistance, an inherited trait involving alterations in one or more of an insect's genes, is now a major public health challenge to worldwide malaria control efforts. Through P450 pathways, Anopheles has developed strong resistance to pyrethroids, the only insecticide class recommended by the World Health Organization (WHO) for Indoor Residual Spraying (IRS) and Long-Lasting Insecticide-Treated Nets (LLITNs). We used the biochemical network of Anopheles gambiae (henceforth Ag) to deduce its resistance mechanism(s) from two expression data sets: Ag treated with pyrethroid and untreated controls. The computational techniques employed are accessible through a robust, multi-faceted, user-friendly graphical user interface (GUI), called 'workbench', built with JavaFX Scenebuilder. In this work, we introduce a computational platform to determine and, for the first time, elucidate the resistance mechanism to a commonly used class of insecticide, the pyrethroids. Notably, our work is the first computational study to identify genes associated with or involved in the efflux system in Ag, and to identify this system as a resistance mechanism in Anopheles.
ISBN (print): 9783319091921; 9783319091914
Microarrays represent a recent multidisciplinary technology. They measure the expression levels of many genes under different biological conditions, producing large amounts of data. These data can be analyzed with biclustering methods to find groups of genes that behave similarly under specific groups of conditions. This paper proposes a new evolutionary algorithm for biclustering gene expression data, based on a new crossover method. The proposed crossover method ensures the creation of new biclusters of better quality. To evaluate its performance, an experimental study was carried out on real microarray datasets. These experiments show that our algorithm extracts high-quality biclusters of highly correlated genes that are particularly involved in specific ontology structures.
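The abstract does not say how bicluster quality is measured. One widely used score, assumed here purely for illustration, is the mean squared residue of Cheng and Church (2000). For a submatrix with rows I and columns J, it is the average of (a_ij - a_iJ - a_Ij + a_IJ)^2, where:

- a_iJ is the mean of row i,
- a_Ij is the mean of column j,
- a_IJ is the mean of the whole submatrix.

Lower values mean more coherent biclusters. The function name is hypothetical, and the paper's own quality measure may differ.

```python
import numpy as np

def mean_squared_residue(data: np.ndarray, rows, cols) -> float:
    """Cheng-Church mean squared residue of the submatrix data[rows][:, cols]."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)  # a_iJ
    col_means = sub.mean(axis=0, keepdims=True)  # a_Ij
    residue = sub - row_means - col_means + sub.mean()  # + a_IJ
    return float((residue ** 2).mean())

# Rows shifted by a constant (a purely additive pattern) give a residue of (numerically) zero.
data = np.add.outer([0.0, 1.0, 2.0], [0.0, 10.0, 20.0, 5.0])
score = mean_squared_residue(data, [0, 1, 2], [0, 1, 2, 3])
```

In an evolutionary biclustering algorithm, a score like this would typically form part of the fitness, used to compare offspring produced by crossover.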
ISBN (print): 9781424478354
Microarrays are a powerful tool for studying gene expression under several conditions. The resulting data need to be analyzed using data mining methods. Biclustering is a data mining method that simultaneously clusters the rows and columns of a data matrix. Using biclustering, we can extract genes that behave similarly (co-express) under specific conditions; such genes may share the same biological functions. The aim in analyzing gene expression data is to extract the maximal number of genes and conditions that show similar behavior. The two objectives to be optimized (size and similarity) conflict, so multi-objective optimization is well suited to biclustering. In our work, we combine a well-known multi-objective genetic algorithm (NSGA-II) with a heuristic to solve the biclustering problem. Because the datasets are very large, we use a string of integers as the solution representation, where the integers are the indexes of the rows and columns. Experimental results on a real data set show that our approach can find significant, high-quality biclusters.
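The abstract names NSGA-II and an integer-string representation but not the exact encoding. The sketch assumes, hypothetically, that integers below `n_rows` index rows and that larger integers index columns, offset by `n_rows`. It decodes a chromosome into rows and columns. It then extracts the first non-dominated (rank-1) front for the two conflicting objectives:

- bicluster size, which is maximized,
- a residue-type dissimilarity score, which is minimized.

Only the Pareto-dominance and rank-1 front logic of NSGA-II is shown. Crowding distance, selection, and the paper's heuristic are omitted, and all names are illustrative.

```python
from typing import List, Sequence, Tuple

def decode(chromosome: Sequence[int], n_rows: int) -> Tuple[List[int], List[int]]:
    """Split an integer string into sorted row and column indices.

    Assumed (hypothetical) encoding: values < n_rows are row indices,
    values >= n_rows are column indices shifted by n_rows."""
    genes = sorted(set(chromosome))
    rows = [g for g in genes if g < n_rows]
    cols = [g - n_rows for g in genes if g >= n_rows]
    return rows, cols

def dominates(a: Tuple[int, float], b: Tuple[int, float]) -> bool:
    """Objectives are (size, residue): a dominates b if it is no worse on both
    (larger or equal size, smaller or equal residue) and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def first_front(objs: List[Tuple[int, float]]) -> List[int]:
    """Indices of solutions not dominated by any other (NSGA-II's rank-1 front)."""
    return [i for i, a in enumerate(objs)
            if not any(dominates(b, a) for j, b in enumerate(objs) if j != i)]
```

The quadratic pairwise check is chosen for readability. NSGA-II's fast non-dominated sort computes all fronts with fewer redundant comparisons.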
A goodness-of-fit (gof) problem, i.e., testing whether observed data come from a specific distribution, is one of the important problems in statistics, and various tests for checking distributional assumptions have been proposed. Most tests address a single data set with a sufficiently large sample size. This research, however, focuses on the gof problem when there are a large number of small data sets. In other words, we assume that the number of data sets p increases to infinity while the sample size n of each small data set remains finite. Throughout this dissertation, p and n denote the number of data sets and the sample size of each data set, respectively. Since the primary interest of this dissertation is testing whether every small data set comes from a known parametric family of distributions with different parameters, it is important to choose a gof test that is invariant to the parameters of the unknown distribution. Hence, as a basic approach, we suggest applying empirical distribution function (edf) based gof tests to every small data set and then combining the P-values to obtain a single test. Two P-value combining methods, moment-based tests and smoothing-based tests, are suggested, and their pros and cons are discussed. In particular, two moment-based tests, Edgington's method and Fisher's method, are compared with respect to Pitman efficiency and asymptotic power. We also find conditions guaranteeing that the asymptotic null distribution of moment-based tests based on empirical P-values is the same as that based on exact P-values. When the null is a location-scale family, there is no difficulty in applying the suggested test procedures. However, when the null is not a location-scale family, edf-based tests may depend on unknown parameters. To handle this problem, we suggest using unconditional P-values, which requires an additional step of estimating the distribution of the unknown parameters. Several issues related to estimating the distribution of unknown parameters and
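The two moment-based combinations compared above can be sketched in a few lines. The sketch assumes exact, continuous P-values, which are uniform on (0, 1) under the null. It uses the normal approximation that applies as p grows, matching the large-p, fixed-n setting. Under the null:

- each -2 log p_i is chi-squared with 2 degrees of freedom (mean 2, variance 4);
- each p_i has mean 1/2 and variance 1/12.

Fisher's statistic rejects for large values and Edgington's sum rejects for small values. The function names are illustrative. The empirical-P-value and unconditional-P-value refinements discussed in the abstract are not covered.

```python
import math
from typing import Sequence

def _upper_normal_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def fisher_combined(pvals: Sequence[float]) -> float:
    """Fisher's method: T = -2 * sum(log p_i); P-values must lie in (0, 1].

    Under H0, (T - 2p) / (2 * sqrt(p)) is approximately N(0, 1) for large p;
    large T is evidence against H0."""
    p = len(pvals)
    t = -2.0 * sum(math.log(x) for x in pvals)
    z = (t - 2.0 * p) / (2.0 * math.sqrt(p))
    return _upper_normal_tail(z)

def edgington_combined(pvals: Sequence[float]) -> float:
    """Edgington's method: S = sum(p_i).

    Under H0, (S - p/2) / sqrt(p/12) is approximately N(0, 1) for large p;
    small S is evidence against H0, so the lower tail is used."""
    p = len(pvals)
    s = sum(pvals)
    z = (s - p / 2.0) / math.sqrt(p / 12.0)
    return _upper_normal_tail(-z)  # equals P(Z < z)
```

For small p, the exact null distributions would be preferable: a chi-squared with 2p degrees of freedom for Fisher, and the Irwin-Hall distribution of the sum for Edgington.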
The low reproducibility of differential expression of individual genes in microarray experiments has led to the suggestion that experiments be analyzed in terms of gene characteristics, such as GO categories or pathwa...
We present an “upstream analysis” strategy for causal analysis of multiple “-omics” data. It analyzes promoters using the TRANSFAC database, combines it with an analysis of the upstream signal transduction pathway...