A gene-by-gene mixed model analysis is a useful statistical method for assessing the significance of differential gene expression in microarray experiments. While a large amount of data on thousands of genes is collected in a microarray experiment, the sample size for each gene is usually small, which can limit the statistical power of this analysis. In this report, we introduce an empirical Bayes (EB) approach for general variance component models applied to microarray data. Within a linear mixed model framework, the restricted maximum likelihood (REML) estimates of the variance components of each gene are adjusted by integrating information on the variance components estimated from all genes. The approach starts with a series of single-gene analyses. The estimated variance components from each gene are transformed to "ANOVA components." This transformation makes it possible to estimate the marginal distribution of each "ANOVA component" independently. The modes of the posterior distributions are estimated and inversely transformed to compute the posterior estimates of the variance components. The EB statistic is constructed by replacing the REML variance estimates with the EB variance estimates in the usual t statistic. The EB approach is illustrated with a real data example that compares the effects of five different genotypes of male flies on post-mating gene expression in female flies. In a simulation study, ROC curves are used to compare the EB statistic with two other statistics; the EB statistic was found to be the most powerful of the three. Although the null distribution of the EB statistic is unknown, a t distribution may be used to provide conservative control of the false positive rate.
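The core EB idea, replacing a gene's own noisy variance estimate with one that borrows strength from all genes and then plugging it into the t statistic, can be illustrated in a much simpler setting than the paper's mixed model. The Python sketch below assumes a two-group comparison with a single variance component per gene, and a prior variance `prior_s2` with weight `prior_df` that are simply supplied by the caller (both hypothetical parameters). It does not implement the paper's ANOVA-component transformation or posterior-mode estimation.

```python
import math
from statistics import mean, variance


def pooled_variance(x, y):
    """Pooled within-group variance and its degrees of freedom for two groups."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    return s2, df


def shrunken_variance(s2, df, prior_s2, prior_df):
    """Weighted compromise between a gene's own variance and a common prior variance.

    prior_df acts as the number of "pseudo degrees of freedom" given to the prior
    (an assumption of this sketch, not a quantity estimated here).
    """
    return (prior_df * prior_s2 + df * s2) / (prior_df + df)


def eb_t(x, y, prior_s2, prior_df):
    """t-like statistic with the per-gene variance replaced by the shrunken one."""
    s2, df = pooled_variance(x, y)
    s2_eb = shrunken_variance(s2, df, prior_s2, prior_df)
    se = math.sqrt(s2_eb * (1 / len(x) + 1 / len(y)))
    return (mean(x) - mean(y)) / se
```

Setting `prior_df=0` recovers the ordinary pooled-variance t statistic; larger values pull genes with unusually small or large sample variances toward the common value, which is what stabilizes the statistic when each gene has few replicates.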
Author:
Simon, Richard (NCI)
Biometric Research Branch, Division of Cancer Treatment and Diagnosis, Bethesda, MD 20892, USA
DNA microarrays are powerful tools for studying biological mechanisms and for developing prognostic and predictive classifiers that identify the patients who require treatment and who are the best candidates for specific treatments. Because microarrays produce so much data from each specimen, they offer great opportunities for discovery and great dangers of producing misleading claims. Microarray-based studies require clear objectives for selecting cases and appropriate analysis methods. Effective analysis of microarray data, where the number of measured variables is orders of magnitude greater than the number of cases, requires specialized statistical methods, which have recently been developed. Recent literature reviews indicate that serious problems of analysis exist in a substantial proportion of publications. This manuscript attempts to provide a non-technical summary of the key principles of statistical design and analysis for studies that use microarray expression profiling. Published by Elsevier Ltd.
Background: The evaluation of statistical significance has become a critical step in identifying differentially expressed genes in microarray studies. Classical p-value adjustment methods for multiple comparisons, such as those controlling the family-wise error rate (FWER), have been found to be too conservative for large-scale screening microarray data, and the false discovery rate (FDR), the expected proportion of false positives among all positives, has recently been suggested as an alternative for controlling false positives. Several statistical approaches have been used to estimate and control the FDR, but they may not provide reliable FDR estimates when applied to microarray data sets with a small number of replicates. Results: We propose a rank-invariant resampling (RIR) based approach to FDR evaluation. The proposed method generates a biologically relevant null distribution that maintains variability similar to that of the observed microarray data. We compare the performance of our RIR-based FDR estimation with that of four other popular methods. Our approach outperforms the other methods on both simulated and real microarray data. Conclusion: We found that the SAM random-shuffling and SPLOSH approaches were liberal and the other two theoretical methods were too conservative, whereas our RIR approach provided more accurate FDR estimates than the other approaches.
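To make the FDR quantity concrete, the sketch below shows a generic resampling-based estimate of the kind the abstract compares against (label shuffling, as in SAM), not the RIR method itself. At a cutoff on |statistic|, the estimated FDR is the average number of null statistics exceeding the cutoff divided by the number of observed statistics exceeding it. The difference-of-means statistic and the function names are illustrative assumptions.

```python
import random
from statistics import mean


def diff_means(row, labels):
    """Difference in group means for one gene (labels are 0/1 per sample)."""
    a = [v for v, g in zip(row, labels) if g == 0]
    b = [v for v, g in zip(row, labels) if g == 1]
    return mean(a) - mean(b)


def null_statistics(data, labels, n_perm, seed=0):
    """Recompute every gene's statistic under randomly shuffled sample labels."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        sets.append([diff_means(row, perm) for row in data])
    return sets


def estimate_fdr(observed, null_sets, threshold):
    """Expected false calls (from the null sets) over actual calls at |t| >= threshold."""
    called = sum(1 for t in observed if abs(t) >= threshold)
    if called == 0:
        return 0.0
    expected_false = sum(
        sum(1 for t in null if abs(t) >= threshold) for null in null_sets
    ) / len(null_sets)
    return min(1.0, expected_false / called)
```

The observed statistics would come from `[diff_means(row, labels) for row in data]`. The abstract's point is that how the null sets are generated drives the accuracy of this ratio; plain shuffling was found to be liberal, which motivated the rank-invariant resampling alternative.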
We developed Tilescope, a fully integrated data processing pipeline for analyzing high-density tiling-array data (http://***). In a completely automated fashion, Tilescope normalizes signals between channels and across arrays, combines replicate experiments, scores each array element, and identifies genomic features. The program has a modular, three-tiered architecture that facilitates parallelism, and a user-friendly graphical interface that presents results in an organized web page, downloadable for further analysis.
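The abstract does not say which normalization Tilescope applies across arrays. As an assumption for illustration only, the sketch below uses quantile normalization, one common way to give every array the same intensity distribution: each value is replaced by the mean, across arrays, of the values holding the same rank. Ties are broken by position, which a production implementation would handle more carefully.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity vectors (one list per array).

    Assumes every array measures the same probes in the same order.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    out = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        out.append(normalized)
    return out
```

After normalization every array has exactly the reference distribution, so differences between arrays reflect only the ordering of probes within each array.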
We propose a block principal component analysis method for extracting information from a database with a large number of variables and a relatively small number of subjects, such as a microarray gene expression database. The new procedure has the advantage of computational simplicity, and theoretical and numerical results show it to be as efficient as ordinary principal component analysis when used for dimension reduction, variable selection, data visualization, and classification. The method is illustrated with the well-known National Cancer Institute database of gene expression microarray data from 60 human cancer cell lines (NCI60), in the context of classifying cancer cell lines. Copyright (C) 2002 John Wiley & Sons, Ltd.
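The abstract does not spell out the block procedure. The sketch below assumes one plausible two-stage reading: split the variables (columns) into blocks, take the first principal component score within each block, then run PCA again on those block scores. Each small PCA only needs a block-sized covariance matrix, which is where the computational simplicity would come from. The leading eigenvector is found by power iteration from a fixed starting vector; the block scheme and all names here are hypothetical.

```python
import math


def center(rows):
    """Subtract each column's mean (rows are subjects, columns are variables)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(r, means)] for r in rows]


def covariance(rows):
    x = center(rows)
    n, p = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def first_pc(rows, iters=200):
    """Leading eigenvector of the sample covariance, by power iteration."""
    c = covariance(rows)
    p = len(c)
    v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-degenerate start
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0:
            break  # no variance left in this block
        v = [t / norm for t in w]
    return v


def scores(rows, v):
    """Project centered rows onto direction v."""
    return [sum(a * b for a, b in zip(r, v)) for r in center(rows)]


def block_pca_scores(rows, block_size):
    """Hypothetical block PCA: first PC per column block, then PCA of block scores."""
    p = len(rows[0])
    block_scores = []
    for start in range(0, p, block_size):
        block = [r[start:start + block_size] for r in rows]
        block_scores.append(scores(block, first_pc(block)))
    combined = [list(t) for t in zip(*block_scores)]  # subjects x blocks
    return scores(combined, first_pc(combined))
```

With thousands of genes and about 60 cell lines, the full covariance would be genes by genes, whereas each block here only needs a block_size by block_size matrix.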
Biclustering is an important tool in microarray analysis when only a subset of genes is co-regulated under a subset of conditions. Unlike standard clustering analyses, biclustering performs simultaneous classification in both the gene and condition directions of a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on a geometric view of coherent gene expression profiles. In this method, we perform pattern identification based on the Hough transform in a column-pair space. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our studies show that the approach can discover significant biclusters under increasing noise levels and regulatory complexity. Furthermore, we test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes. (C) 2007 Elsevier Ltd. All rights reserved.
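The geometric idea can be sketched for a single column pair. For an additive bicluster, the expression values of its genes in two of its conditions differ by a roughly constant shift, so when each gene is plotted as the point (value in condition a, value in condition b), the bicluster's genes fall on a line (a multiplicative pattern gives a line through the origin). A Hough transform finds such lines by letting every point vote for all (theta, rho) cells with rho = x cos(theta) + y sin(theta). The sketch below is a minimal illustration under that assumption; the bin sizes and function name are hypothetical, and the paper's full algorithm is not reproduced.

```python
import math
from collections import defaultdict


def hough_lines(points, n_theta=180, rho_step=0.1, min_votes=3):
    """Vote in (theta, rho) space for 2-D points (one point per gene).

    Returns {(theta_index, rho_bin): set_of_point_indices} for cells that
    received at least min_votes votes.
    """
    acc = defaultdict(set)
    for idx, (x, y) in enumerate(points):
        for k in range(n_theta):
            theta = math.pi * k / n_theta  # theta in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(k, round(rho / rho_step))].add(idx)
    return {cell: ids for cell, ids in acc.items() if len(ids) >= min_votes}
```

Usage: for points (1, 2), (2, 3), (3, 4), (4, 5) on the line y = x + 1 plus an outlier (1, 5), the cell with the most votes contains exactly the four collinear genes, at theta index 135 (135 degrees). Those genes form a candidate sub-bicluster for that column pair. Voting costs one pass over n_theta angles per point, which scales linearly in the number of genes.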
ISBN (print): 9781424483044
Biclustering performs simultaneous pattern classification in both the row and column directions of a data matrix and is useful for DNA microarray data analysis. In this paper, a new biclustering method is introduced based on a geometric method for identifying bicluster patterns. The Hough transform in column-pair space is used to find sub-biclusters, and a hypergraph model is used to merge the sub-biclusters into larger ones. The hypergraph-based geometric biclustering (HGBC) algorithm proposed here considerably reduces computing time and improves classification accuracy compared with existing biclustering methods. Experiments on both simulated and real microarray data demonstrate that our method can identify biclusters with different noise levels and degrees of overlap.
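The merging step can be pictured with a deliberately simple stand-in. The sketch below is not the HGBC hypergraph model: it greedily merges sub-biclusters, each a (genes, columns) pair, whenever their gene sets overlap by a Jaccard index of at least `min_overlap` (a hypothetical parameter). A merged bicluster keeps the genes shared by both pieces and the union of their columns, so its genes remain coherent across every column it covers.

```python
def jaccard(a, b):
    """Size of intersection over size of union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_sub_biclusters(subs, min_overlap=0.5):
    """Greedily merge (genes, columns) pairs with strongly overlapping gene sets."""
    clusters = [(frozenset(g), frozenset(c)) for g, c in subs]
    merged = True
    while merged:  # repeat until no pair qualifies; each merge shrinks the list
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gi, ci = clusters[i]
                gj, cj = clusters[j]
                if jaccard(gi, gj) >= min_overlap:
                    clusters[i] = (gi & gj, ci | cj)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters
```

For example, sub-biclusters with genes {1, 2, 3, 4} on columns {0, 1} and genes {1, 2, 3, 5} on columns {1, 2} overlap with Jaccard index 0.6 and merge into genes {1, 2, 3} on columns {0, 1, 2}.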
ISBN (print): 9783642160653
This paper presents work to support collaborative visualisation and data analysis in the microarray time-series explorer (MaTSE) software. We introduce a novel visualisation component called the 'pattern browser', which is used to support the annotation and adjustment of user queries. This includes an explanation of why this component is required and how biologists collaborating in the analysis of a microarray time-course data set can use it with our online pattern repository. To conclude, we suggest which other types of collaborative visualisation would benefit from a component with comparable functionality.
ISBN (print): 9781467392235
Microarray technology makes it possible to measure the expression levels of thousands of genes simultaneously in an efficient and inexpensive manner. However, because of various complexities in processing microarrays, expression values for some genes may be missing owing to unreliable measurements. Missing values in gene expression data can adversely affect downstream analyses such as clustering and dimensionality reduction. Different algorithms have been developed to estimate missing values in different data sets, and none of these algorithms works well on all data sets. In this work, we explore the application of the Mutual Nearest Neighbor (MNN) algorithm to impute missing values, which gives results comparable to those of other well-known imputation algorithms. We also explored five different methods for missing value imputation, namely row average imputation, mean imputation, median imputation, k-nearest neighbor (kNN) imputation, and a combination of kNN-based feature selection (kNNFS) and kNN-based imputation. The experiments are carried out on very high-dimensional gene expression data, such as the Notterman carcinoma and Notterman adenocarcinoma data, and the results are illustrated.
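Two of the baseline methods named above, row average imputation and kNN imputation, can be sketched in a few lines. The version below represents missing values as `None`, measures gene-to-gene distance as the root-mean-square difference over columns observed in both genes, and fills a missing entry with the mean of that column's values in the k nearest genes that have it observed, falling back to the row average if no neighbor qualifies. These choices are assumptions of the sketch; the MNN method itself is not shown because the abstract does not define it.

```python
import math


def _row_mean(row):
    """Mean of the observed (non-None) values in a row, or 0.0 if none."""
    observed = [v for v in row if v is not None]
    return sum(observed) / len(observed) if observed else 0.0


def row_average_impute(data):
    """Replace each missing value by the average of its gene's observed values."""
    return [[_row_mean(row) if v is None else v for v in row] for row in data]


def distance(a, b):
    """RMS difference over columns observed in both rows (inf if none shared)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(data, k=3):
    """Fill each missing entry from the k most similar genes observed in that column."""
    out = [row[:] for row in data]  # imputations never feed back into distances
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (distance(row, other), other[j])
                for r, other in enumerate(data)
                if r != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            out[i][j] = sum(nearest) / len(nearest) if nearest else _row_mean(row)
    return out
```

Row averaging ignores how similar genes behave, while kNN exploits co-expression: a gene whose observed values track two other genes closely gets its missing value from those genes rather than from its own unrelated columns.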
ISBN (print): 9781424427567
Microarray technology has been used extensively for high-throughput gene expression studies. Many bioinformatics tools are available for the analysis of microarray data. In the data mining process, it is important to be goal-oriented so that a set of appropriate tools can be assembled for the targeted knowledge discovery process. In this paper, we tackle this issue by using a microarray dataset from Brassica endosperm, together with EST data, to validate our process. We were most interested in which genes are highly expressed in Brassica endosperm, and in their variation and functions over various stages of embryo development. We also performed gene characterization based on gene ontology analysis. Our results indicate that designing a specific data mining workflow that considers both the log ratio and the signal intensity enhances the knowledge discovery process. Through this approach, we were able to find the regulatory relationship between the two most important transcription factors, LEC1 and WRI1, in the endosperm of Brassica napus.
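For two-channel arrays, the log ratio and signal intensity are commonly summarized as M = log2(R/G) and A = (1/2) log2(R·G), where R and G are the two channel intensities. The sketch below filters spots on both quantities at once, so that a large fold change from a barely detectable spot is not counted as high expression. The thresholds `min_abs_m` and `min_a` are hypothetical defaults, not values from the paper, and intensities are assumed to be positive.

```python
import math


def ma_values(red, green):
    """Log ratio M and average log intensity A for one spot (positive intensities)."""
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


def select_genes(spots, min_abs_m=1.0, min_a=8.0):
    """Keep genes that pass both a fold-change cutoff and an intensity cutoff.

    spots: iterable of (gene_id, red_intensity, green_intensity).
    """
    hits = []
    for gene, red, green in spots:
        m, a = ma_values(red, green)
        if abs(m) >= min_abs_m and a >= min_a:
            hits.append(gene)
    return hits
```

For instance, intensities 1024 and 256 give M = 2 and A = 9, passing both cutoffs, while 16 and 4 give the same M = 2 but A = 3 and are dropped as low-intensity.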