Background: Housekeeping (HK) genes are ubiquitously expressed in all tissue/cell types and constitute a basal transcriptome for the maintenance of basic cellular functions. Partitioning transcriptomes into HK and tissue-specific (TS) genes is fundamental for studying gene expression and cellular differentiation. Although many studies have aimed at large-scale and thorough categorization of human HK genes, a meaningful consensus has yet to be reached. Results: We collected two recent gene expression datasets (EST and microarray data) from public databases and analyzed the gene expression profiles in 18 human tissues that are well documented by both data types. Benchmarking against a manually curated HK gene collection (HK408), we demonstrated that present data from EST sampling was far from saturated, and that this inadequacy has limited gene detectability and our understanding of TS expression. Due to a likely over-stringent threshold, microarray data showed a higher false negative rate than EST data, leading to a significant underestimation of HK genes. Based on EST data, we found that 40.0% of the currently annotated human genes were universally expressed in at least 16 of 18 tissues, as compared to only 5.1% specifically expressed in a single tissue. Our current EST-based estimate of the number of human HK genes ranged from 3,140 to 6,909, a ten-fold increase in comparison with previous microarray-based estimates. Conclusion: We concluded that a significant fraction of human genes, at least in the currently annotated data depositories, was broadly expressed. Our understanding of tissue-specific expression was still preliminary and required much more large-scale and high-quality transcriptomic data in future studies. The new HK gene list categorized in this study will be useful for genome-wide analyses on structural and functional features of HK genes.
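The breadth criterion quoted in the abstract (expressed in at least 16 of 18 tissues versus in a single tissue) can be illustrated with a small sketch. This is not the authors' pipeline: the function name `classify_by_breadth`, the label strings, the placeholder tissue names and the toy gene sets are all assumptions made for illustration, and the detection calls themselves (EST counts or microarray thresholds) are taken as given.

```python
from typing import Dict, Set


def classify_by_breadth(detected: Dict[str, Set[str]],
                        broad_min: int = 16) -> Dict[str, str]:
    """Label each gene by the number of tissues in which it was detected.

    `detected` maps a gene name to the set of tissues where it was called
    present. `broad_min` is the breadth cutoff for "broad" expression
    (16 of 18 tissues in the abstract above).
    """
    labels = {}
    for gene, tissues in detected.items():
        breadth = len(tissues)
        if breadth == 0:
            labels[gene] = "undetected"
        elif breadth >= broad_min:
            labels[gene] = "broad"
        elif breadth == 1:
            labels[gene] = "tissue-specific"
        else:
            labels[gene] = "intermediate"
    return labels


# Hypothetical toy input: tissue and gene names are placeholders.
tissues = [f"t{i}" for i in range(18)]
example = {
    "geneA": set(tissues),       # detected in all 18 tissues
    "geneB": set(tissues[:16]),  # detected in exactly 16 tissues
    "geneC": {"t3"},             # detected in a single tissue
    "geneD": set(tissues[:5]),   # detected in 5 tissues
}
labels = classify_by_breadth(example)
```

Because the abstract notes that EST sampling is far from saturated, a gene labelled "tissue-specific" by such a rule may simply be under-sampled elsewhere, so the labels are only as reliable as the detection calls that feed them.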
Background: All currently available methods of network/association inference from microarray gene expression measurements implicitly assume that such measurements represent the actual expression levels of different genes within each cell included in the biological sample under study. Contrary to this common belief, modern microarray technology produces signals aggregated over a random number of individual cells, a "nitty-gritty" aspect of such arrays, thereby causing a random effect that distorts the correlation structure of intra-cellular gene expression levels. Results: This paper provides a theoretical consideration of the random effect of signal aggregation and its implications for correlation analysis and network inference. An attempt is made to quantitatively assess the magnitude of this effect from real data. Some preliminary ideas are offered to mitigate the consequences of random signal aggregation in the analysis of gene expression data. Conclusion: Resulting from the summation of expression intensities over a random number of individual cells, the observed signals may not adequately reflect the true dependence structure of intra-cellular gene expression levels needed as a source of information for network reconstruction. Whether the reported effect is extreme or not, the important point is to recognize and incorporate such a signal source for proper inference. The usefulness of inference on genetic regulatory structures from microarray data depends critically on the ability of investigators to overcome this obstacle in a scientifically sound way. Reviewers: This article was reviewed by Byung Soo KIM, Jeanne Kowalski and Geoff McLachlan.
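The aggregation effect described above can be illustrated with a toy simulation. It is a sketch under stated assumptions, not the paper's model: two genes are drawn independently within each cell (true intra-cellular correlation of zero), per-cell expression is Gaussian with mean 10 and standard deviation 2, and the cell count per sample is either fixed at 100 or drawn uniformly from 50 to 150. All of these numbers and the helper names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def aggregated_signals(n_samples, draw_cell_count, rng):
    """Sum per-cell expression of two independent genes over each sample."""
    gene_x, gene_y = [], []
    for _ in range(n_samples):
        n_cells = draw_cell_count(rng)
        # Within a cell the two genes are independent by construction.
        gene_x.append(sum(rng.gauss(10.0, 2.0) for _ in range(n_cells)))
        gene_y.append(sum(rng.gauss(10.0, 2.0) for _ in range(n_cells)))
    return gene_x, gene_y


rng = random.Random(0)
# Same number of cells in every sample: aggregation adds no shared factor.
fx, fy = aggregated_signals(200, lambda r: 100, rng)
# Random number of cells per sample: both sums scale with the cell count.
rx, ry = aggregated_signals(200, lambda r: r.randint(50, 150), rng)

r_fixed = pearson(fx, fy)
r_random = pearson(rx, ry)
print(f"fixed cell count:  r = {r_fixed:.3f}")
print(f"random cell count: r = {r_random:.3f}")
```

In this toy setting the random cell count acts as a shared multiplier on both aggregated signals, so the observed correlation is driven close to 1 even though the genes are uncorrelated inside every cell. With a fixed cell count the observed correlation stays near zero.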
Background: DNA microarray analysis has great potential to become an important clinical tool to individualize prognostication and treatment for breast cancer patients. However, as with any emerging technology, there are many variables one must consider before bringing the technology to the bedside. There are already concerted efforts to standardize protocols and to improve the reproducibility of DNA microarray analysis. Our study examines one variable that is often overlooked, the timing of tissue acquisition, which may have a significant impact on the outcomes of DNA microarray analyses, especially in studies that compare microarray data based on biospecimens taken in vivo and ex vivo. Methods: From 16 patients, we obtained paired fine needle aspiration biopsies (FNABs) of breast cancers taken before (PRE) and after (POST) their surgeries and compared the microarray data to determine the genes that were differentially expressed between the FNABs taken at the two time points. qRT-PCR was used to validate our findings. To examine the effects of longer exposure to hypoxia on gene expression, we also compared the gene expression profiles of 10 breast cancers from a clinical tissue bank. Results: Using hierarchical clustering analysis, 12 genes were found to be differentially expressed between the FNABs taken before and after surgical removal. Remarkably, most of the genes were linked to FOS in an early hypoxia pathway. The gene expression of FOS also increased with longer exposure to hypoxia. Conclusion: Our study demonstrated that the timing of fine needle aspiration biopsies can be a confounding factor in microarray data analyses in breast cancer. We have shown that FOS-related genes, which have been implicated in early hypoxia as well as the development of breast cancers, were differentially expressed before and after surgery. Therefore, it is important that future studies take the timing of tissue acquisition into account.
In this paper, we propose a novel method based on support vector machine (SVM) for microarray classification and gene (feature) selection. The proposed method, called similarity-based SVM (SSVM), incorporates the prior knowledge of gene similarity into the standard SVM by combining the standard l2 norm and the similarity penalty of all the genes. The preliminary experiments show that our method performs better than the standard SVM, l2-l0 SVM and SVM-RFE, especially when the features are highly similar.
Background: Inflammation is a hallmark of many human diseases. Elucidating the mechanisms underlying systemic inflammation has long been an important topic in basic and clinical research. When primary pathogenetic events remain unclear due to their immense complexity, construction and analysis of the gene regulatory network of inflammation at times become the best way to understand the detrimental effects of disease. However, it is difficult to recognize and evaluate relevant biological processes from the huge quantities of experimental data. It is hence appealing to find an algorithm which can generate a gene regulatory network of systemic inflammation from high-throughput genomic studies of human diseases. Such a network will be essential for us to extract valuable information from the complex and chaotic network under diseased conditions. Results: In this study, we construct a gene regulatory network of inflammation using data extracted from the Ensembl and JASPAR databases. We also integrate and apply a number of systematic algorithms, such as a cross-correlation threshold, the maximum likelihood estimation method and the Akaike Information Criterion (AIC), on time-lapsed microarray data to refine the genome-wide transcriptional regulatory network in response to bacterial endotoxins in the context of dynamically activated genes, which are regulated by transcription factors (TFs) such as NF-kappa B. This systematic approach is used to investigate the stochastic interaction represented by the dynamic leukocyte gene expression profiles of human subjects exposed to an inflammatory stimulus (bacterial endotoxin). Based on the kinetic parameters of the dynamic gene regulatory network, we identify important properties (such as susceptibility to infection) of the immune system, which may be useful for translational research. Finally, robustness of the inflammatory gene network is also inferred by analyzing the hubs and "weak ties" structures of the gene network. Conclusion: In this study, Da
An important goal of microarray studies is the detection of genes that show significant changes in expression when two classes of biological samples are being compared. We present an ANOVA-style mixed model with parameters for array normalization, overall level of gene expression, and change of expression between the classes. For the latter we assume a mixing distribution with a probability mass concentrated at zero, representing genes with no changes, and a normal distribution representing the level of change for the other genes. We estimate the parameters by optimizing the marginal likelihood. To make this practical, Laplace approximations and a backfitting algorithm are used. The performance of the model is studied by simulation and by application to publicly available data sets.
This paper focuses on the stability-based approach for estimating the number of clusters K in microarray data. The cluster stability approach amounts to performing clustering successively over random subsets of the available data and evaluating an index which expresses the similarity of the successive partitions obtained. We present a method for automatically estimating K by starting from the distribution of the similarity index. We investigate how the choice of the hierarchical clustering (HC) method and of the similarity index influences the estimation accuracy. The paper introduces a new similarity index based on a partition distance. The performance of the new index and that of other well-known indices are experimentally evaluated by comparing the "true" data partition with the partition obtained at each level of an HC tree. A case study is conducted with a publicly available Leukemia dataset.
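To make the notion of a partition similarity index concrete, the sketch below computes the classical Rand index between two labelings of the same items. This is a well-known stand-in, not the new partition-distance index introduced in the paper, whose definition the abstract does not give. The function name and the toy labelings are assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two items
    in the same cluster, or both put them in different clusters.
    Cluster label values themselves do not matter, only co-membership.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Toy labelings of four items (hypothetical).
same_up_to_renaming = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

In a stability analysis of the kind described above, an index like this would be evaluated on pairs of partitions obtained from random subsets (restricted to the items they share) for each candidate K, and the distribution of index values would then be compared across K.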