ISBN: (print) 9781479903566
Many methods have been proposed to identify informative subsets of genes in microarray studies in order to focus the research. For instance, the recently proposed binarization of consensus partition matrices (Bi-CoPaM) method has, among its various features, the ability to generate tight clusters of genes while leaving many genes unassigned to any cluster. We propose exploiting this feature by applying Bi-CoPaM to genome-wide microarray data from multiple datasets to generate more clusters than required. These clusters are then tightened so that most of their genes are left unassigned and most of the clusters are left completely empty. The tightened clusters that remain non-empty contain those genes that are consistently co-expressed in multiple datasets when examined by various clustering methods. An example is demonstrated in this paper for cyclic and acyclic genes, as well as for genes that are highly expressed and genes that are not. Thus, the results of our proposed approach cannot be reproduced by other methods of gene periodicity identification or by other clustering methods.
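The tightening idea can be illustrated with a minimal Python sketch. It assumes the cluster labels from the individual clustering runs have already been aligned, averages them into a consensus membership matrix, and then keeps a gene only if its top membership beats the runner-up by a margin `delta`. The function names, the margin rule, and the toy labels are illustrative assumptions, not the authors' implementation.

```python
def consensus_memberships(partitions, n_clusters):
    """Average aligned hard partitions into per-gene fuzzy memberships.

    partitions: one list of cluster labels per clustering run
    (assumption: labels are already aligned across runs).
    """
    n_genes = len(partitions[0])
    counts = [[0] * n_clusters for _ in range(n_genes)]
    for labels in partitions:
        for gene, cluster in enumerate(labels):
            counts[gene][cluster] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def tighten(memberships, delta):
    """Assign a gene to its best cluster only if it wins by at least delta.

    Genes that fail the margin test are left unassigned (None).
    """
    assigned = []
    for row in memberships:
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        best = ranked[0]
        runner_up = row[ranked[1]] if len(ranked) > 1 else 0.0
        assigned.append(best if row[best] - runner_up >= delta else None)
    return assigned


# Toy data (hypothetical): three clustering runs over four genes.
runs = [[0, 0, 1, 2], [0, 1, 1, 2], [0, 0, 1, 0]]
memberships = consensus_memberships(runs, n_clusters=3)
strict = tighten(memberships, delta=1.0)   # only unanimous genes survive
relaxed = tighten(memberships, delta=0.3)  # majority votes are accepted
```

Raising `delta` toward 1.0 leaves more genes unassigned and can empty whole clusters, which is the behaviour the abstract exploits.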
Background: Large data sets from gene expression array studies are publicly available offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly adv...
ISBN: (print) 9781467347143
To gain more accurate and reliable cancer classification results through DNA microarray analysis, a novel ensemble classifier, SPDF (Subspace Partial least square based Decision Forest), is developed. The original data are split into subspaces by column. For each subspace, partial least squares (PLS) is applied to extract orthogonal latent variables (LVs). In conjunction with Minimal Redundancy and Maximal Relevance (MRMR) as the gene-selection preprocessing method, this overcomes the adverse effect of having too many variables and too few samples. Then, all available LVs are aggregated into new training data on which the Decision Forest is trained for classification. By relying on the feature extraction power of PLS and the orthogonality of the LVs, the multicollinearity and high noise inherent in microarray data can be eliminated effectively. Moreover, the Decision Forest enhances data variety and further reduces the dependence of the classification results on the given data. Applications to two microarray datasets show that, compared with Rotation Forest, Bagging and Boosting, the new SPDF method yields consistently accurate and robust predictive performance, with a maximal improvement of 7.26% in classification accuracy on Colon cancer classification.
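The subspace and latent-variable steps can be sketched in plain Python. This is a simplified illustration under stated assumptions: columns are split into interleaved blocks, and only the first PLS latent variable is extracted per block (weights proportional to the covariance between each centred gene and the centred response). MRMR preprocessing, further LVs, and the Decision Forest itself are omitted, and all function names are hypothetical.

```python
import math


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def first_pls_score(X, y):
    """First PLS1 latent variable: t = Xc w, with w proportional to Xc^T yc."""
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    yc = center(y)
    w = [sum(col[i] * yc[i] for i in range(n)) for col in cols]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0  # guard against zero weights
    w = [v / norm for v in w]
    return [sum(w[j] * cols[j][i] for j in range(p)) for i in range(n)]


def subspace_latent_features(X, y, n_subspaces):
    """Split columns into blocks, extract one LV per block, and aggregate."""
    p = len(X[0])
    blocks = [list(range(start, p, n_subspaces)) for start in range(n_subspaces)]
    scores = [
        first_pls_score([[row[j] for j in block] for row in X], y)
        for block in blocks
    ]
    # One row per sample, one column per subspace LV.
    return [list(sample) for sample in zip(*scores)]
```

The returned matrix plays the role of the aggregated LV training data; any tree-ensemble classifier could then be trained on it.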
Background: Microarray experimentation requires the application of complex analysis methods as well as the use of non-trivial computer technologies to manage the resultant large data sets. This, together with the proliferation of tools and techniques for microarray data analysis, makes it very challenging for a laboratory scientist to keep up to date with the latest developments in this field. Our aim was to develop a distributed e-support system for microarray data analysis and management. Results: EMAAS (Extensible Microarray Analysis System) is a multi-user rich internet application (RIA) providing simple, robust access to up-to-date resources for microarray data storage and analysis, combined with integrated tools to optimise real-time user support and training. The system leverages the power of distributed computing to perform microarray analyses, and provides seamless access to resources located at various remote facilities. The EMAAS framework allows users to import microarray data from several sources to an underlying database, to pre-process, quality assess and analyse the data, to perform functional analyses, and to track data analysis steps, all through a single easy-to-use web portal. This interface offers distance support to users both in the form of video tutorials and via live screen feeds using the web conferencing tool EVO. A number of analysis packages, including R-Bioconductor and Affymetrix Power Tools, have been integrated on the server side and are available programmatically through the Postgres-PLR library or on grid compute clusters. Integrated distributed resources include the functional annotation tool DAVID, GeneCards and the microarray data repositories GEO, CELSIUS and MiMiR. EMAAS currently supports analysis of Affymetrix 3' and Exon expression arrays, and the system is extensible to cater for other microarray and transcriptomic platforms. Conclusion: EMAAS enables users to track and perform microarray data management and analysis tasks
ISBN: (print) 9781479903573
Many methods have been proposed to identify informative subsets of genes in microarray studies in order to focus the research. For instance, the recently proposed binarization of consensus partition matrices (Bi-CoPaM) method has, among its various features, the ability to generate tight clusters of genes while leaving many genes unassigned to any cluster. We propose exploiting this feature by applying Bi-CoPaM to genome-wide microarray data from multiple datasets to generate more clusters than required. These clusters are then tightened so that most of their genes are left unassigned and most of the clusters are left completely empty. The tightened clusters that remain non-empty contain those genes that are consistently co-expressed in multiple datasets when examined by various clustering methods. An example is demonstrated in this paper for cyclic and acyclic genes, as well as for genes that are highly expressed and genes that are not. Thus, the results of our proposed approach cannot be reproduced by other methods of gene periodicity identification or by other clustering methods.
This paper proposes the authors' algorithm for gene selection in microarray data analysis when comparing conditions with replicates. Based on background noise computation in replicated arrays, the algorithm uses a global False Discovery Rate based on "Inter" and "Intra" group comparisons of replicates to select differentially expressed gene sets. The method uses two types of statistics, which improve the selection procedure when it is confronted with very high background noise. Using simulated data and the well-known Latin square datasets, the proposed method is compared with some well-known algorithms.
The regulation of gene expression is a dynamic process; hence it is of vital interest to identify and characterize changes in gene expression over time. We present here a general statistical method, based on repeated measures (RM) ANOVA, for detecting changes in microarray expression over time within a single biological group. In this method, unlike the classical F-statistic, statistical significance is determined taking into account the time dependency of the microarray data. A correction factor for this RM F-statistic is introduced, leading to higher sensitivity as well as high specificity. We investigate the two resampling-based approaches that exist in the literature for calculating p-values: gene-wise p-values and pooled p-values. It is shown that the pooled p-values method is more powerful and computationally less expensive than the gene-wise method; it is therefore applied, along with the introduced correction factor, to various synthetic data sets and a real data set. These results show that the proposed technique outperforms the current methods. The real data set results are consistent with the existing knowledge concerning the presence of the genes. The algorithms presented are implemented in R and are freely available upon request.
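For orientation, the textbook one-way repeated measures F-statistic for a single gene can be computed as in the Python sketch below. This shows only the standard RM ANOVA decomposition; the correction factor and the resampling-based p-values described in the abstract are not reproduced here, and the function name and data layout are assumptions.

```python
def rm_anova_f(data):
    """Repeated measures F-statistic for one gene.

    data[s][t] is the expression of the gene for subject (replicate) s
    at time point t. Subject-to-subject variation is removed from the
    error term, which is what distinguishes RM ANOVA from classical ANOVA.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    time_means = [sum(row[t] for row in data) / n for t in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_time - ss_subject

    df_time = k - 1
    df_error = (k - 1) * (n - 1)
    return (ss_time / df_time) / (ss_error / df_error)


# Hypothetical gene measured on 3 subjects at 3 time points.
f_stat = rm_anova_f([[1, 2, 4], [2, 3, 6], [3, 5, 7]])
```

A large value of `f_stat` indicates that the time effect dominates the residual variation once subject offsets are accounted for.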
Recently, many methods have been proposed for constructing gene regulatory networks (GRNs). However, most of the existing methods ignore time-delayed regulatory relations in GRN prediction. In this paper, we propose a hybrid method, termed GA/PSO with DTW, to construct GRNs from microarray datasets. The proposed method uses a test of the correlation coefficient and the dynamic time warping (DTW) algorithm to determine whether a time-delay relation exists between two genes. In addition, it uses particle swarm optimization (PSO) to find thresholds for discretizing the microarray dataset. Based on the discretized microarray dataset and the predicted types of regulatory relations among genes, the proposed method uses a genetic algorithm to generate a set of candidate GRNs from which the predicted GRN is constructed. Three real-life sub-networks of yeast are used to verify the performance of the proposed method. The experimental results show that GA/PSO with DTW is better than the other existing methods in terms of prediction sensitivity and specificity. © 2011 Elsevier B.V. All rights reserved.
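The role of DTW in spotting a delayed relation can be illustrated with the classic dynamic-programming formulation. The Python sketch below is a generic DTW distance with an absolute-difference cost, not the specific variant or decision rule used in the paper; the two expression profiles are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences."""
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[-1][-1]


# Hypothetical profiles: the target repeats the regulator one step later.
regulator = [0, 1, 2, 1, 0, 0]
target = [0, 0, 1, 2, 1, 0]
pointwise = sum(abs(x - y) for x, y in zip(regulator, target))
warped = dtw_distance(regulator, target)
```

Here the point-by-point difference is large while the warped distance vanishes, which is the kind of signal that suggests a time-shifted rather than an unrelated profile.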
A common procedure for estimating the number of genes that are differentially expressed (DE) in two experiments involves two steps. In the first step, data from the two experiments are analyzed separately to produce a list of genes declared to be DE in each experiment. Usually, each list is produced using a method that attempts to control the false discovery rate (FDR) in each experiment at some desired level alpha. In the second step, the number of genes common to both lists is used as an estimate of the number of genes DE in both experiments. A problem with this approach is that the resulting estimates can vary greatly with alpha, and the value of alpha that produces the best estimate for any given pair of experiments is difficult to predict. We propose a method that uses the p-values from both experiments simultaneously to produce one estimate, which does not depend on the FDR level alpha, of the number of genes that are DE in both experiments. We use two simulation studies (one involving independent, normally distributed data and one involving microarray data) to compare the performance of our proposed method, the commonly used method, and another method proposed in the literature to test for consistency of replicate experiments. The results of the simulation studies demonstrate the advantages of our approach. We conclude the article by estimating the number of genes that are DE in both of two experiments involving gene expression in maize leaves.
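The commonly used two-step procedure, and its sensitivity to alpha, can be sketched as follows. The abstract does not name the FDR-controlling method, so the Benjamini-Hochberg step-up procedure is used here as an assumption; the p-values are made up for illustration.

```python
def bh_rejections(pvalues, alpha):
    """Indices of hypotheses rejected by Benjamini-Hochberg at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its step-up threshold.
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    return set(order[:cutoff])


def common_de_count(p_exp1, p_exp2, alpha):
    """Step 2: count genes declared DE in both experiments."""
    return len(bh_rejections(p_exp1, alpha) & bh_rejections(p_exp2, alpha))


# Hypothetical p-values for the same five genes in two experiments.
p1 = [0.001, 0.01, 0.025, 0.2, 0.8]
p2 = [0.002, 0.045, 0.015, 0.5, 0.9]
estimates = {alpha: common_de_count(p1, p2, alpha) for alpha in (0.05, 0.1)}
```

With these toy values the estimate changes as alpha moves from 0.05 to 0.1, which is the instability the proposed alpha-free estimator is meant to avoid.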
ISBN: (print) 9789898425904
At present, a range of clinical indicators are used to gain insight into the course a newly presented individual's disease may take, and so inform treatment regimes. However, such indicators are not absolutely predictive, and patients with apparently low-risk disease may follow a more aggressive course. Advances in molecular medicine offer the hope of improved disease stratification and personalised treatment. For example, the identification of "genetic signatures" characteristic of disease subtypes is facilitated by high-throughput transcriptional profiling techniques (microarrays), in which expression levels for thousands of genes are measured across a range of biopsy samples. However, selecting a compact gene set that conveys the most clinically relevant information from complex, high-dimensional microarray datasets is a challenging task. We reduced this complexity using a Pathway Enrichment and Gene Network Analysis (PEGNA) method, which integrates gene expression data with prior biological knowledge to select a group of strongly correlated genes providing accurate discrimination of complex disease subtypes. In our method, pathway enrichment analysis was first applied to a microarray dataset in order to identify the most impacted biological processes. Second, we used gene network analysis to find a group of strongly correlated genes, from which subsets of genes were selected for disease classification with a support vector machine classifier. In this way, we were able to classify disease states more accurately, using smaller numbers of genes, than other methods across a range of biological datasets.
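The pathway enrichment step can be illustrated with a one-sided hypergeometric over-representation test, a common choice for this kind of analysis. The abstract does not state which enrichment statistic PEGNA uses, so the test, the function names, and the toy gene sets below are all assumptions.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric null.

    n_background: genes on the array; n_pathway: genes in the pathway;
    n_selected: genes of interest; n_overlap: selected genes in the pathway.
    """
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def rank_pathways(selected, pathways, n_background):
    """Order pathway names from most to least enriched (smallest p first)."""
    scores = {
        name: enrichment_p_value(
            n_background, len(genes), len(selected), len(genes & selected)
        )
        for name, genes in pathways.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical example: 10 background genes, 3 selected, two pathways.
selected = {"a", "b", "c"}
pathways = {"p1": {"a", "b", "x", "y"}, "p2": {"c", "z", "w", "v", "u", "t"}}
ranking = rank_pathways(selected, pathways, n_background=10)
```

Genes from the top-ranked pathways would then feed the network analysis and classifier stages described in the abstract.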