Reconstructing large-scale gene regulatory networks (GRNs) is a challenging problem in the field of computational biology. Various methods for inferring GRNs have been developed, but they fail to accurately infer GRNs...
详细信息
Reconstructing large-scale gene regulatory networks (GRNs) is a challenging problem in the field of computational biology. Various methods for inferring GRNs have been developed, but they fail to accurately infer GRNs with a large number of genes. Additionally, the existing evaluation indexes for evaluating the constructed networks have obvious disadvantages because GRNs in most biological systems are sparse. In this paper, we develop a new method for inferring GRNs based on randomized singular value decomposition (RSVD) and ordinary differential equation (ODE)-based optimization, denoted as IGRSVD, from large-scale time series data with noise. The three major contributions of this paper are as follows. First, the IGRSVD algorithm uses the RSVD to handle the noise and reduce the original large-scale data into small-scale problems. Second, we propose two new evaluated indexes, the expected value accuracy (EVA) and the expected value error (EVE), to evaluate the performance of inferred networks by considering the sparse features in the network. Finally, the proposed IGRSVD algorithm is compared with the existing SVD algorithm and PCA_CMI algorithm using four subsets from E. coli and datasets from DREAM challenge. The experimental results demonstrate that the IGRSVD algorithm is effective and more suitable for reconstructing large-scale networks.
Clustering has been widely applied in interpreting the underlying patterns in microarray gene expression profiles, and many clustering algorithms have been devised for the same. K-means is one of the popular algorithm...
详细信息
Clustering has been widely applied in interpreting the underlying patterns in microarray gene expression profiles, and many clustering algorithms have been devised for the same. K-means is one of the popular algorithms for gene data clustering due to its simplicity and computational efficiency. But, K-means algorithm is highly sensitive to the choice of initial cluster centers. Thus, the algorithm easily gets trapped with local optimum if the initial centers are chosen randomly. This paper proposes a deterministic initialization algorithm for K-means (DK-means) by exploring a set of probable centers through a constrained bi-partitioning approach. The proposed algorithm is compared with classical K-means with random initialization and improved K-means variants such as K-means++ and MinMax algorithms. It is also compared with three deterministic initialization methods. Experimental analysis on gene expression datasets demonstrates that DK-means achieves improved results in terms of faster and stable convergence, and better cluster quality as compared to other algorithms.
In this paper, a novel game theory method is proposed to identify subnetwork markers by integrating gene expression profile and protein-protein interaction network. The proposed method has been evaluated on different ...
详细信息
ISBN:
(纸本)9789811045059;9789811045042
In this paper, a novel game theory method is proposed to identify subnetwork markers by integrating gene expression profile and protein-protein interaction network. The proposed method has been evaluated on different cancer datasets in order to classify cancer phenotypes. To verify the performance of our approach, the identified subnetwork markers are compared with a greedy search method. The proposed method is capable of identifying robust subnetwork markers and presents higher classification performance.
BackgroundWhen designing an epigenome-wide association study (EWAS) to investigate the relationship between DNA methylation (DNAm) and some exposure(s) or phenotype(s), it is critically important to assess the sample ...
详细信息
BackgroundWhen designing an epigenome-wide association study (EWAS) to investigate the relationship between DNA methylation (DNAm) and some exposure(s) or phenotype(s), it is critically important to assess the sample size needed to detect a hypothesized difference with adequate statistical power. However, the complex and nuanced nature of DNAm data makes direct assessment of statistical power challenging. To circumvent these challenges and to address the outstanding need for a user-friendly interface for EWAS power evaluation, we have developed *** current implementation of pwrEWAS accommodates power estimation for two-group comparisons of DNAm (e.g. case vs control, exposed vs non-exposed, etc.), where methylation assessment is carried out using the Illumina Human Methylation BeadChip technology. Power is calculated using a semi-parametric simulation-based approach in which DNAm data is randomly generated from beta-distributions using CpG-specific means and variances estimated from one of several different existing DNAm data sets, chosen to cover the most common tissue-types used in EWAS. In addition to specifying the tissue type to be used for DNAm profiling, users are required to specify the sample size, number of differentially methylated CpGs, effect size(s) (), target false discovery rate (FDR) and the number of simulated data sets, and have the option of selecting from several different statistical methods to perform differential methylation analyses. pwrEWAS reports the marginal power, marginal type I error rate, marginal FDR, and false discovery cost (FDC). Here, we demonstrate how pwrEWAS can be applied in practice using a hypothetical EWAS. In addition, we report its computational efficiency across a variety of user *** under- and overpowered studies unnecessarily deplete resources and even risk failure of a study. With pwrEWAS, we provide a user-friendly tool to help researchers circumvent these risks and to assist in t
DNA microarray technology has emerged as a prospective tool for diagnosis of cancer and its classification. It provides better insights of many genetic mutations occurring within a cell associated with cancer. However...
详细信息
DNA microarray technology has emerged as a prospective tool for diagnosis of cancer and its classification. It provides better insights of many genetic mutations occurring within a cell associated with cancer. However, thousands of gene expressions measured for each biological sample using microarray pose a great challenge. Many statistical and machine learning methods have been applied to get most relevant genes prior to cancer classification. A two phase hybrid model for cancer classification is being proposed, integrating Correlation-based Feature Selection (CFS) with improved-Binary Particle Swarm Optimization (iBPSO). This model selects a low dimensional set of prognostic genes to classify biological samples of binary and multi class cancers using NaiveBayes classifier with stratified 10-fold cross-validation. The proposed iBPSO also controls the problem of early convergence to the local optimum of traditional BPSO. The proposed model has been evaluated on 11 benchmark microarraydatasets of different cancer types. Experimental results are compared with seven other well known methods, and our model exhibited better results in terms of classification accuracy and the number of selected genes in most cases. In particular, it achieved up to 100% classification accuracy for seven out of eleven datasets with a very small sized prognostic gene subset (up to <1.5%) for all eleven datasets. (C) 2017 Elsevier B.V. All rights reserved.
The genes of the NF kappa B pathway are involved in the control of a plethora of biological processes ranking from inhibition of apoptosis to metastasis in cancer. It has been described that Gliobastoma multiforme (GB...
详细信息
The genes of the NF kappa B pathway are involved in the control of a plethora of biological processes ranking from inhibition of apoptosis to metastasis in cancer. It has been described that Gliobastoma multiforme (GBM) patients carry aberrant NF kappa B activation, but the molecular mechanisms are not completely understood. Here, we present a NF kappa B pathway analysis in tumor specimens of GBM compared to nonneoplasic brain tissues, based on the different kind of cycles found among genes of a gene co-expression network constructed using quantized data obtained from the microarrays. A cycle is a closed walk with all vertices distinct (except the first and last). Thanks to this way of finding relations among genes, a more robust interpretation of gene correlations is possible, because the cycles are associated with feedback mechanisms that are very common in biological networks. In GBM samples, we could conclude that the stoichiometric relationship between genes involved in NF kappa B pathway regulation is unbalanced. This can be measured and explained by the identification of a cycle. This conclusion helps to understand more about the biology of this type of tumor. (C) 2017 Elsevier Ltd. All rights reserved.
Background: The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision ru...
详细信息
Background: The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions ("pivot" genes). Results: We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large b-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative (pivot) genes. Conclusions: The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects as opposed to each individual subject results in significant performance gains in classification accuracy. In addition, the proposed method tends to avoid selec
Identifying the informative genes has always been a major step in microarray data analysis. The complexity of various cancer datasets makes this issue still challenging. In this paper, a novel Bio-inspired Multi-objec...
详细信息
Identifying the informative genes has always been a major step in microarray data analysis. The complexity of various cancer datasets makes this issue still challenging. In this paper, a novel Bio-inspired Multi-objective algorithm is proposed for gene selection in microarraydata classification specifically in the binary domain of feature selection. The presented method extends the traditional Bat Algorithm with refined formulations, effective multi-objective operators, and novel local search strategies employing social learning concepts in designing random walks. A hybrid model using the Fisher criterion is then applied to three widely-used microarray cancer datasets to explore significant biomarkers which reveal the effectiveness of the proposed method for genomic analysis. Experimental results unveil new combinations of informative biomarkers have association with other studies.
microarray technology is capable of providing biomedical and biological researchers with a massive amount of gene expression information to enable rapid significant discoveries in life sciences. microarraydata analys...
详细信息
microarray technology is capable of providing biomedical and biological researchers with a massive amount of gene expression information to enable rapid significant discoveries in life sciences. microarray data analysis has been developing at a fast pace during the last decade and has become a popular and standard research method for gene expression studies undertaken by genomics research groups worldwide. Many computational tools have been applied to mine this data in order to discover biologically meaningful knowledge. One of the most useful analysis tools is the fuzzy clustering approach which can be modeled in many types of the continuous partitions of data and are well known for its ability to identify co-expressed genes and annotate functions for novel genes. As the computational analysis of microarraydata has been developing rapidly, articles surveying its progress of research and developments are periodically needed. In this paper, we review the recent research into microarray data analysis based on fuzzy clustering algorithms and present a newly developed fuzzy clustering technique which, potentially, can be applied to perform microarray data analysis.
Recently, many methods have been proposed for microarray data analysis. One of the challenges for microarray applications is to select a proper number of the most relevant genes for dataanalysis. In this paper, we pr...
详细信息
Recently, many methods have been proposed for microarray data analysis. One of the challenges for microarray applications is to select a proper number of the most relevant genes for dataanalysis. In this paper, we propose a novel hybrid method for feature selection in microarray data analysis. This method first uses a genetic algorithm with dynamic parameter setting (GADP) to generate a number of subsets of genes and to rank the genes according to their occurrence frequencies in the gene subsets. Then, this method uses the chi(2)-test for homogeneity to select a proper number of the top-ranked genes for dataanalysis. We use the support vector machine (SVM) to verify the efficiency of the selected genes. Six different microarraydatasets are used to compare the performance of the GADP method with the existing methods. The experimental results show that the GADP method is better than the existing methods in terms of the number of selected genes and the prediction accuracy. (c) 2009 Elsevier B.V. All rights reserved.
暂无评论