Missing values often occur in data from gene expression microarray experiments. A number of methods, such as the Row Average (RA) method, the KNNimpute algorithm, and the SVDimpute algorithm, have been proposed to estimate the missing values. Recently, Kim et al. proposed a Local Least Squares Imputation (LLSI) method for estimating the missing values. In this paper, we propose a Weighted Local Least Square Imputation (WLLSI) method for missing value estimation. WLLSI allows the weighting to be trained and can therefore take advantage of both the LLSI method and the RA method. Numerical results on both synthetic data and real microarray data are given to demonstrate the effectiveness of the proposed method. The imputation methods are then applied to a breast cancer dataset.
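The idea of combining a local least squares estimate with a row average can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the WLLSI weighting scheme, so the convex blend controlled by `alpha`, the Euclidean neighbor search, and the function names `lls_estimate` and `blended_estimate` are all hypothetical.

```python
import numpy as np

def lls_estimate(complete_genes, target_obs, missing_col, k=3):
    """Local least squares estimate of one missing entry of a target gene.

    complete_genes: (m, n) array of genes without missing values.
    target_obs: (n - 1,) observed values of the target gene, with the
        missing column removed.
    """
    n_cols = complete_genes.shape[1]
    obs_cols = [c for c in range(n_cols) if c != missing_col]
    candidates = complete_genes[:, obs_cols]

    # Pick the k genes closest to the target on the observed columns.
    dist = np.linalg.norm(candidates - target_obs, axis=1)
    nearest = np.argsort(dist)[:k]

    A = candidates[nearest]                   # (k, n - 1) neighbor profiles
    b = complete_genes[nearest, missing_col]  # (k,) neighbor values at the gap

    # Express the target as a linear combination of its neighbors ...
    x, *_ = np.linalg.lstsq(A.T, target_obs, rcond=None)
    # ... and apply the same combination in the missing column.
    return float(b @ x)

def blended_estimate(complete_genes, target_obs, missing_col, alpha, k=3):
    """Assumed blend: alpha * LLS estimate + (1 - alpha) * row average."""
    lls = lls_estimate(complete_genes, target_obs, missing_col, k)
    row_avg = float(np.mean(target_obs))
    return alpha * lls + (1.0 - alpha) * row_avg
```

With `alpha = 1` the sketch reduces to a plain local least squares estimate and with `alpha = 0` to the row average; in practice `alpha` would be tuned on entries whose true values are known.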
ISBN: 9781479906505 (print)
This paper proposes two extensions to a Multi-Label Correlation Based Feature Selection Method (ML-CFS): (1) ML-CFS using the absolute value of the correlation coefficient in the equation for evaluating a candidate feature subset, and (2) ML-CFS using Mutual Information for class label weighting. These extensions are evaluated in a bioinformatics case study addressing the multi-label classification of a cancer-related DNA microarray dataset with over 20,000 features. The results show that ML-CFS with the absolute value of the correlation obtained significantly better predictive accuracy (smaller Hamming loss) than the original ML-CFS. On the other hand, using Mutual Information to assign weights to labels showed some positive effect with the ML-RBF classifier but a negative effect with the ML-kNN classifier.
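A minimal sketch of why the absolute value matters when scoring a feature subset is given below. The exact ML-CFS equation is not stated in the abstract, so this assumes the standard CFS merit, k * r_cf / sqrt(k + k(k - 1) * r_ff), with the feature-label correlation averaged over labels using optional weights. The function name `cfs_merit`, its parameters, and the choice to always take absolute feature-feature correlations are assumptions for illustration.

```python
import numpy as np

def cfs_merit(X, Y, subset, label_weights=None, use_abs=True):
    """CFS-style merit of a feature subset for multi-label data.

    X: (n_samples, n_features) features; Y: (n_samples, n_labels) labels.
    label_weights: optional per-label weights (hypothetical stand-in for
        label weighting such as mutual-information weights).
    """
    k = len(subset)
    n_labels = Y.shape[1]
    if label_weights is None:
        w = np.full(n_labels, 1.0 / n_labels)
    else:
        w = np.asarray(label_weights, dtype=float)
        w = w / w.sum()

    def pearson(a, b):
        return np.corrcoef(a, b)[0, 1]

    # Average (weighted over labels) feature-label correlation.
    r_cf = np.mean([
        sum(w[l] * (abs(pearson(X[:, f], Y[:, l])) if use_abs
                    else pearson(X[:, f], Y[:, l]))
            for l in range(n_labels))
        for f in subset
    ])

    # Average feature-feature correlation; kept absolute in both modes
    # (an assumption) so the square root below stays well defined.
    if k > 1:
        pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
        r_ff = np.mean([abs(pearson(X[:, f], X[:, g])) for f, g in pairs])
    else:
        r_ff = 0.0

    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

With signed correlations, a feature that is strongly but negatively correlated with a label lowers the merit even though it is informative; taking the absolute value rewards it instead.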
ISBN: 9781479956708 (print)
Wrapper-based gene selection methods tend to achieve better classification accuracy than filter methods, but they are much more time consuming. Accelerating this process without degrading accuracy is of great value to researchers analyzing gene expression profiles. In this paper, we explore how to reduce the time complexity of a wrapper-based gene selection method with an embedded K-Nearest-Neighbor (KNN) classifier. Instead of treating KNN as a black box, we incrementally construct and maintain a classifier distance matrix to speed up the gene selection process. Experiments on six publicly available microarray datasets were first conducted to show the effectiveness of the incremental wrapper-based gene selection method with KNN. Then, to demonstrate the reduction in time cost, we analyzed the time complexity and evaluated it experimentally. Both theoretical analysis and experimental results show that the proposed approach greatly accelerates the gene selection process without degrading the classification accuracy.
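The incremental distance matrix idea can be illustrated with a short sketch. It assumes squared Euclidean distance (which is additive over genes), greedy forward selection, and leave-one-out accuracy as the wrapper criterion; none of these details are confirmed by the abstract, and the function names are hypothetical.

```python
import numpy as np

def gene_distance(X, g):
    """Pairwise squared differences between samples on a single gene g."""
    col = X[:, g]
    return (col[:, None] - col[None, :]) ** 2

def loo_knn_accuracy(D, y, k=1):
    """Leave-one-out KNN accuracy from a precomputed distance matrix."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)  # a sample may not vote for itself
    correct = 0
    for i in range(len(y)):
        neighbors = np.argsort(D[i])[:k]
        votes = np.bincount(y[neighbors])
        correct += int(np.argmax(votes) == y[i])
    return correct / len(y)

def forward_select(X, y, n_genes, k=1):
    """Greedy forward selection that reuses the running distance matrix."""
    n_samples, n_features = X.shape
    D = np.zeros((n_samples, n_samples))  # distances over selected genes
    selected = []
    for _ in range(n_genes):
        best_acc, best_gene = -1.0, None
        for g in range(n_features):
            if g in selected:
                continue
            # Adding one gene costs O(n_samples^2), independent of how
            # many genes are already selected.
            acc = loo_knn_accuracy(D + gene_distance(X, g), y, k)
            if acc > best_acc:
                best_acc, best_gene = acc, g
        selected.append(best_gene)
        D += gene_distance(X, best_gene)
    return selected, D
```

Without the cached matrix, each candidate evaluation would recompute distances over every selected gene; with it, only the contribution of the candidate gene is added.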
Parkinson's disease (PD) is a typical neurodegenerative disorder that often impairs the sufferer's motor skills, speech, and other functions. Combining protein-protein interaction (PPI) network analysis with gene expression studies provides better insight into Parkinson's disease. In our work, a computational approach was developed to identify the protein signal network in PD. First, a linear regression model was set up, and a network-constrained regularization analysis was applied to microarray data from a transgenic mouse model of Parkinson's disease. Then a protein network was detected with an integer linear programming model that integrates the microarray data and a PPI database.
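As a rough illustration of what a network-constrained regularization objective looks like, the sketch below evaluates a penalized least squares loss. The abstract does not give the formulation, so this assumes the commonly used form: a residual sum of squares plus an L1 term and a quadratic term built from the normalized graph Laplacian of the network. The function names and the parameters `lam1` and `lam2` are hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """Normalized Laplacian I - D^(-1/2) A D^(-1/2) of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    scale = np.zeros_like(deg)
    scale[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    # Isolated nodes get a zero row and column.
    return np.diag((deg > 0).astype(float)) - scale[:, None] * adj * scale[None, :]

def network_constrained_objective(X, y, beta, L, lam1, lam2):
    """RSS + lam1 * ||beta||_1 + lam2 * beta^T L beta (assumed form)."""
    resid = y - X @ beta
    sparsity = lam1 * np.abs(beta).sum()
    smoothness = lam2 * float(beta @ L @ beta)
    return float(resid @ resid) + sparsity + smoothness
```

The Laplacian term is small when genes that are linked in the network receive similar degree-scaled coefficients, so the penalty encourages selecting connected groups of genes rather than isolated ones.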
Selecting a subset of informative genes from microarray expression data is a critical data preparation step in cancer classification and other biological function *** support vector machine recursive feature elimination (SVM-RFE) is one of the most effective feature selection methods and has been successfully used to select informative genes for cancer classification. However, SVM-RFE selects genes using only the gene expression data, without any other biological information of the *** on the biological information of the genes, it may be beneficial to identify the genes that are relevant to the *** propose a novel SVM-RFE method for gene selection by incorporating Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway information into feature selection *** results indicate that the novel SVM-RFE tends to provide better variable selection results than the original SVM-RFE.
An abstract of a study related to unsupervised approach on gene saliency based on the fuzzy rule, which was conducted by Nishchal K. Verma, Pooja Agrawal, and Yan Cui, is presented.
ISBN: 9781450372152 (print)
Subtyping of tumor transcriptome expression profiles is a routine method used to characterize tumor heterogeneity. Unsupervised clustering techniques are often combined with survival analysis to decipher the relationship between genes and the survival times of patients. However, the reproducibility of these subtyping-based studies is poor, and multiple reports present conflicting results for subtypes and for gene-survival time relationships. In this study, we introduce the issues underlying the lack of reproducibility in transcriptomic subtyping studies. This problem arises from the routine analysis of small cohorts (< 100 individuals) and the use of biased traditional consensus clustering techniques. Our approach carefully combines multiple RNA-sequencing and microarray datasets, followed by subtyping via Monte-Carlo Consensus Clustering and the creation of deep subtyping classifiers. This paper demonstrates an improved subtyping methodology by investigating pancreatic ductal adenocarcinoma. Importantly, our methodology identifies six biologically novel pancreatic ductal adenocarcinoma subtypes. Our approach also enables a degree of reproducibility, via our pancreatic ductal adenocarcinoma classifier PDACNet, which classical subtyping studies have failed to establish.
Background: Inflammation is a hallmark of many human diseases. Elucidating the mechanisms underlying systemic inflammation has long been an important topic in basic and clinical research. When the primary pathogenetic events remain unclear because of their immense complexity, constructing and analyzing the gene regulatory network of inflammation at times becomes the best way to understand the detrimental effects of disease. However, it is difficult to recognize and evaluate relevant biological processes from the huge quantities of experimental data. It is hence appealing to find an algorithm that can generate a gene regulatory network of systemic inflammation from high-throughput genomic studies of human diseases. Such a network will be essential for extracting valuable information from the complex and chaotic network under diseased conditions. Results: In this study, we construct a gene regulatory network of inflammation using data extracted from the Ensembl and JASPAR databases. We also integrate and apply a number of systematic algorithms, such as cross-correlation thresholding, maximum likelihood estimation, and the Akaike Information Criterion (AIC), to time-lapsed microarray data to refine the genome-wide transcriptional regulatory network in response to bacterial endotoxins in the context of dynamically activated genes, which are regulated by transcription factors (TFs) such as NF-kappa B. This systematic approach is used to investigate the stochastic interaction represented by the dynamic leukocyte gene expression profiles of human subjects exposed to an inflammatory stimulus (bacterial endotoxin). Based on the kinetic parameters of the dynamic gene regulatory network, we identify important properties (such as susceptibility to infection) of the immune system, which may be useful for translational research. Finally, the robustness of the inflammatory gene network is also inferred by analyzing the hubs and "weak ties" structures of the gene network. Conclusion: In this study, Da
An extensive empirical study is presented in this work to identify potential biomarkers of ESCC by employing fifteen prominent biclustering algorithms on synthetic and real datasets. For systematic analysis, we run the algorithms on a variety of synthetic datasets and evaluate the quality of the biclusters using recovery and relevance scores. The biclustering algorithms that show adequate results on the synthetic datasets are then applied to a real ESCC microarray dataset, on normal and disease samples separately. Gene enrichment analysis is carried out to identify the best possible bicluster(s) of each algorithm. Our approach uses this set of best possible biclusters in the downstream analysis to identify potential biomarkers with reference to a set of established elite genes for ESCC. It relies on Pearson correlation, conversion of the floating-point correlation matrix into a binary matrix, degree analysis based on the elite genes, deviation of degree in their respective mapping bicluster, significant alteration of gene expression values in the transition from normal to disease conditions, and gene ontology and pathway analyses. Finally, we detect nine potential ESCC biomarker genes (SH3GLB1, ARPC2, APPL1, CALM1, FTL, LPAR1, PLAU, PSMB4, and SCP2), which show the topological as well as biological significance of ESCC elite genes.
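The first three steps of this pipeline (Pearson correlation, binarization, and degree analysis against elite genes) can be sketched as below. The threshold value, the use of the absolute correlation, and the function names are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def binary_correlation_graph(expr, threshold=0.8):
    """Binarize the gene-gene Pearson correlation matrix.

    expr: (n_genes, n_samples) expression values of one bicluster.
    An edge is set where |r| >= threshold (assumed rule).
    """
    corr = np.corrcoef(expr)  # rows are treated as variables (genes)
    adj = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)  # ignore self-correlation
    return adj

def degree_to_elite(adj, elite_idx):
    """Number of elite genes each gene is connected to."""
    return adj[:, elite_idx].sum(axis=1)
```

A gene with a high degree toward the elite set in the binary graph would be a candidate for the later filtering steps, such as the comparison between normal and disease conditions.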