ISBN (print): 9783319173146; 9783319173139
Microarray data analysis has become a widely used tool for disease detection. It uses tens of thousands of genes as input dimensions, which poses a heavy computational burden for data analysis. In this chapter, the proposed approach addresses feature-gene selection and classification of microarray data within a support vector machine (SVM) framework. Feature genes are identified using adjustable epsilon-support vector regression (epsilon-SVR), and the highest-ranked genes across all the microarray data are then selected. Moreover, multi-class support vector classification (multi-class SVC) and cross-validation are applied to achieve high classification accuracy with less computing time.
Microarray technology is capable of providing biomedical and biological researchers with a massive amount of gene expression information to enable rapid, significant discoveries in the life sciences. Microarray data analysis has developed at a fast pace during the last decade and has become a popular, standard research method for gene expression studies undertaken by genomics research groups worldwide. Many computational tools have been applied to mine these data in order to discover biologically meaningful knowledge. One of the most useful analysis tools is the fuzzy clustering approach, which can model many types of continuous partitions of data and is well known for its ability to identify co-expressed genes and annotate functions for novel genes. As the computational analysis of microarray data has been developing rapidly, articles surveying research progress and developments are periodically needed. In this paper, we review recent research into microarray data analysis based on fuzzy clustering algorithms and present a newly developed fuzzy clustering technique that can potentially be applied to microarray data analysis.
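To make the idea of a continuous (fuzzy) partition concrete, here is a minimal sketch of the two update steps of standard fuzzy c-means. It is not the newly developed technique the abstract refers to, which is not described here. The function names, the fuzzifier default `m=2.0`, and the toy points are assumptions chosen for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update.

    u[i][k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)),
    where d_ik is the Euclidean distance from point i to center k.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centers:
            # share full membership among those centers only.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [
            1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
            for d_k in dists
        ]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of the points weighted by u ** m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for k in range(n_clusters):
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centers


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
centers = [(0.0, 0.5), (5.0, 5.5)]
memberships = fcm_memberships(points, centers)
new_centers = fcm_centers(points, memberships)
```

Each row of `memberships` sums to 1, so a gene can belong partly to several clusters instead of being forced into one, which is the property that makes fuzzy clustering attractive for co-expressed genes.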
Recently, many methods have been proposed for microarray data analysis. One of the challenges for microarray applications is to select a proper number of the most relevant genes for data analysis. In this paper, we propose a novel hybrid method for feature selection in microarray data analysis. This method first uses a genetic algorithm with dynamic parameter setting (GADP) to generate a number of subsets of genes and to rank the genes according to their occurrence frequencies in the gene subsets. Then, this method uses the chi-square test for homogeneity to select a proper number of the top-ranked genes for data analysis. We use the support vector machine (SVM) to verify the efficiency of the selected genes. Six different microarray data sets are used to compare the performance of the GADP method with that of existing methods. The experimental results show that the GADP method is better than the existing methods in terms of the number of selected genes and the prediction accuracy. (c) 2009 Elsevier B.V. All rights reserved.
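The frequency-ranking step described above can be sketched as follows. This is only an illustration of counting how often each gene occurs across candidate subsets. The genetic algorithm that produces the subsets and the chi-square cut-off are not reproduced, and the function name and gene labels are hypothetical.

```python
from collections import Counter


def rank_genes_by_frequency(gene_subsets):
    """Rank genes by the number of subsets in which they occur.

    Each subset counts a gene at most once. Ties are broken
    alphabetically so the ranking is deterministic.
    """
    counts = Counter(
        gene for subset in gene_subsets for gene in set(subset)
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical subsets, standing in for the output of several GA runs.
subsets = [["g1", "g2"], ["g2", "g3"], ["g2", "g1"]]
ranking = rank_genes_by_frequency(subsets)
print(ranking)  # [('g2', 3), ('g1', 2), ('g3', 1)]
```

A statistical test would then decide how many of the top-ranked genes to keep; here the full ranking is returned and that choice is left to the caller.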
The complexity, approximation and algorithmic issues of several clustering problems are studied. These non-traditional clustering problems arise from recent studies in microarray data analysis. We prove the following results. (1) Two variants of the Order-Preserving Submatrix problem are NP-hard. There are polynomial-time algorithms for the Order-Preserving Submatrix problem when the condition or gene sets are fixed. (2) Three variants of the Smooth Clustering problem are NP-hard. The Smooth Subset problem is approximable with ratio 0.5, but it cannot be approximated with ratio 0.5 + delta for any delta > 0 unless NP = P. (3) The problem of inferring a plaid model is NP-hard.
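While finding a large order-preserving submatrix is hard, verifying a candidate is easy. The sketch below assumes the usual definition: a set of rows (genes) and columns (conditions) is order-preserving if some single ordering of the columns makes every selected row strictly increasing. The function name and the toy matrix are illustrative, and the abstract's specific problem variants are not modeled.

```python
def is_order_preserving(matrix, rows, cols):
    """Check whether the submatrix on (rows, cols) is order-preserving.

    The only column order that can work is the one that sorts the first
    selected row, so we derive it there and test every selected row
    (including the first, which fails if it contains ties).
    """
    if not rows or not cols:
        return True
    first = matrix[rows[0]]
    order = sorted(cols, key=lambda c: first[c])
    return all(
        matrix[r][a] < matrix[r][b]
        for r in rows
        for a, b in zip(order, order[1:])
    )


# Hypothetical expression matrix: rows are genes, columns are conditions.
expression = [
    [1, 5, 3],
    [2, 9, 4],
    [7, 1, 8],
]
```

For example, `is_order_preserving(expression, [0, 1], [0, 1, 2])` holds because both rows increase along the column order 0, 2, 1, whereas adding row 2 breaks the pattern. The check runs in time proportional to the size of the submatrix plus one sort of the columns.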
We are designing new data mining techniques on boolean contexts to identify a priori interesting bi-sets, i.e., sets of objects (or transactions) and associated sets of attributes (or items). This improves the state of the art in many application domains where transactional/boolean data are to be mined (e.g., basket analysis, WWW usage mining, gene expression data analysis). The so-called (formal) concepts are important special cases of a priori interesting bi-sets that associate closed sets on both dimensions thanks to the Galois operators. Concept mining in boolean data is tractable provided that at least one of the dimensions (number of objects or attributes) is small enough and the data are not too dense. The task is extremely hard otherwise. Furthermore, it is important to enable user-defined constraints on the desired bi-sets and to use them during the extraction to increase both the efficiency and the a priori interestingness of the extracted patterns. This leads us to the design of a new algorithm, called D-Miner, for mining concepts under constraints. We provide an experimental validation on benchmark data sets. Moreover, we introduce an original data mining technique for microarray data analysis. In addition to recording boolean expression properties of genes, we add biological information about transcription factors. In such a context, D-Miner can be used for concept mining under constraints and outperforms the other studied algorithms. We also show that data enrichment is useful for evaluating the biological relevance of the extracted concepts.
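The Galois operators and the notion of a formal concept can be illustrated with a few lines of code. This sketch is not D-Miner and does no constraint-based pruning. It only shows the two operators and the closure test on a small boolean context. The function names and the toy context are assumptions.

```python
def common_attributes(context, objects):
    """Galois operator: attributes shared by every object in `objects`."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(context, attributes):
    """Galois operator: objects that carry every attribute in `attributes`."""
    wanted = set(attributes)
    return {obj for obj, attrs in context.items() if wanted <= attrs}


def is_concept(context, objects, attributes):
    """A bi-set is a formal concept when each side determines the other."""
    return (
        common_attributes(context, objects) == set(attributes)
        and common_objects(context, attributes) == set(objects)
    )


def closure_concept(context, objects):
    """Smallest concept whose object set contains `objects`."""
    attrs = common_attributes(context, objects)
    return common_objects(context, attrs), attrs


# Hypothetical boolean context: object -> set of attributes that hold.
context = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"c"},
}
extent, intent = closure_concept(context, {"o1"})
```

Here `extent` is `{"o1", "o2"}` and `intent` is `{"a", "b"}`: the pair is closed on both dimensions, whereas `({"o1"}, {"a", "b"})` is not a concept because `o2` also carries both attributes.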
Motivated by combining the advantages of hyperplane-based pattern analysis and fuzzy clustering techniques, we present in this paper a fuzzy mix-prototype (FMP) clustering method for microarray data analysis. By integrating spherical and hyper-planar cluster prototypes, the FMP is capable of capturing latent data models with both spherical and non-spherical geometric structures. The contributions of the paper are threefold. First, the objective function of the FMP is formulated. Second, an iterative solution that minimizes the objective function under the given constraints is derived. Third, the effectiveness of the proposed FMP is demonstrated through experiments on yeast and leukemia data sets. (c) 2017 Elsevier B.V. All rights reserved.
Cancer has been identified as the leading cause of death. It is predicted that around 20-26 million people will be diagnosed with cancer by 2020. Given this alarming rate, there is an urgent need for a more effective methodology to understand, prevent and cure cancer. Microarray technology provides a useful basis for achieving this goal, with cluster analysis of gene expression data leading to the discrimination of patients, identification of possible tumor subtypes and individualized treatment. Amongst clustering techniques, k-means is normally chosen for its simplicity and efficiency. However, it does not account for the differing importance of data attributes. This paper presents a new locally weighted extension of k-means, which has proven more accurate across many published datasets than the original and other extensions found in the literature.
An accurate classifier with linguistic interpretability that uses a small number of relevant genes is beneficial to microarray data analysis and to the development of inexpensive diagnostic tests. Several frequently used techniques for designing classifiers of microarray data, such as support vector machines, neural networks, k-nearest neighbor, and logistic regression models, suffer from low interpretability. This paper proposes an interpretable gene expression classifier (named iGEC) with an accurate and compact fuzzy rule base for microarray data analysis. The design of iGEC has three objectives to be simultaneously optimized: maximal classification accuracy, minimal number of rules, and minimal number of used genes. An "intelligent" genetic algorithm (IGA) is used to efficiently solve the design problem with a large number of tuning parameters. The performance of iGEC is evaluated using eight commonly used data sets. It is shown that iGEC has an accurate, concise, and interpretable rule base (1.1 rules per class), with averages of 87.9% test classification accuracy, 3.9 rules, and 5.0 used genes. Moreover, iGEC not only performs better than the existing fuzzy rule-based classifier in terms of the above-mentioned objectives, but is also more accurate than some existing non-rule-based classifiers. (c) 2006 Elsevier Ireland Ltd. All rights reserved.
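The trade-off among the three objectives can be made concrete with a generic Pareto-dominance check. The abstract does not say how iGEC combines its objectives, so this is a swapped-in textbook technique and not the paper's fitness function. The `Candidate` type and all numbers below are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One hypothetical classifier design and its three objective values."""
    accuracy: float  # to be maximized
    n_rules: int     # to be minimized
    n_genes: int     # to be minimized


def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and better somewhere."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.n_rules <= b.n_rules
        and a.n_genes <= b.n_genes
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.n_rules < b.n_rules
        or a.n_genes < b.n_genes
    )
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical designs, not results from the paper.
designs = [
    Candidate(0.90, 4, 6),
    Candidate(0.88, 3, 5),
    Candidate(0.85, 4, 6),
    Candidate(0.90, 5, 6),
]
front = pareto_front(designs)
```

In this toy set the first two designs survive: one is more accurate, the other is more compact, and neither beats the other on all three objectives at once.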
Background: The aim of this study was to further investigate non-small cell lung cancer (NSCLC) tumorigenesis and to identify biomarkers for the clinical management of patients through comprehensive bioinformatics analysis. Methods: miRNA and mRNA microarray data sets were downloaded from the GEO (Gene Expression Omnibus) database under the accession numbers GSE102286 and GSE101929, respectively. Differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs) were identified in NSCLC samples compared with controls. The interaction between DEGs and DEmiRs was predicted, followed by functional enrichment analysis and construction of a miRNA-gene regulatory network, a protein-protein interaction (PPI) network, and a competing endogenous RNA (ceRNA) network. Through this comprehensive bioinformatics analysis, we expected to find novel therapeutic targets and biomarkers for NSCLC. Results: A total of 123 DEmiRs (5 up- and 118 down-regulated miRNAs) and 924 DEGs (309 up- and 615 down-regulated genes) were identified. These genes and miRNAs were significantly involved in different pathways, including adherens junction, relaxin signaling pathway, and axon guidance. Furthermore, hsa-miR-9-5p, hsa-miR-196a-5p and hsa-miR-31-5p had a higher degree in the miRNA-gene regulatory network, and hsa-miR-1, hsa-miR-218-5p and hsa-miR-135a-5p had a higher degree in the ceRNA network. In addition, BIRC5 and FGF2 were hubs in the PPI network, and RTKN2 and SLIT3 were hubs in the ceRNA network. Conclusion: Several pathways (adherens junction, relaxin signaling pathway, and axon guidance), miRNAs (hsa-miR-9-5p, hsa-miR-196a-5p, hsa-miR-31-5p, hsa-miR-1, hsa-miR-218-5p and hsa-miR-135a-5p) and genes (BIRC5, FGF2, RTKN2 and SLIT3) may play important roles in the pathogenesis of NSCLC.