Author affiliations: Jilin Univ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Jilin, Peoples R China; Tsinghua Univ, Sch Software, Beijing, Peoples R China
Publication: IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
Year, volume, issue: 2021, Vol. 18, No. 2
Pages: 621-632
Subject classification: 0710 [Science - Biology]; 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0714 [Science - Statistics (science or economics degrees may be conferred)]; 0701 [Science - Mathematics]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]
Funding: National Natural Science Foundation of China [61572105, 61872418, 71774154]; Natural Science Foundation of Jilin Province [20180101331JC]
Subjects: Feature extraction; Microwave integrated circuits; Genetic algorithms; Cancer; Nonhomogeneous media; Gene expression; Heuristic algorithms; Gene selection; genetic algorithm; recursive feature elimination; microarray data; cancer classification
Abstract: Microarray gene expression data have become a topic of great interest for cancer classification and for further research in the field of bioinformatics. Nonetheless, under the "large p, small n" paradigm of limited biosamples and high-dimensional data, gene selection, which aims to select a minimal number of discriminatory genes closely associated with a phenotype, has become a demanding task. Feature (gene) selection remains a challenging problem owing to its nondeterministic polynomial time complexity, so most existing feature selection algorithms rely on heuristic rules. A multilayer recursive feature elimination method based on an embedded integer-coded genetic algorithm, MGRFE, is proposed here, which aims to select the gene combination with minimal size and maximal information. On 19 benchmark microarray datasets, including multiclass and imbalanced datasets, MGRFE outperforms state-of-the-art feature selection algorithms, achieving better cancer classification accuracy with fewer selected genes. MGRFE could be regarded as a promising feature selection method for high-dimensional datasets, especially gene expression data. Moreover, the genes selected by MGRFE have close biological relevance to cancer phenotypes. The source code of the proposed algorithm and all 19 datasets used in the paper are available at https://***/Pengeace/MGRFE-GaRFE.
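Illustrative note: the abstract names recursive feature elimination without spelling out the loop. The sketch below is not MGRFE and is not taken from the authors' code. It is a minimal, assumed illustration of wrapper-style backward elimination in which a greedy step stands in for the embedded genetic search. The helper names (`loo_accuracy`, `backward_eliminate`), the nearest-centroid classifier, and the toy data are all hypothetical choices made for this example.

```python
from statistics import mean


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen columns."""
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except the held-out one.
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean(r[f] for r in rows) for f in features]

        # Squared Euclidean distance from the held-out sample to a class centroid.
        def dist(label):
            return sum((X[i][f] - c) ** 2 for f, c in zip(features, centroids[label]))

        predicted = min(sorted(centroids), key=dist)
        correct += predicted == y[i]
    return correct / len(X)


def backward_eliminate(X, y, target_size):
    """Greedily drop one feature per round until target_size features remain."""
    features = list(range(len(X[0])))
    while len(features) > target_size:
        # Try removing each feature; keep the removal that leaves the best accuracy.
        candidates = [[f for f in features if f != drop] for drop in features]
        features = max(candidates, key=lambda subset: loo_accuracy(X, y, subset))
    return features


# Toy data (assumed): column 0 separates the classes, columns 1 and 2 are noise.
X = [
    [1.0, 3.0, 2.0],
    [1.2, 1.0, 4.0],
    [0.8, 2.0, 3.0],
    [5.0, 2.0, 3.0],
    [5.2, 3.0, 2.0],
    [4.8, 1.0, 4.0],
]
y = [0, 0, 0, 1, 1, 1]

selected = backward_eliminate(X, y, target_size=1)
print(selected)  # [0]
```

On real microarray data the number of genes is far too large for this exhaustive per-round search, which is one reason heuristic search strategies such as genetic algorithms are used instead.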