Accurate cell classification is crucial but expensive for large-scale single-cell RNA sequencing (scRNA-seq) analysis. Gene selection (GS) has emerged as a pivotal technique for identifying gene subsets of scRNA-seq data that improve classification accuracy while reducing the number of genes. Nevertheless, the growing scale of scRNA-seq data challenges existing GS methods in both performance and computational time. We therefore propose a surrogate-assisted evolutionary algorithm for multiobjective GS to address these deficiencies. A two-phase initialization method is proposed to select sparse solutions that provide preliminary insights into gene contributions. Then, a binary competitive swarm optimizer is proposed for effective global search, with an embedded local search method that eliminates irrelevant genes to improve efficiency. Additionally, a surrogate model is adopted to predict classification accuracy efficiently, substituting for part of the computationally expensive classification process. Experiments are conducted on eight large-scale scRNA-seq datasets with more than 20 000 genes. Comparisons with eight state-of-the-art methods validate the effectiveness of the proposed GS method for scRNA-seq cell classification. Gene expression analysis further validates the significance of the genes selected by the proposed method for the classification of scRNA-seq data.
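
To make the idea of a binary competitive swarm optimizer concrete, the following is a minimal illustrative sketch and not the algorithm proposed here. It assumes a toy, scalarized fitness (reward for covering a hypothetical list of informative genes, minus a size penalty) in place of the multiobjective formulation, and it omits the two-phase initialization, the local search, and the surrogate model. The names `fitness`, `binary_cso`, `informative`, `learn_rate`, and `flip_rate` are all hypothetical. Particles are 0/1 gene masks; in each generation particles compete in random pairs, the winner survives unchanged, and the loser copies bits from the winner.

```python
import random


def fitness(mask, informative, penalty=0.01):
    """Toy scalarized objective (an assumption, not the paper's objectives):
    fraction of informative genes selected, minus a penalty per selected gene."""
    hits = sum(1 for g in informative if mask[g])
    return hits / len(informative) - penalty * sum(mask)


def binary_cso(n_genes, informative, swarm_size=20, generations=50,
               learn_rate=0.5, flip_rate=0.01, seed=0):
    """Pairwise-competition search over binary gene masks (illustrative only)."""
    rng = random.Random(seed)
    # Sparse random initialization: each gene is selected with low probability.
    swarm = [[1 if rng.random() < 0.1 else 0 for _ in range(n_genes)]
             for _ in range(swarm_size)]
    for _ in range(generations):
        rng.shuffle(swarm)  # random pairing for this generation
        next_swarm = []
        for i in range(0, swarm_size - 1, 2):
            a, b = swarm[i], swarm[i + 1]
            if fitness(a, informative) >= fitness(b, informative):
                winner, loser = a, b
            else:
                winner, loser = b, a
            # The loser learns from the winner: each bit is copied from the
            # winner with probability learn_rate, then flipped with a small
            # probability to keep some exploration.
            learned = []
            for w_bit, l_bit in zip(winner, loser):
                bit = w_bit if rng.random() < learn_rate else l_bit
                if rng.random() < flip_rate:
                    bit = 1 - bit
                learned.append(bit)
            next_swarm.extend([winner, learned])
        if swarm_size % 2:  # an unpaired particle passes through unchanged
            next_swarm.append(swarm[-1])
        swarm = next_swarm
    return max(swarm, key=lambda m: fitness(m, informative))


if __name__ == "__main__":
    informative_genes = [3, 17, 42]  # hypothetical indices
    best = binary_cso(n_genes=60, informative=informative_genes, seed=1)
    print("selected genes:", [i for i, bit in enumerate(best) if bit])
```

Because winners are carried over unchanged, the best fitness in the swarm never decreases between generations. In a realistic setting the fitness call would be the expensive classifier evaluation, which is the step a surrogate model would partly replace.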