Author affiliations: Siksha O Anusandhan, Ctr Internet Things, Dept Comp Sci & Engn, Bhubaneswar 751030, India; Siksha O Anusandhan, Dept Comp Sci & Informat Technol, Bhubaneswar 751030, India; Siksha O Anusandhan, Dept Comp Sci & Engn, Bhubaneswar 751030, India
Publication: IEEE Access
Year/Volume/Issue: 2025, Vol. 13
Pages: 17673-17682
Core indexing:
Funding: Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
Subjects: Breast cancer; Support vector machines; Accuracy; Feature extraction; Genetic algorithms; Machine learning; Gene expression; Pipelines; Machine learning algorithms; Vectors; Biomarker; breast cancer; feature selection; LASSO; modified compact genetic algorithm
Abstract: Breast cancer is the most common cancer type among females and is one of the leading causes of death worldwide. Because breast cancer is a heterogeneous disease, subtyping plays a vital role in its treatment. Gene expression is informative in this regard, so this work uses gene expression data to identify the most significant gene biomarkers. The identified biomarkers are highly associated with the individual breast cancer subtypes: Luminal A, Luminal B, HER2-Enriched and Basal-Like. To identify such biomarkers, LASSO is first applied to the dataset in combination with four machine learning models, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbours (KNN) and Naive Bayes (NB), to find an initial reduced set of genes and to select the best learning model by classification accuracy, which is SVM in this case. Thereafter, a Modified Compact Genetic Algorithm (mCGA) is applied to identify the final set of genes as biomarkers for each specific subtype. Experimental results suggest that the proposed method achieves AUC-ROC values of 0.9878 and 0.97311 for LumA and LumB, respectively, and 1 for the Basal and HER2 subtypes. To validate the biological significance of the identified biomarkers, KEGG pathway and GO enrichment analyses are carried out.
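The abstract does not say how the authors modify the compact genetic algorithm, so the following is only a minimal sketch of the standard compact GA searching over a binary feature mask (1 = gene kept). The function names, the `INFORMATIVE` index set, the toy fitness function and all parameter values are hypothetical; in the setting described above, the fitness would presumably be the classification performance of the SVM on the LASSO-reduced genes selected by the mask.

```python
import random

# Hypothetical stand-in for "genes that help the classifier".
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    """Toy score: reward informative bits, penalise every other selected bit.

    This is an assumption for illustration only; it is not the paper's
    SVM-based evaluation.
    """
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(bit for i, bit in enumerate(mask) if i not in INFORMATIVE)
    return hits - 0.5 * extras


def compact_ga(fitness, n_bits, pop_size=50, max_iters=5000, seed=0):
    """Standard compact GA: evolve a probability vector instead of a population."""
    rng = random.Random(seed)
    p = [0.5] * n_bits        # p[i] = probability that bit i is 1
    step = 1.0 / pop_size     # update size simulating a population of pop_size

    for _ in range(max_iters):
        # Sample two candidate masks from the current probability vector.
        a = [1 if rng.random() < pi else 0 for pi in p]
        b = [1 if rng.random() < pi else 0 for pi in p]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)

        # Shift probabilities toward the winner where the two masks differ.
        for i in range(n_bits):
            if winner[i] != loser[i]:
                p[i] += step if winner[i] == 1 else -step
                p[i] = min(1.0, max(0.0, p[i]))

        # Stop once every probability has reached 0 or 1.
        if all(pi <= 0.0 or pi >= 1.0 for pi in p):
            break

    # Read the final mask off the probability vector.
    return [1 if pi >= 0.5 else 0 for pi in p]


best = compact_ga(toy_fitness, n_bits=10)
print("selected feature indices:", [i for i, bit in enumerate(best) if bit])
```

Because the algorithm stores only one probability per feature rather than a full population, its memory footprint grows linearly with the number of candidate genes, which is one plausible reason to prefer a compact GA for feature selection after an initial LASSO reduction.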