版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:Menoufia Univ Fac Elect Engn Comp Sci & Engn Dept Menoufia Egypt Delta Univ Sci & Technol Fac Engn Gamasa Egypt
出 版 物:《PEERJ COMPUTER SCIENCE》 (PeerJ Comput. Sci.)
年 卷 期:2021年第7卷
页 面:e492页
核心收录:
基 金:The authors received no funding for this work
主 题:Survival analysis Deep learning Autoencoder Breast cancer Cox regression Graphical processing unit RNAseq data
摘 要:Background. Breast cancer is one of the major causes of mortality globally. Therefore, different Machine Learning (ML) techniques were deployed for computing survival and diagnosis. Survival analysis methods are used to compute survival probability and the most important factors affecting that probability. Most survival analysis methods are used to deal with clinical features (up to hundreds), hence applying survival analysis methods like cox regression on RNAseq microarray data with many features (up to thousands) is considered a major challenge. Methods. In this paper, a novel approach applying autoencoder to reduce the number of features is proposed. Our approach works on features reconstruction, and removal of noise within the data and features with zero variance across the samples, which facilitates extraction of features with the highest variances (across the samples) that most influence the survival probabilities. Then, it estimates the survival probability for each patient by applying random survival forests and cox regression. Applying the autoencoder on thousands of features takes a long time, thus our model is applied to the Graphical Processing Unit (GPU) in order to speed up the process. Finally, the model is evaluated and compared with the existing models on three different datasets in terms of run time, concordance index, and calibration curve, and the most related genes to survival are discovered. Finally, the biological pathways and GO molecular functions are analyzed for these significant genes. Results. We fine-tuned our autoencoder model on RNA-seq data of three datasets to train the weights in our survival prediction model, then using different samples in each dataset for testing the model. The results show that the proposed AutoCox and AutoRandom algorithms based on our feature selection autoencoder approach have better concordance index results comparing the most recent deep learning approaches when applied to each dataset. Each gene resulting