Author affiliations: Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu 610225, Peoples R China; Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China; Chengdu Univ Informat Technol, Sch Software Engn, Chengdu 610225, Peoples R China; Chengdu Univ Informat Technol, Sch Management, Chengdu 610103, Peoples R China
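Publication: EXPERT SYSTEMS WITH APPLICATIONS
Year/Volume: 2021, Vol. 183
Pages: 115404
Indexed in:
Subject classification: 1201 [Management: Management Science and Engineering (degrees in Management or Engineering may be conferred)]; 0808 [Engineering: Electrical Engineering]; 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (degrees in Engineering or Science may be conferred)]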
Funding: National Natural Science Foundation of China [61772091, 61802035, 61702058, 61962006, 61962038, U1802271, U2001212, 62072311]; China Postdoctoral Science Foundation [2017M612948]; CCF-Huawei Database System Innovation Research Plan [CCF-HuaweiDBIR2020004A]; Sichuan Science and Technology Program [2021JDJQ0021, 2020YFG0153, 20YYJC2785, 2020YJ0481, 2020YFS0466, 2020YJ0430, 2020JDR0164, 2020YFS0399, 2019YFS0067]; Natural Science Foundation of Guangxi [2018GXNSFDA138005]; Guangdong Basic and Applied Basic Research Foundation [2020B1515120028]; Guangxi Bagui Teams for Innovation and Research; Major Project of Digital Key Laboratory of Sichuan Province in Sichuan Conservatory of Music [21DMAKL02]
Keywords: Transcription factor binding sites; Convolutional neural networks; Motif discovery; Bioinformatics; Autoencoder
Abstract: A transcription factor binding site (TFBS) is a DNA sequence that binds a transcription factor and regulates the transcription of a gene. Although deep learning algorithms outperform traditional methods in predicting TFBSs, they often rely too heavily on negative samples that cannot be verified experimentally. In particular, training a model with such negative samples can introduce substantial noise and degrade classification performance. To address these drawbacks, we propose a new architecture that combines a convolutional autoencoder with a convolutional neural network, called CAE-CNN (Convolutional AutoEncoder and Convolutional Neural Network). Specifically, motivated by image reconstruction, we use a convolutional autoencoder to extract useful features from the positive samples of DNA sequences. The learned features are then used by the convolutional neural network during training. Furthermore, we employ a highway connection layer to better capture the features of DNA nucleotides through a gated unit. Extensive experiments on human and mouse TFBS datasets demonstrate the effectiveness of the proposed method for the motif discovery task: it outperforms state-of-the-art methods in accuracy, precision, recall, and AUC. To the best of our knowledge, the original contribution of this work lies in integrating unsupervised and supervised learning methods to study TFBSs, thereby building a more robust and generative TFBS prediction model.
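The abstract mentions two ingredients that are easy to illustrate in isolation: representing DNA nucleotides numerically and passing features through a gated highway layer. The sketch below is not the authors' CAE-CNN implementation. It assumes the common one-hot encoding over A, C, G, T and the standard highway formulation y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid gate. The function names (`one_hot`, `affine`, `highway`), the ReLU transform, and the all-zero row for unknown bases are illustrative choices, not details taken from the paper.

```python
import math


def one_hot(seq):
    """Encode a DNA string as a list of 4-dim one-hot rows ordered A, C, G, T."""
    alphabet = "ACGT"
    # Assumption of this sketch: unknown bases such as N map to an all-zero row.
    return [[1.0 if base == a else 0.0 for a in alphabet] for base in seq.upper()]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(x, weights, bias):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def highway(x, w_h, b_h, w_t, b_t):
    """Standard highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    H is a ReLU-activated affine transform and T is a sigmoid gate.
    Input and output must have the same dimension.
    """
    h = [max(0.0, v) for v in affine(x, w_h, b_h)]
    t = [sigmoid(v) for v in affine(x, w_t, b_t)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

With a strongly negative gate bias the layer carries its input through almost unchanged, and with a strongly positive one it outputs the transformed features. This behavior is what allows a gated unit to decide, per feature, how much of the learned representation to pass on.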