Author affiliation: United Int Univ, Dept Comp Sci & Engn, Plot 2, Madani Ave, Dhaka 1212, Bangladesh
Publication: JOURNAL OF THEORETICAL BIOLOGY
Year/Volume: 2019, Vol. 460
Pages: 64-78
Subject classification: 0710 [Science - Biology]; 07 [Science]; 09 [Agronomy]
Keywords: DNA binding proteins; Classification algorithm; Sequence based features; Feature selection; Handling overfitting; Independent test set
Abstract: DNA-binding proteins (DBPs) are responsible for several cellular functions, ranging from our immune system to the transport of oxygen. In recent studies, scientists have used supervised machine learning methods that classify DBPs using information from the protein sequence only. Most of these methods work effectively on the training sets, but the performance of most of them degrades on the independent test set, which leaves room for improving prediction by reducing overfitting. In this paper, we extracted several features solely from the protein sequence and carried out two different types of feature selection on them. Our results are comparable on the training set and significantly improved on the independent test set. On the independent test set our accuracy was 82.26%, an improvement of 1.62% over the previous best state-of-the-art methods. Sensitivity and area under the receiver operating characteristic curve on the independent test set were also higher, at 0.95 and 0.823 respectively. (C) 2018 Elsevier Ltd. All rights reserved.