A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory

Authors: Shukla, Alok Kumar; Singh, Pradeep; Vardhan, Manu

Affiliation: Dept Comp Sci & Engn Raipur Madhya Pradesh India

Publication: INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS

Year, volume, issue: 2019, Vol. 18, No. 3

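Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (degrees may be conferred in engineering or science)]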

Keywords: Binary genetic algorithm; bioinformatics; conditional mutual information maximization; k-nearest neighbor; microarray

Abstract: The explosion of high-dimensional datasets in scientific repositories has encouraged interdisciplinary research in data mining, pattern recognition, and bioinformatics. The fundamental problem for an individual Feature Selection (FS) method is to extract informative features for a classification model and to identify malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, several subsets can carry similar information. This paper introduces a novel hybrid FS algorithm for classification problems, called Filter-Wrapper Feature Selection (FWFS), which addresses these limitations of existing methods. In the proposed model, a front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects a highly ranked feature subset, and a subsequent Binary Genetic Algorithm (BGA) accelerates the search for significant feature subsets. One merit of the proposed method is that, unlike an exhaustive search, it speeds up the FS procedure without sacrificing classification accuracy on the reduced dataset when a learning model is applied to the selected feature subsets. The efficacy of FWFS is examined with a Naive Bayes (NB) classifier, which serves as the fitness function. The effectiveness of the selected feature subset is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and number of instances. The experimental results indicate that the proposed method significantly reduces the number of features and outperforms existing methods. For the microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on Diffuse large B-cell lymphoma (DLBCL). For the UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and highes
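
The filter-then-wrapper idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the genetic operators, the validation scheme, or any parameter values, so everything below is an assumption made for illustration. The function names (`cmim_rank`, `nb_accuracy`, `bga_select`, `toy_data`), the hold-out split, tournament selection, one-point crossover, bit-flip mutation, elitism, and the small subset-size penalty are all hypothetical choices. Features are assumed to be discrete.

```python
import math
import random
from collections import Counter

def entropy(cols):
    """Joint Shannon entropy (bits) of one or more discrete columns."""
    n = len(cols[0])
    return -sum(c / n * math.log2(c / n) for c in Counter(zip(*cols)).values())

def cond_mi(x, y, z=None):
    """I(X;Y) if z is None, else I(X;Y|Z), for discrete columns."""
    if z is None:
        return entropy([x]) + entropy([y]) - entropy([x, y])
    return entropy([x, z]) + entropy([y, z]) - entropy([x, y, z]) - entropy([z])

def cmim_rank(columns, y, k):
    """Filter step: greedy CMIM, scoring each feature by its minimum
    conditional mutual information with y given any already-selected feature."""
    score = [cond_mi(col, y) for col in columns]
    selected = []
    for _ in range(min(k, len(columns))):
        best = max((i for i in range(len(columns)) if i not in selected),
                   key=lambda i: score[i])
        selected.append(best)
        for i in range(len(columns)):
            if i not in selected:
                score[i] = min(score[i], cond_mi(columns[i], y, columns[best]))
    return selected

def nb_accuracy(columns, y, feats, train_idx, val_idx):
    """Hold-out accuracy of a categorical Naive Bayes with Laplace smoothing."""
    if not feats:
        return 0.0
    classes = sorted(set(y))
    prior = Counter(y[i] for i in train_idx)
    counts = Counter((f, columns[f][i], y[i]) for f in feats for i in train_idx)
    n_vals = {f: len(set(columns[f])) for f in feats}
    correct = 0
    for i in val_idx:
        def log_post(c):
            lp = math.log(prior[c] + 1)  # unnormalized smoothed class prior
            for f in feats:
                lp += math.log((counts[(f, columns[f][i], c)] + 1)
                               / (prior[c] + n_vals[f]))
            return lp
        correct += max(classes, key=log_post) == y[i]
    return correct / len(val_idx)

def bga_select(columns, y, candidates, train_idx, val_idx,
               pop=20, gens=30, p_mut=0.05, seed=0):
    """Wrapper step: binary GA over the CMIM candidates, NB accuracy as fitness."""
    rng = random.Random(seed)
    n = len(candidates)

    def fitness(bits):
        feats = [candidates[j] for j in range(n) if bits[j]]
        acc = nb_accuracy(columns, y, feats, train_idx, val_idx)
        return acc - 0.01 * len(feats) / n  # assumed penalty favoring small subsets

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    best = max(population, key=fitness)
    for _ in range(gens):
        nxt = [best[:]]  # elitism: carry the best individual forward
        while len(nxt) < pop:
            p1 = max(rng.sample(population, 2), key=fitness)  # tournament of 2
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [b ^ (rng.random() < p_mut) for b in p1[:cut] + p2[cut:]]
            nxt.append(child)
        population = nxt
        best = max(population, key=fitness)
    return [candidates[j] for j in range(n) if best[j]]

def toy_data(n=200, n_noise=6, seed=1):
    """Synthetic data: column 0 agrees with y about 90% of the time, the rest is noise."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    informative = [v if rng.random() < 0.9 else 1 - v for v in y]
    noise = [[rng.randint(0, 2) for _ in range(n)] for _ in range(n_noise)]
    return [informative] + noise, y
```

A possible usage, under the same assumptions: call `columns, y = toy_data()`, shortlist candidates with `top = cmim_rank(columns, y, 4)`, then refine them with `bga_select(columns, y, top, train_idx, val_idx)`, where `train_idx` and `val_idx` are disjoint lists of row indices. Restricting the genetic search to the CMIM shortlist is what keeps the wrapper step cheap, since the chromosome length equals the shortlist size rather than the full feature count.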