Author affiliations: Univ Maine, LIUM Lab Informat, EA 4023, IC2 Inst Informat Claude Chappe, Ave Olivier Messiaen, F-72000 Le Mans, France; LORIA, Equipe Synalp, Batiment B, F-54506 Vandoeuvre Les Nancy, France; Dalian Univ Technol, WiseLAB, Dalian 116024, Liaoning, Peoples R China
Publication: NEURAL COMPUTING & APPLICATIONS
Year, volume, issue: 2021, Vol. 33, No. 19
Pages: 12939-12956
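Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (Engineering or Science degrees may be conferred)]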
Subjects: Artificial Intelligence; Data Mining and Knowledge Discovery; Probability and Statistics in Computer Science; Computational Science and Engineering; Image Processing and Computer Vision; Computational Biology/Bioinformatics
Abstract: This paper uses feature salience to evaluate the quality of a partition in hard clustering. It rests on the hypothesis that a good partition is an easy-to-label partition, i.e., one in which each cluster is made up of salient features. We compare this approach mainly with conventional indexes based on distances between data points, and also with more recent indexes based on entropy or stability. We show that our feature-based indexes outperform the compared indexes for optimal model selection: they are more efficient across the low- to high-dimensional range and more robust to noise. To show the efficiency of our indexes on a real-life application, we consider the task of diachronic analysis on a textual dataset. We demonstrate that our approach yields interesting and relevant results in that context, whereas the other approaches mostly lead to unusable results.
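The abstract's central hypothesis, that a good partition is one whose clusters are each characterized by salient features, can be made concrete with a small sketch. This is NOT the paper's index: the salience criterion below is an assumption in the spirit of feature-maximization approaches. It scores a feature in a cluster by the harmonic mean of its recall (the share of the feature's total weight that falls in that cluster) and its predominance (the share of the cluster's total weight carried by that feature). A feature is then treated as salient where that score beats its mean over clusters. The names feature_stats and salience_index are hypothetical, and the data are assumed to be non-negative feature weights (e.g. term counts).

```python
import numpy as np


def feature_stats(X, labels):
    """Per-cluster feature recall, predominance and F-measure, each of shape (k, n_features)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    k = int(labels.max()) + 1
    # W[c, f] = total weight of feature f over the samples assigned to cluster c
    W = np.vstack([X[labels == c].sum(axis=0) for c in range(k)])
    eps = 1e-12  # guards against division by zero for empty clusters or unused features
    recall = W / (W.sum(axis=0, keepdims=True) + eps)        # share of feature f falling in cluster c
    predominance = W / (W.sum(axis=1, keepdims=True) + eps)  # share of cluster c's mass carried by f
    f_measure = 2 * recall * predominance / (recall + predominance + eps)
    return recall, predominance, f_measure


def salience_index(X, labels):
    """Hypothetical partition-quality score in [0, 1]; higher means clusters are easier to label."""
    _, predominance, f_measure = feature_stats(X, labels)
    # Assumed criterion: feature f is salient in cluster c if its F-measure there
    # is strictly greater than its mean F-measure over all clusters
    salient = f_measure > f_measure.mean(axis=0, keepdims=True)
    # For each cluster, the fraction of its feature mass carried by its salient features
    per_cluster = (predominance * salient).sum(axis=1)
    return float(per_cluster.mean())


# Toy data: samples 0-1 are dominated by feature 0, samples 2-3 by feature 1
X = np.array([[5, 0], [4, 1], [0, 5], [1, 4]])
print(salience_index(X, [0, 0, 1, 1]))  # about 0.9: each cluster has a clear salient feature
print(salience_index(X, [0, 1, 0, 1]))  # 0.0: no feature stands out in either cluster
```

The partition that matches the feature structure scores high. The mixed one scores zero, which reflects the "easy to label" intuition. This toy score has not been checked for bias across different numbers of clusters. Using it to choose among partitions with different cluster counts, as the paper does for model selection, would need that validation first.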