Author affiliations: Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel; School of Computer Science and Engineering, Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel
Publication: Journal of Machine Learning Research (J. Mach. Learn. Res.)
Year/Volume/Issue: 2003, Volume 3
Pages: 1183-1208
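Subject classification: 1205 [Management - Library, Information and Archives Management]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]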
Abstract: We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This word-cluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representation of documents. When combined with the classification power of the SVM, this method yields high performance in text categorization. This novel combination of SVM with word-cluster representation is compared with SVM-based categorization using the simpler bag-of-words (BOW) representation. The comparison is performed over three known datasets. On one of these datasets (the 20 Newsgroups) the method based on word clusters significantly outperforms the word-based representation in terms of categorization accuracy or representation efficiency. On the two other sets (Reuters-21578 and WebKB) the word-based representation slightly outperforms the word-cluster representation. We investigate the potential reasons for this behavior and relate it to structural differences between the datasets. © 2003 Ron Bekkerman, Ran El-Yaniv, Naftali Tishby, and Yoad Winter.