Distributional word clusters vs. words for text categorization

Authors: Bekkerman, Ron; El-Yaniv, Ran; Tishby, Naftali; Winter, Yoad

Affiliations: Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel; School of Computer Science and Engineering, Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel

Publication: Journal of Machine Learning Research (J. Mach. Learn. Res.)

Year / Volume: 2003, Volume 3

Pages: 1183-1208

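Subject classification: 1205 [Management - Library, Information and Archives Management]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degrees in engineering or science may be conferred)]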

Subject: Support vector machines

Abstract: We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This word-cluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representation of documents. When combined with the classification power of the SVM, this method yields high performance in text categorization. This novel combination of SVM with word-cluster representation is compared with SVM-based categorization using the simpler bag-of-words (BOW) representation. The comparison is performed over three known datasets. On one of these datasets (the 20 Newsgroups) the method based on word clusters significantly outperforms the word-based representation in terms of categorization accuracy or representation efficiency. On the two other sets (Reuters-21578 and WebKB) the word-based representation slightly outperforms the word-cluster representation. We investigate the potential reasons for this behavior and relate it to structural differences between the datasets. © 2003 Ron Bekkerman, Ran El-Yaniv, Naftali Tishby, and Yoad Winter.
