Improving Web Document Clustering through Employing User-Related Tag Expansion Techniques

Authors: 李鹏, 王斌, 晋薇

Affiliations: Institute of Computing Technology, Chinese Academy of Sciences; Department of Computer Science, North Dakota State University, 1340 Administration Ave., Fargo, ND 58102, U.S.A.

Publication: Journal of Computer Science & Technology (计算机科学技术学报, English edition)

Year, volume, issue: 2012, Vol. 27, No. 3

Pages: 554-566

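Subject classification: 081203 [Engineering: Computer Application Technology]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0812 [Engineering: Computer Science and Technology (degrees in engineering or science may be conferred)]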

Funding: Supported by the National Natural Science Foundation of China under Grant No. 61070111.

Keywords: web document clustering; social bookmarking; topic model; tag expansion

Abstract: As high-quality descriptors of web page semantics, social annotations or tags have been used for web document clustering and have achieved promising results. However, most web pages have few tags (fewer than 10). This sparsity seriously limits the usefulness of tags for clustering. In this work, we propose a user-related tag expansion method to overcome this problem, which incorporates additional useful tags into the original tag document by utilizing user tagging data as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may be changed. To tackle this problem, we have designed a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that 1) our user-related tag expansion method can be effectively applied to over 90% of tagged web documents; 2) Folk-LDA can alleviate topic drift in expansion, especially for topic-specific documents; 3) the proposed tag-based clustering methods significantly outperform the word-based methods, which indicates that tags could be a better resource for the clustering task.
