Hierarchical Dirichlet Multinomial Allocation Model for Multi-Source Document Clustering

Authors: Huang, Ruizhang; Xu, Weijia; Qin, Yongbin; Chen, Yanping

Author affiliations: Guizhou Univ, Coll Comp Sci & Technol, Guiyang 550025, Peoples R China; Guizhou Intelligent Human Comp Interact Engn Tech, Guiyang 550025, Peoples R China; Guizhou Univ, Guizhou Prov Key Lab Publ Big Data, Guiyang 550025, Peoples R China

Publication: IEEE Access

Year/Volume: 2020, Vol. 8

Pages: 109917-109927

Funding: National Natural Science Foundation of China [U1836205]; Major Research Program of National Natural Science Foundation of China; Major Special Science and Technology Projects of Guizhou Province [3002]; Key Projects of Science and Technology of Guizhou Province [ 1Z055]

Subjects: Clustering algorithms; Data models; Resource management; Partitioning algorithms; Clustering methods; Social networking (online); Licenses; Document clustering; Multi-source document clustering; Dirichlet distribution; Gibbs sampling

Abstract: Mining a document structure from multiple data sources in terms of their underlying topics has become an important task in document clustering. Traditional document clustering approaches cannot be applied directly to the multi-source document clustering problem. There are three typical difficulties: 1) The topics of different data sources are related but not the same. 2) Usually, each data source has its own focus on topics. 3) The numbers of clusters in the data sources are not necessarily the same and are not known beforehand. In this paper, based on our previous research, we design a novel multi-source document clustering model, namely, the hierarchical Dirichlet multinomial allocation (HDMA) model, to solve all of the above problems. The HDMA model uses a two-step hierarchical topic generation process. Topics are learned so that they share their general characteristics across data sources while preserving the local characteristics of each data source. An exclusive topic partition is applied to each data source to learn the source-level topic emphasis. A Gibbs sampling algorithm is then used to learn the number of clusters for each data source and the parameters of the HDMA model at the same time. Experimental results demonstrate that the HDMA model is effective.
