Multi-view document clustering (MVDC) is an approach in natural language processing that leverages multiple representations, or views, of the data to improve clustering performance. Existing solutions are hampered by inconsistency among document views, high dimensionality, and sparseness in text documents. In addition, existing MVDC methods often depend on the performance of bag-of-words and pretrained language models. However, these models usually do not consider contextual semantics and are suited to single-view document clustering. This paper addresses these challenges by proposing a deep MVDC model that utilizes enhanced semantic embedding and consistent context semantics (SECS). SECS uses semantic embedding to address the high-dimensionality challenge by considering complementary semantic information. Meanwhile, SECS takes advantage of the potential benefits of view-consistent context semantics based on pretrained language models. The proposed model captures intricate semantic relationships between words and documents through advanced embedding techniques, giving a richer and more nuanced representation of textual content. Furthermore, by incorporating consistent context semantics, SECS maintains contextual integrity across multiple views, leading to more coherent and meaningful clusters. Experimental results on benchmark datasets demonstrate the superiority of the model over state-of-the-art MVDC methods, highlighting its effectiveness in improving clustering quality and interpretability.
ISBN (digital): 9783031171208
ISBN (print): 9783031171208; 9783031171192
Multi-view document clustering, which learns common representations from multiple views to achieve a consistent partition, has attracted a growing body of work. Though promising performance has been demonstrated in various applications, existing methods learn their view representations with no consideration of achieving a consistent clustering partition. In this paper, we propose a multi-view document clustering model with Joint Contrastive learning (MCJC) to address this issue. Our model learns the view representations with a joint contrastive learning module by introducing a task-specific objective, so that it can effectively achieve consistency in both cluster-wise and feature-wise hidden spaces. Meanwhile, in the clustering module, we collect the view-level cluster agreement and the document-level clustering partition to refine the contrastive learning and obtain document assignments. As a result, the proposed model can use the joint contrastive module to learn clustering-friendly representations and, through multi-level clustering, achieve better clustering performance. Extensive experiments on real datasets demonstrate that our model achieves state-of-the-art clustering effectiveness.
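To make the idea of feature-wise consistency across views concrete, the following is a minimal sketch of a generic cross-view InfoNCE contrastive loss. It is not the MCJC objective, which the abstract does not specify; the function names, the temperature value, and the toy embeddings are all illustrative assumptions. The two embeddings of the same document are treated as a positive pair, and embeddings of other documents in the second view act as negatives.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_view_info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over documents (hypothetical helper, not from the paper).

    view_a[i] and view_b[i] are embeddings of the same document in two views.
    For anchor view_a[i], view_b[i] is the positive and view_b[j], j != i,
    are the negatives.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, other) / temperature for other in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


# Toy data: two documents, two views, 2-dimensional embeddings (assumed values).
aligned_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]
swapped_b = [aligned_b[1], aligned_b[0]]  # views deliberately mismatched

loss_aligned = cross_view_info_nce(aligned_a, aligned_b)
loss_swapped = cross_view_info_nce(aligned_a, swapped_b)
print(loss_aligned < loss_swapped)
```

The loss is lower when the two views of each document agree than when they are mismatched, which is the property a feature-wise contrastive module relies on. A cluster-wise counterpart would apply the same kind of loss to the columns of the soft cluster-assignment matrices instead of to document embeddings.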
Multi-view document clustering, which aims to discover clustering partitions based on multiple document views, has attracted increasing research interest. However, the potential advantages of incorporating context semantics to enhance multi-view document clustering are yet to be fully explored. To address this limitation, we propose a deep multi-view document clustering model that explores consistent context semantics, called CSMDC, which consists of three modules. Specifically, a novel view-translator is designed to convert non-contextual document views into contextual views. With its help, all document views can be processed to obtain their semantic representations within the view-translator representation learning module. Then, the data-based view consistency self-supervising module is developed to fine-tune the semantic representations of the document views by jointly incorporating view-wise representation relevance and consistent clustering assignments. Additionally, the task-based document clustering module is employed to simultaneously improve the view semantic representations and the document clustering results. To the best of our knowledge, this is the first study to explicitly apply consistent context semantics under the guidance of data-based and task-based objectives in multi-view document clustering. Comprehensive experimental results demonstrate the effectiveness of the proposed model.
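One common way to encourage consistent clustering assignments across views is to penalize the divergence between the soft cluster assignments that each view produces for the same document. The sketch below uses a symmetric KL divergence for this purpose. It is an illustrative assumption, not the CSMDC loss, which the abstract does not define; the function names and the toy logits are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw cluster scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def assignment_consistency_loss(logits_view_a, logits_view_b):
    """Mean symmetric KL between per-document soft assignments of two views.

    logits_view_a[i] and logits_view_b[i] are cluster scores for the same
    document i, one list per view (hypothetical inputs).
    """
    total = 0.0
    for scores_a, scores_b in zip(logits_view_a, logits_view_b):
        p, q = softmax(scores_a), softmax(scores_b)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(logits_view_a)


# Toy scores for two documents and two clusters (assumed values).
view_a = [[2.0, 0.0], [0.0, 2.0]]
agreeing_view = [[2.0, 0.0], [0.0, 2.0]]
disagreeing_view = [[0.0, 2.0], [2.0, 0.0]]

agree = assignment_consistency_loss(view_a, agreeing_view)
disagree = assignment_consistency_loss(view_a, disagreeing_view)
print(agree < disagree)
```

The loss is zero when both views assign every document to clusters with identical probabilities and grows as the views disagree, so minimizing it alongside a representation objective pushes the views toward a shared partition.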
Document clustering, a fundamental task in natural language processing, aims to divide large collections of documents into meaningful groups based on their similarities. Multi-view document clustering (MvDC) has emerged as a promising approach, leveraging information from diverse views to improve clustering accuracy and robustness. However, existing multi-view clustering methods suffer from two issues: (1) a lack of interrelations across documents during consensus semantic learning; (2) the neglect of consensus structure mining in multi-view document clustering. To address these issues, we propose a Hierarchical Consensus Learning model for multi-view document clustering, termed MvDC-HCL. Our model incorporates two key modules. The Data-oriented Consensus Semantic Learning (CSeL) module focuses on learning consensus semantics across various views by leveraging a hybrid contrastive consensus objective. The Task-oriented Consensus Structure Clustering (CStC) module employs a gated fusion network and clustering-driven structure contrastive learning to mine consensus structures effectively. Specifically, the CSeL module constructs a contrastive consensus learning objective based on intra-sample and inter-sample relationships in multi-view data, aiming to optimize the view semantic representations obtained by the semantic learner. This facilitates consistent semantic learning across various views of the same sample and consistent relationship learning among samples from different views. Then, the learned view semantic representations are fed into the fusion network of CStC to obtain fused sample semantic representations. Together with the view semantic representations, sample-level and view-level clustering structures are derived for consensus structure mining. Additionally, CStC introduces clustering-driven objectives to guide consensus structure mining and achieve consistent clustering results. By hierarchically extracting implicit consensus semantics and