Search results - Inner Mongolia University Library

Simulated annealing-based text clustering

PATTERN RECOGNITION LETTERS, 2025, vol. 193, pp. 128-134

Authors: Chikhi, Nacim Fateh. Affiliation: Univ Blida 1, Fac Sci, Dept Comp Sci, BP 270 Route Soumaa, Blida 09000, Algeria

Like traditional K-means, the main drawback of spherical K-means is its high sensitivity to the initialization of centroids. This issue can cause the algorithm to converge to poor local optima, resulting in clusters that do not accurately reflect the true structure of the data. In this paper, we propose two new text clustering algorithms that are less sensitive to initialization and that significantly improve clustering performance. The first algorithm employs simulated annealing to avoid getting trapped in poor local optima. The second algorithm, a relaxed version of simulated annealing, also uses randomization to escape poor local optima but requires significantly fewer computations than the first algorithm. The two algorithms are extensively evaluated across more than thirty text datasets. Experimental results demonstrate that the proposed approaches significantly outperform well-established text clustering algorithms in terms of clustering quality. Furthermore, the second algorithm is as efficient as standard spherical K-means regarding clustering speed, as both have the same time complexity. Finally, an important advantage of the proposed algorithms is that they can be applied to other domains involving directional data, such as recommender systems, social network analysis, image analysis, and more.

Keywords: text clustering; spherical K-means; simulated annealing
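
The abstract above does not spell out the two proposed algorithms, so the following is only a minimal sketch of the general idea of simulated annealing applied to the spherical K-means objective, not the author's method. The move (reassign one random document), the geometric cooling schedule, and the names `anneal`, `t0`, and `cooling` are all assumptions made for illustration.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length (spherical K-means works on directions)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def objective(vectors, labels, k):
    """Sum of cosine similarities between unit vectors and their cluster centroid.

    For unit vectors this equals the sum of the norms of the per-cluster sums.
    """
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in range(k)]
    for v, c in zip(vectors, labels):
        for j, x in enumerate(v):
            sums[c][j] += x
    return sum(math.sqrt(sum(x * x for x in s)) for s in sums)

def anneal(vectors, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Hypothetical annealing loop: random single-document reassignments."""
    rng = random.Random(seed)
    vectors = [normalize(v) for v in vectors]
    labels = [rng.randrange(k) for _ in vectors]
    score = objective(vectors, labels, k)
    best_labels, best_score = labels[:], score
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(vectors))
        old = labels[i]
        labels[i] = rng.randrange(k)
        new_score = objective(vectors, labels, k)
        delta = new_score - score
        # Always accept improvements; accept a worse state with a
        # probability that shrinks as the temperature t decreases.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            score = new_score
            if score > best_score:
                best_labels, best_score = labels[:], score
        else:
            labels[i] = old  # reject the move
        t *= cooling
    return best_labels, best_score

docs = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
labels, score = anneal(docs, k=2)
```

Accepting occasional worse assignments early on is what lets the search leave a poor local optimum that a purely greedy update would stay in. Recomputing the full objective at every step is wasteful and is done here only to keep the sketch short.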

(ChinaVis 2024) TextLens: large language models-powered visual analytics enhancing text clustering

JOURNAL OF VISUALIZATION, 2025, pp. 1-19

Authors: Peng, Ruixiao; Dong, Yu; Li, Guan; Tian, Dong; Shan, Guihua. Affiliations: Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China; Univ Chinese Acad Sci, Beijing, Peoples R China; Univ Chinese Acad Sci, Hangzhou Inst Adv Study, Hangzhou, Zhejiang, Peoples R China

Text clustering is a cornerstone task in natural language processing with a broad spectrum of applications. Given the advancements in large language models (LLMs), employing such models to enhance general text clustering has shown promising potential in boosting clustering effectiveness. However, current LLM-driven approaches often act as black boxes in analyzing the processes of text clustering, leading to poor interpretability. Additionally, these approaches are associated with significant API usage costs and lack effective techniques to explore cluster details. To address these challenges, we propose an LLM-powered visual analytics approach, called TextLens, to enhance text clustering. First, we present an LLM-powered framework that integrates LLMs for guiding topic extraction, anomaly filtering, and modification assessment. Second, we introduce a visual analytics system designed to support the proposed framework, which facilitates interactive exploration of clusters, analysis of cluster-level thematic extraction, and iterative refinement of clustering results. Finally, we conduct evaluations by applying two datasets in four case studies and a user study to compare clustering outcomes with previous methods, demonstrating the effectiveness and scalability of our approach.

Keywords: text clustering; Large Language Models; Visual Analytics; Natural Language Processing

Text clustering with a hybrid multi-objective optimization approach: The multi-objective firefly differential Jaya Algorithm

SWARM AND EVOLUTIONARY COMPUTATION, 2025, vol. 93

Authors: Naderi, Muhammad; Amiri, Maryam. Affiliation: Arak Univ, Fac Engn, Dept Comp Engn, Arak *** Iran

The exponential growth of unstructured text data generated by internet users has created an urgent need for efficient organization methods to uncover valuable insights. Text clustering, a widely used data mining approach, often relies on single-objective optimization, which can struggle to deliver optimal results for datasets with diverse clustering criteria. To address these challenges, we propose the Multi-objective Firefly Differential Jaya (MFDJ) algorithm, a novel nature-inspired optimization method designed to enhance text clustering. MFDJ integrates the strengths of NSGA-II, a well-established multi-objective optimization framework, with three complementary algorithms: the Firefly algorithm for swarm intelligence-based optimization, Differential Evolution for robust exploration through mutation, and the Jaya algorithm for parameter-free improvement leveraging both the best and worst solutions. This synergy significantly enhances the algorithm's ability to balance exploration and exploitation, yielding superior clustering performance. We evaluated MFDJ on eight benchmark text datasets, where it demonstrated consistent superiority over state-of-the-art methods, including NSGA-II and MOMDE. On average, MFDJ achieved a 67.89% improvement in F-measure over NSGA-II and a 5.87% improvement over MOMDE, while also exhibiting better convergence properties for the majority of datasets. These results underscore the capability of MFDJ to generate high-quality clusters, making it a versatile tool for tackling complex text clustering and broader optimization challenges.

Keywords: Evolutionary computation; text clustering; Multi-objective optimization; Optimization algorithms

Text clustering with feature selection by using statistical data

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, vol. 20, no. 5, pp. 641-652

Authors: Li, Yanjun; Luo, Congnan; Chung, Soon M. Affiliations: Fordham Univ, Dept Comp & Informat Sci, Bronx, NY 10458, USA; Teradata Corp, San Diego, CA 92127, USA; Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435, USA

Feature selection is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the corpus. In this paper, we propose a new supervised feature selection method, named CHIR, which is based on the χ² statistic and new statistical data that can measure the positive term-category dependency. We also propose a new text clustering algorithm, named Text Clustering with Feature Selection (TCFS). TCFS can incorporate CHIR to identify relevant features (i.e., terms) iteratively, and the clustering becomes a learning process. We compared TCFS and the K-means clustering algorithm in combination with different feature selection methods for various real data sets. Our experimental results show that TCFS with CHIR has better clustering accuracy in terms of the F-measure and the purity.

Keywords: text clustering; text mining; chi-square (χ²) statistic; feature selection; performance analysis
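
For readers unfamiliar with the statistic that CHIR builds on, here is a minimal sketch of the standard chi-square score for one term and one category, computed from a 2x2 contingency table of document counts. It is not the paper's CHIR definition, which the abstract does not give; the function name `chi_square` and the argument names are illustrative assumptions.

```python
def chi_square(a, b, c, d):
    """Standard chi-square statistic for a term/category 2x2 table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

# Term present in every in-category document and nowhere else.
strong = chi_square(10, 0, 0, 10)
# Term spread evenly across both groups: no dependency.
none = chi_square(5, 5, 5, 5)
```

The plain statistic is large for both positive and negative dependency. The sign of `a * d - b * c` distinguishes the two cases, which is relevant because the abstract says CHIR targets positive term-category dependency.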

Text clustering using VSM with feature clusters

NEURAL COMPUTING & APPLICATIONS, 2015, vol. 26, no. 4, pp. 995-1003

Authors: Cao Qimin; Guo Qiao; Wang Yongliang; Wu Xianghua. Affiliation: Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China

Document representation is the basis of clustering systems. In addition, non-contiguous phrases appear more and more frequently in text in the Web 2.0 age, and these phrases can affect the result of text clustering. To improve the quality of text clustering, this paper proposes a feature cluster-based vector space model (FC-VSM), which uses a co-occurrence matrix of text feature clusters to represent documents, and proposes identifying non-contiguous phrases in the text preprocessing stage. The method reduces the dimensionality of features compared with the traditional VSM-based model. It identifies non-contiguous phrases, uses a distributed representation of features, and implements feature clusters. Despite its simplicity, the method is surprisingly effective and significantly improves clustering accuracy, as shown in the experimental results.

Keywords: text clustering; Feature clusters; Distributed representation; FC-VSM; Non-contiguous phrases

Text clustering with Seeds Affinity Propagation

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, vol. 23, no. 4, pp. 627-637

Authors: Guan, Renchu; Shi, Xiaohu; Marchese, Maurizio; Yang, Chen; Liang, Yanchun. Affiliations: Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China; Univ Trent, Fac Sci, Dept Informat Engn & Comp Sci, I-38100 Trento, Italy; Jilin Univ, Coll Earth Sci, Changchun 130061, Peoples R China

Based on an effective clustering algorithm, Affinity Propagation (AP), we present in this paper a novel semisupervised text clustering algorithm, called Seeds Affinity Propagation (SAP). There are two main contributions in our approach: 1) a new similarity metric that captures the structural information of texts, and 2) a novel seed construction method to improve the semisupervised clustering process. To study the performance of the new algorithm, we applied it to the benchmark data set Reuters-21578 and compared it to two state-of-the-art clustering algorithms, namely, the k-means algorithm and the original AP algorithm. Furthermore, we have analyzed the individual impact of the two proposed contributions. Results show that the proposed similarity metric is more effective in text clustering (F-measures ca. 21 percent higher than in the AP algorithm) and the proposed semisupervised strategy achieves both better clustering results and faster convergence (using only 76 percent of the iterations of the original AP). The complete SAP algorithm obtains a higher F-measure (ca. 40 percent improvement over k-means and AP) and lower entropy (ca. 28 percent decrease over k-means and AP), significantly improves clustering execution time (20 times faster) with respect to k-means, and provides enhanced robustness compared with all other methods.

Keywords: Affinity propagation; text clustering; cofeature set; unilateral feature set; significant cofeature set

Text Clustering Algorithm Based on the Graph Structures of Semantic Word Co-occurrence

International Conference on Information System and Artificial Intelligence (ISAI)

Authors: Jin, Chun-Xia; Bai, Qiu-Chan. Affiliations: Huaiyin Inst Technol, Fac Comp & Software Engn, Huaian, Peoples R China; Huaiyin Inst Technol, Fac Automat, Huaian, Peoples R China

ISBN: (print) 9781509015856

Text theme is the key to text clustering, and co-occurring words can express the theme of a document particularly strongly. Building on an in-depth study of text theme mining and word co-occurrence, this paper proposes a text clustering algorithm based on semantic text representation and the graph structure of word co-occurrence. First, the algorithm constructs a graph structure for each text according to the co-occurrence of feature words; in other words, it uses graph structures to represent all texts. Then, it uses the maximum common sub-graph between two texts to calculate their similarity and combines this with the K-means clustering algorithm to cluster the documents. Experimental comparisons with a hierarchical clustering algorithm show that the K-means clustering algorithm based on the graph structures of word co-occurrence greatly reduces the high dimensionality of text vectors and the algorithm complexity, significantly improves the efficiency and accuracy of text clustering, and produces good-quality clusters.

Keywords: co-occurrence unit; graph structure; maximum common sub-graph; text clustering

Text Clustering Based on Domain Ontology and Latent Semantic Analysis

International Conference on Mechatronics Engineering and Computing Technology (ICMECT)

Authors: Li Yaxiong; Pan Deng. Affiliations: Hubei Univ Sci & Technol, Network Management Ctr, Wuhan 437100, Hubei, Peoples R China; Hubei Univ Sci & Technol, Sch Foreign Languages, Wuhan 437100, Hubei, Peoples R China

ISBN: (print) 9783038351153

One key step in text mining is the categorization of texts, i.e., putting texts of the same or similar content into one group so as to distinguish texts of different content. However, traditional word-frequency-based statistical approaches, such as the vector space model (VSM), fail to reflect the complicated meaning in texts. This paper introduces domain ontology and constructs a new conceptual vector space model in the pre-processing stage of text clustering, substituting a concept-text matrix for the initial lexicon-text matrix in latent semantic analysis. In the clustering analysis stage, the model adopts semantic similarity, partially overcoming the difficulty of accurately and effectively evaluating text similarity when only the frequency of words and/or phrases in the text is taken into account. Experimental results indicate that this method helps improve the result of text clustering.

Keywords: Domain Ontology; Latent Semantic Analysis; Concept-text Matrix; text clustering

Deep Feature-Based Text Clustering and its Explanation

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, vol. 34, no. 8, pp. 3669-3680

Authors: Guan, Renchu; Zhang, Hao; Liang, Yanchun; Giunchiglia, Fausto; Huang, Lan; Feng, Xiaoyue. Affiliations: Jilin Univ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China; Jilin Univ, Zhuhai Coll, Key Lab Symbol Computat & Knowledge Engn, Zhuhai Lab, Minist Educ, Zhuhai 519041, Peoples R China; Univ Trento, DISI, I-38122 Trento, Italy

Text clustering is a critical step in text data analysis and has been extensively studied by the text mining community. Most existing text clustering algorithms are based on the bag-of-words model, which faces the high-dimensional and sparsity problems and ignores text structural and sequence information. Deep learning-based models such as convolutional neural networks and recurrent neural networks regard texts as sequences but lack supervised signals and explainable results. In this paper, we propose a deep feature-based text clustering (DFTC) framework that incorporates pretrained text encoders into text clustering tasks. This model, which is based on sequence representations, breaks the dependency on supervision. The experimental results show that our model outperforms classic text clustering algorithms and the state-of-the-art pretrained language model, i.e., BERT, on almost all the considered datasets. In addition, the explanation of the clustering results is significant for understanding the principles of the deep learning approach. Our proposed clustering framework includes an explanation module that can help users understand the meaning and quality of the clustering results.

Keywords: Task analysis; Computational modeling; Feature extraction; Clustering algorithms; Semantics; Data models; Recurrent neural networks; Deep learning; explanation model; feature extraction; text clustering; transfer learning

Text clustering as graph community detection

8th Annual International Conference of the Biologically-Inspired-Cognitive Architectures-Society on Biologically Inspired Cognitive Architectures (BICA)

Authors: Mikhina, Elizaveta K.; Trifalenkov, Vsevolod I. Affiliation: Natl Res Nucl Univ MEPhI, Moscow Engn Phys Inst, Kashirskoe Highway 31, Moscow 115409, Russia

This article suggests a method of text clustering that does not depend on any user-set parameters. Text documents and the connections between them are represented as graph nodes and edges, and a graph community detection method is then applied to the text clustering problem. The method was tested on collections of news articles and proved effective: manual and automatic clusterings of the text documents in the collections were the same or very close. (C) 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/3.0/). Peer-review under responsibility of the scientific committee of the 8th Annual International Conference on Biologically Inspired Cognitive Architectures

Keywords: text clustering; non-parameter clustering; graph community detection; modularity
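
Since modularity is listed among the keywords, a small sketch may help show what a community detection method typically optimizes. The code below computes Newman modularity for a given partition of an undirected, unweighted graph. Treating documents as nodes and linking similar documents with edges follows the abstract, but how the article builds those edges is not stated, so the toy graph and the function name `modularity` are assumptions.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once
    community: dict mapping node -> community id
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # summed degree of the community's nodes
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Fraction of internal edges minus the fraction expected at random.
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy graph: two triangles of documents joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_group = {node: 0 for node in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one community. A modularity-based method can therefore choose the number of clusters itself, which is consistent with the parameter-free claim in the abstract.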
