Author affiliation: Jawaharlal Nehru Technol Univ, Dept Comp Sci & Engn, Anantapur, India
Publication: INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS
Year/volume/issue: 2024, Vol. 33, No. 1
Pages: 2350061-2350061
Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (degrees awardable in Engineering or Science)]
Keywords: Dirichlet process; k-means clustering; machine learning; particle swarm optimizer; text document clustering
Abstract: In the present digital era, millions of Internet users generate vast amounts of data in the form of unstructured text documents. Clustering and organizing these documents plays a crucial role in data analysis and market research applications. This manuscript proposes a modified metaheuristic-based optimization technique, combined with k-means, for clustering text documents. In the initial phase, the input data are acquired from three benchmark databases: Reuters-21578, 20-Newsgroup, and British Broadcasting Corporation (BBC)-sport. The data are then denoised using common techniques: stemming, lemmatization, tokenization, and stop word removal. The denoised data are transformed into feature vectors using the Term Frequency (TF)-Inverse Document Frequency (IDF) technique. The computed feature vectors are passed to the Modified Particle Swarm Optimization (MPSO) with k-means, which groups closely related text documents by minimizing the similarity between different clusters. The experiments showed that the proposed MPSO with k-means model achieved accuracies of 0.85, 0.85, and 0.86 on the Reuters-21578, 20-Newsgroup, and BBC-sport databases, respectively, which are superior to those of the comparative models.
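The pipeline described in the abstract (denoising, TF-IDF vectors, swarm-optimized centroids refined by k-means) can be illustrated with a minimal pure-Python sketch. This is not the authors' MPSO: the abstract does not specify their modification, so the sketch assumes a standard global-best PSO that minimizes the within-cluster squared error, a smoothed IDF, and a toy stop-word list. Stemming and lemmatization are omitted, and every function name, parameter value, and sample document below is hypothetical.

```python
import math
import random
import re
from collections import Counter

# Toy stop-word list (assumption); stemming and lemmatization are omitted.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "on", "as"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def tfidf(docs):
    """Return (vocabulary, dense L2-normalized TF-IDF vectors)."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    df = Counter(t for doc in tokens for t in set(doc))
    n = len(docs)
    vectors = []
    for doc in tokens:
        tf = Counter(doc)
        # Smoothed IDF (one common variant, chosen here as an assumption).
        vec = [tf[t] / max(len(doc), 1) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    """Index of the nearest centroid for each vector."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors]

def sse(vectors, centroids):
    """Within-cluster sum of squared distances (the PSO fitness)."""
    return sum(min(sq_dist(v, c) for c in centroids) for v in vectors)

def pso_centroids(vectors, k, particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO; each particle encodes k candidate centroids."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    swarm = [[list(v) for v in rng.sample(vectors, k)] for _ in range(particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(particles)]
    best = [[list(c) for c in p] for p in swarm]
    best_cost = [sse(vectors, p) for p in swarm]
    g = min(range(particles), key=best_cost.__getitem__)
    gbest, gcost = [list(c) for c in best[g]], best_cost[g]
    for _ in range(iters):
        for i, p in enumerate(swarm):
            for j in range(k):
                for d in range(dim):
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * rng.random() * (best[i][j][d] - p[j][d])
                                    + c2 * rng.random() * (gbest[j][d] - p[j][d]))
                    p[j][d] += vel[i][j][d]
            cost = sse(vectors, p)
            if cost < best_cost[i]:
                best[i], best_cost[i] = [list(c) for c in p], cost
                if cost < gcost:
                    gbest, gcost = [list(c) for c in p], cost
    return gbest

def kmeans(vectors, centroids, iters=20):
    """Lloyd's algorithm started from the given centroids."""
    for _ in range(iters):
        labels = assign(vectors, centroids)
        new = []
        for c in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == c]
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return assign(vectors, centroids), centroids

docs = [
    "The striker scored a late goal in the football match",
    "Football fans cheered as the team won the match",
    "The central bank raised interest rates again",
    "Markets fell after the bank changed interest rates",
]
_, X = tfidf(docs)
start = pso_centroids(X, k=2)          # swarm search for good starting centroids
labels, centroids = kmeans(X, start)   # k-means refinement
print(labels)
```

In this sketch the swarm only supplies starting centroids; k-means then refines them, so the final squared error is never worse than that of the PSO result. On real corpora such as Reuters-21578, sparse vectors and a library implementation of TF-IDF would be more practical than the dense lists used here.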