Author affiliations: Jilin Univ, Sch Business & Management, Changchun, Peoples R China; Jilin Univ, Informat Resource Res Ctr, Changchun, Peoples R China; Nanjing Normal Univ, Sch Journalism & Commun, Dept Network & New Media, Nanjing, Peoples R China
Publication: DATA TECHNOLOGIES AND APPLICATIONS (Data Technol. Appl.)
Year, volume, issue: 2022, Vol. 58, No. 3
Pages: 1-27
Core indexing:
Funding: Major Project of the National Social Science Foundation of China [20ZD125]; National Natural Science Foundation of Jilin Province [20210101480JC]
Subjects: Data clustering; Density peak clustering algorithm; Merging strategy; Pinhole imaging strategy; Point-domain; Point-domain similarity
Abstract:
Purpose: The density peak clustering algorithm (DP) identifies cluster centers using two parameters: the rho value (local density) and the delta value (the distance between a point and the nearest point with a higher rho value). According to the center-identifying principle of the DP, potential cluster centers should have both a higher rho value and a higher delta value than other points. However, this principle may prevent the DP from identifying categories with multiple centers or centers located in lower-density regions. In addition, the assignment strategy of the DP can assign non-center points to the wrong cluster. This paper aims to address these issues and improve the clustering performance of the DP.
Design/methodology/approach: First, to identify as many potential cluster centers as possible, the authors construct a point-domain by introducing the pinhole imaging strategy, which extends the search range for potential cluster centers. Second, they design new methods for calculating the domain distance, point-domain density and domain similarity. Third, they use domain similarity to drive the domain merging process and optimize the final clustering results.
Findings: Experimental results on 12 synthetic data sets and 12 real-world data sets show that two-stage density peak clustering based on multi-strategy optimization (TMsDP) outperforms the DP and other state-of-the-art algorithms.
Originality/value: The authors propose a novel DP-based clustering method, TMsDP, which transforms the relationship between points into a relationship between domains in order to further optimize the clustering performance of the DP.
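The two DP parameters named in the abstract can be illustrated with a minimal sketch. It covers only the baseline DP quantities (rho and delta), not the TMsDP point-domain construction or merging step, whose formulas are not given in this record. The function name `density_peak_scores`, the `cutoff` parameter, and the choice of a simple cutoff count for local density are assumptions made for illustration; ties in rho are not handled specially.

```python
import numpy as np

def density_peak_scores(points, cutoff):
    """Return (rho, delta) per point for the baseline DP (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # rho: number of other points closer than the cutoff (assumed density
    # kernel; the point itself is at distance 0 and is subtracted out).
    rho = (dist < cutoff).sum(axis=1) - 1
    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = rho > rho[i]
        if higher.any():
            # delta: distance to the nearest point with a higher rho.
            delta[i] = dist[i, higher].min()
        else:
            # Common convention for a highest-density point: its largest
            # distance to any other point.
            delta[i] = dist[i].max()
    return rho, delta

# Hypothetical toy data: a small group around the origin plus one far point.
rho, delta = density_peak_scores(
    [[0, 0], [1, 0], [-1, 0], [0, 1], [10, 0]], cutoff=1.2
)
```

In this toy data the origin has both the highest rho and the highest delta, so it is the natural center candidate. The isolated point has a large delta but a rho of zero, which shows why the center-identifying principle requires both values to be high.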