Topological K-means clustering in reproducing kernel Hilbert spaces

Authors: Dixon, Matthew; Chen, Yuzhou; Gel, Yulia R.

Affiliations: IIT Dept Appl Math Chicago IL 60616 USA; Univ Calif Riverside Dept Stat Riverside CA USA; Virginia Tech VA & Natl Sci Fdn Dept Stat Blacksburg VA USA

Publication: ELECTRONIC JOURNAL OF STATISTICS (Electron. J. Stat.)

Year, volume, issue: 2025, Vol. 19, No. 1

Pages: 204-239

Funding: NSF [TIP-2333703]; ONR [N00014-21-1-2530]

Subjects: Topological data analysis; K-means clustering; reproducing kernel Hilbert spaces; clustering stability

Abstract: We propose a new topological clustering methodology, based on generalizing an empirical risk minimization framework, using a reproducing kernel Hilbert space (RKHS) for vectorized persistent homology representations of point clouds. In contrast to conventional Euclidean-based clustering methods, which address only pairwise similarity among data points, our new approach of topological K-means clusters data based on the similarity of the shapes exhibited by the local vicinity of each data point at multiple scales. Topological clustering thereby systematically captures the inherent local and global higher-order data characteristics that are otherwise inaccessible with Euclidean-based clustering. We summarize the extracted shape characteristics of each local vicinity in the form of a persistence diagram (PD) and embed the PDs into an RKHS, which induces a distance among shapes of local vicinities in Hilbert space. Our derived theoretical guarantees on stability and consistency of the topological partitions are the first theoretical results of this kind at the intersection of topological data analysis and statistical inference. Additionally, we establish a number of new theoretical results on bounds of covering numbers in Hilbert spaces, which are of independent interest in statistical learning theory. We demonstrate the superior performance of the new topological K-means clustering on simulated data and US COVID-19 data.
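The abstract describes K-means carried out with an RKHS-induced distance rather than a Euclidean one. As a rough illustration only, and not the authors' algorithm, the sketch below runs generic kernel K-means on a Gram matrix. It assumes each persistence diagram has already been vectorized into a fixed-length feature row (the vectorization step is not shown), and it uses a Gaussian kernel as a stand-in for whatever kernel the paper employs. The names `gaussian_gram` and `kernel_kmeans` and the farthest-first initialization are choices made for this example.

```python
import numpy as np


def gaussian_gram(X, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on the rows of X.

    Assumption: each row of X is a vectorized persistence diagram.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))


def kernel_kmeans(K, n_clusters, n_iter=50):
    """Kernel K-means on a precomputed Gram matrix K (illustrative sketch)."""
    n = K.shape[0]
    diag = np.diag(K)

    # Deterministic farthest-first initialization in RKHS distance:
    # ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij.
    centers = [0]
    d2 = diag + diag[0] - 2.0 * K[:, 0]
    for _ in range(1, n_clusters):
        nxt = int(np.argmax(d2))
        centers.append(nxt)
        d2 = np.minimum(d2, diag + diag[nxt] - 2.0 * K[:, nxt])
    init = diag[:, None] + diag[centers][None, :] - 2.0 * K[:, centers]
    labels = np.argmin(init, axis=1)

    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # an empty cluster stays at infinite distance
            # Squared RKHS distance from every point to the mean of cluster c.
            dist[:, c] = (
                diag
                - 2.0 * K[:, mask].sum(axis=1) / m
                + K[np.ix_(mask, mask)].sum() / m ** 2
            )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Calling `kernel_kmeans(gaussian_gram(X), n_clusters=2)` on a feature matrix `X` returns one integer label per row. Because the routine only touches the Gram matrix, any other positive semi-definite kernel on persistence diagrams could be substituted without changing the clustering code.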
