版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:UC San Diego United States University of Michigan United States
出 版 物:《arXiv》 (arXiv)
年 卷 期:2024年
核心收录:
主 题:Adaptive algorithms
摘 要:Recovering the underlying clustering of a set U of n points by asking pair-wise same-cluster queries has garnered significant interest in the past few years. Given a query S ⊂ U, |S| = 2, the oracle returns yes if the points are in the same cluster and no otherwise. For adaptive algorithms with pair-wise queries, the number of required queries is known to be Θ(nk), where k is the number of clusters. However, non-adaptive schemes allow for the querying process to be parallelized, which is a highly desirable property. It turns out that non-adaptive pair-wise query algorithms are extremely limited for the above problem: even for k = 3, such algorithms require Ω(n2) queries, which matches the trivial O(n2) upper bound attained by querying every pair of points. To break the quadratic barrier for non-adaptive queries, we study a natural generalization of this problem to subset queries for |S| 2, where the oracle returns the number of clusters intersecting S. Our aim is to determine the minimum number of queries needed for exactly recovering an arbitrary k-clustering. Allowing for subset queries of unbounded size, O(n) queries are sufficient with an adaptive scheme. However, the realm of non-adaptive algorithms is completely unknown. In this paper, we give the first non-adaptive algorithms for clustering with subset queries. Our main result is a non-adaptive algorithm making O(n log k·(log k+log log n)2) queries, which improves to O(n log log n) when k is a constant. In addition to non-adaptivity, we take into account other practical considerations, such as enforcing a bound, s, on the query size. In this setting we prove that Ω(max(n2/s2, n)) queries are necessary and obtain algorithms making Õ(n2k/s2) queries for any s ≤ √n and Õ(n2/s) queries for any s ≤ n. In particular, our first algorithm is optimal up to log n factors when k is constant. We also consider the natural special case when the clusters are balanced, obtaining non-adaptive algorithms which make O(n log k