ISBN: 9783540709176 (print)
The min-sum k-clustering problem is to partition a metric space $(P, d)$ into $k$ clusters $C_1, \ldots, C_k \subseteq P$ such that $\sum_{i=1}^{k} \sum_{p,q \in C_i} d(p, q)$ is minimized. We show the first efficient construction of a coreset for this problem. Our coreset construction is based on a new adaptive sampling algorithm. With our construction of coresets we obtain two main algorithmic results. The first result is a sublinear-time $(4 + \epsilon)$-approximation algorithm for the min-sum k-clustering problem in metric spaces. The running time of this algorithm is $O(n)$ for any constant $k$ and $\epsilon$, and it is $o(n^2)$ for all $k = o(\log n / \log\log n)$. Since the full description size of the input is $\Theta(n^2)$, this is sublinear in the input size. The fastest previously known $o(\log n)$-factor approximation algorithm for $k > 2$ achieved a running time of $\Omega(n^k)$, and no non-trivial $o(n^2)$-time algorithm was known before. Our second result is the first pass-efficient data streaming algorithm for min-sum k-clustering in the distance oracle model, i.e., an algorithm that uses $\mathrm{poly}(\log n, k)$ space and makes 2 passes over the input point set, which arrives in the form of a data stream in arbitrary order. It computes an implicit representation of a clustering of $(P, d)$ with cost at most a constant factor larger than that of an optimal partition. Using one further pass, we can assign each point to its corresponding cluster. To develop the coresets, we introduce the concept of $\alpha$-preserving metric embeddings. Such an embedding satisfies the properties that the distance between any pair of points does not decrease and that the cost of an optimal solution for the considered problem on input $(P, d')$ is within a constant factor of the optimal solution on input $(P, d)$. In other words, the goal is to find a metric embedding into a (structurally simpler) metric space that approximates the original metric up to a factor of $\alpha$ with respect to a given problem. We believe that this concept...
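To make the objective concrete, here is a minimal Python sketch of the min-sum cost of a given partition. The points, the metric `d`, and the example clustering are illustrative stand-ins; this is not the paper's coreset construction or adaptive sampling algorithm.

```python
from itertools import combinations

def min_sum_cost(clusters, d):
    """Min-sum cost of a partition: total distance over all pairs of
    points that share a cluster. The abstract's double sum ranges over
    ordered pairs, which is the same quantity up to a factor of 2."""
    return sum(d(p, q) for C in clusters for p, q in combinations(C, 2))

# Toy example: points on the real line with the absolute-difference metric.
d = lambda p, q: abs(p - q)
print(min_sum_cost([[0, 1, 2], [10, 11]], d))  # (1 + 2 + 1) + 1 = 5
```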
In this paper, we revisit the problem of sampling edges in an unknown graph G = (V, E) from a distribution that is (pointwise) almost uniform over E. We consider the case where there is some a priori upper bound on th...
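The teaser above is truncated, so the paper's exact model is unclear; purely as a hedged illustration, the sketch below shows one standard rejection-sampling route to (almost) uniform edge sampling, assuming adjacency-list access and an a priori upper bound `d_max` on the maximum degree. Both assumptions, and the helper `sample_edge`, are ours rather than the paper's.

```python
import random

def sample_edge(vertices, adj, d_max, rng=random):
    """Sample an ordered edge (u, v) uniformly: each attempt picks a
    uniform vertex u, accepts it with probability deg(u) / d_max, then
    picks a uniform neighbor, so every ordered edge has probability
    1 / (n * d_max) per attempt."""
    while True:
        u = rng.choice(vertices)
        deg = len(adj[u])
        if deg and rng.random() < deg / d_max:
            return u, rng.choice(adj[u])

# Toy example: the path 0-1-2, whose maximum degree is 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
print(sample_edge(list(adj), adj, d_max=2))
```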
Locally decodable codes are error-correcting codes that admit efficient decoding algorithms; any bit of the original message can be recovered by looking at only a small number of locations of a corrupted codeword. The tradeoff between the rate of a code and the locality/efficiency of its decoding algorithms has been well studied, and it has widely been suspected that nontrivial locality must come at the price of low rate. A particular setting of potential interest in practice is codes of constant rate. For such codes, decoding algorithms with locality $O(k^{\epsilon})$ were known only for codes of rate $\epsilon^{\Omega(1/\epsilon)}$, where $k$ is the length of the message. Furthermore, for codes of rate $> 1/2$, no nontrivial locality had been achieved. In this article, we construct a new family of locally decodable codes that have very efficient local decoding algorithms, and at the same time have rate approaching 1. We show that for every $\epsilon > 0$ and $\alpha > 0$, for infinitely many $k$, there exists a code $C$ which encodes messages of length $k$ with rate $1 - \alpha$, and is locally decodable from a constant fraction of errors using $O(k^{\epsilon})$ queries and time. These codes, which we call multiplicity codes, are based on evaluating multivariate polynomials and their derivatives. Multiplicity codes extend traditional multivariate polynomial codes; they inherit the local decodability of these codes, and at the same time achieve better tradeoffs and flexibility in the rate and minimum distance.
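To illustrate the "evaluations plus derivatives" idea behind multiplicity codes, here is a univariate toy encoder over a small prime field. The actual codes are multivariate and carefully parameterized; the field size and message below are made-up, and this sketch says nothing about the local decoder.

```python
P = 13  # a small prime, so arithmetic is over the field F_13

def evaluate_with_derivative(coeffs, a):
    """Return (f(a), f'(a)) mod P for f given by its coefficient list."""
    f = sum(c * pow(a, i, P) for i, c in enumerate(coeffs)) % P
    df = sum(i * c * pow(a, i - 1, P) for i, c in enumerate(coeffs) if i) % P
    return f, df

def encode(message):
    """Codeword: (f(a), f'(a)) at every field point a, roughly doubling
    the redundancy per symbol in exchange for extra decoding structure."""
    return [evaluate_with_derivative(message, a) for a in range(P)]

print(encode([3, 1, 4])[:3])  # f(x) = 3 + x + 4x^2 -> [(3, 1), (8, 9), (8, 4)]
```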
We give two fully dynamic algorithms that maintain a $(1 + \epsilon)$-approximation of the weight $M$ of a minimum spanning forest (MSF) of an $n$-node graph $G$ with edge weights in $[1, W]$, for any $\epsilon > 0$. (1) Our deterministic algorithm takes $O(W^2 \log W / \epsilon^3)$ worst-case update time, which is $O(1)$ if both $W$ and $\epsilon$ are constants. (2) Our randomized (Monte Carlo style) algorithm works with high probability and runs in worst-case $O(\log W / \epsilon^4)$ update time if $W = O((m^*)^{1/6} / \log^{2/3} n)$, where $m^*$ is the minimum number of edges in the graph throughout all the updates. It works even against an adaptive adversary. We complement our algorithmic results with two cell-probe lower bounds for dynamically maintaining an approximation of the weight of an MSF of a graph.
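For intuition on why the MSF weight can be approximated without maintaining a forest explicitly, the sketch below checks the standard component-counting identity: on a connected graph with integer weights in $\{1, \ldots, W\}$, the MST weight equals $n - W + \sum_{i=1}^{W-1} c_i$, where $c_i$ counts components of the subgraph using edges of weight at most $i$. Dynamic algorithms in this line maintain (approximate) component counts; this static union-find check is our illustration, not the paper's data structure.

```python
def components(n, edges):
    """Count connected components with a simple union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    comps = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            comps -= 1
    return comps

def mst_weight(n, weighted_edges, W):
    """Exact MST weight of a connected graph via the identity
    n - W + sum_{i=1}^{W-1} c_i."""
    return (n - W) + sum(
        components(n, [(u, v) for u, v, w in weighted_edges if w <= i])
        for i in range(1, W)
    )

# Triangle with weights 1, 2, 3: the MST takes the two light edges, weight 3.
print(mst_weight(3, [(0, 1, 1), (1, 2, 2), (0, 2, 3)], W=3))  # 3
```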