In this paper we present a sublinear-time (1 + epsilon)-approximation randomized algorithm to estimate the weight of the minimum spanning tree of an n-point metric space. The running time of the algorithm is (O) over ...
详细信息
In this paper we present a sublinear-time (1 + epsilon)-approximation randomized algorithm to estimate the weight of the minimum spanning tree of an n-point metric space. The running time of the algorithm is (O) over tilde (n/epsilon(O(1))). Since the full description of an n-point metric space is of size Theta(n(2)), the complexity of our algorithm is sublinear with respect to the input size. Our algorithm is almost optimal as it is not possible to approximate in o(n) time the weight of the minimum spanning tree to within any factor. We also show that no deterministic algorithm can achieve a B-approximation in o(n(2)/B-3) time. Furthermore, it has been previously shown that no o(n(2)) algorithm exists that returns a spanning tree whose weight is within a constant times the optimum.
Recent works in graph visualization attempt to reduce the runtime of repulsion force computation of force-directed algorithms using sampling. However, they fail to reduce the runtime for attraction force computation t...
详细信息
Recent works in graph visualization attempt to reduce the runtime of repulsion force computation of force-directed algorithms using sampling. However, they fail to reduce the runtime for attraction force computation to sublinear in the number of edges. We present the sublinearForce framework for a fully sublinear-time force computation algorithm for drawing large complex graphs. More precisely, we present new sublinear-time algorithms for the attraction force computation of force-directed algorithms. We then integrate them with sublinear-time repulsion force computation to give a fully sublinear-time force computation. Extensive experiments show that our algorithms compute layouts on average 80% faster than the existing linear-time force computation algorithm, while obtaining significantly better quality metrics such as edge crossing and shape-based metrics.
The bin packing problem is defined as follows: given a set of n items with sizes 0 0, we present an algorithm A, that has sampling access to the input instance and outputs a value k such that C-opt <= k <= (1 +...
详细信息
The bin packing problem is defined as follows: given a set of n items with sizes 0 < w(1), w(2), .... w(n) <= 1, find a packing of these items into a minimum number Of Unit-size bins possible. We present a sublinear-time asymptotic approximation scheme for the bin packing problem;that is, for any epsilon > 0, we present an algorithm A, that has sampling access to the input instance and outputs a value k such that C-opt <= k <= (1 + epsilon) . C-opt + 1, where C-opt is the cost of an optimal solution. It is clear that uniform sampling by itself will not allow a sublinear-time algorithm in this setting;a small number of items might constitute most of the total weight and uniform samples will not hit them. In this work we use weighted samples, where item i is sampled with probability proportional to its weight: that is, with probability w(i)/Sigma w(i). In the presence of weighted samples, the approximation algorithm runs in (O) over tilde root n . poly(1/epsilon)) +g(1/x) time, where g(x) is an exponential function of x. When both weighted sampling and uniform sampling are allowed, (O) over tilde (n(1/3).poly(1/epsilon)) + g(1/epsilon) time suffices. In addition to an approximate value to Copt, our algorithm can also output a constant-size "template" of a packing that can later be used to find a near-optimal packing in linear time. (C) 2009 Elsevier B.V. All rights reserved.
We present sublinear-time (randomized) algorithms for finding simple cycles of length at least k3 and tree-minors in bounded-degree graphs. The complexity of these algorithms is related to the distance of the graph fr...
详细信息
We present sublinear-time (randomized) algorithms for finding simple cycles of length at least k3 and tree-minors in bounded-degree graphs. The complexity of these algorithms is related to the distance of the graph from being C-k-minor free (resp., free from having the corresponding tree-minor). In particular, if the graph is (1)-far from being cycle-free (i.e., a constant fraction of the edges must be deleted to make the graph cycle-free), then the algorithm finds a cycle of polylogarithmic length in time O(N), where N denotes the number of vertices. This time complexity is optimal up to polylogarithmic factors. The foregoing results are the outcome of our study of the complexity of one-sided error property testing algorithms in the bounded-degree graphs model. For example, we show that cycle-freeness of N-vertex graphs can be tested with one-sided error within time complexity O(poly(1/E)N), where denotes the proximity parameter. This matches the known (N) query lower bound for one-sided error cycle-freeness testing, and contrasts with the fact that any minor-free property admits a two-sided error tester of query complexity that only depends on . We show that the same upper bound holds for testing whether the input graph has a simple cycle of length at least k, for any k3. On the other hand, for any fixed tree T, we show that T-minor freeness has a one-sided error tester of query complexity that only depends on the proximity parameter . Our algorithm for finding cycles in bounded-degree graphs extends to general graphs, where distances are measured with respect to the actual number of edges. Such an extension is not possible with respect to finding tree-minors in o(N) complexity. (c) 2012 Wiley Periodicals, Inc. Random Struct. Alg., 45, 139-184, 2014
In this paper we develop a sublinear-time compressive sensing algorithm for approximating functions of many variables which are compressible in a given Bounded Orthonormal Product Basis (BOPB). The resulting algorithm...
详细信息
In this paper we develop a sublinear-time compressive sensing algorithm for approximating functions of many variables which are compressible in a given Bounded Orthonormal Product Basis (BOPB). The resulting algorithm is shown to both have an associated best s-term recovery guarantee in the given BOPB, and also to work well numerically for solving sparse approximation problems involving functions contained in the span of fairly general sets of as many as similar to 10(230) orthonormal basis functions. All code is made publicly available. As part of the proof of the main recovery guarantee new variants of the well known CoSaMP algorithm are proposed which can utilize any sufficiently accurate support identification procedure satisfying a Support Identification Property (SIP) in order to obtain strong sparse approximation guarantees. These new CoSaMP variants are then proven to have both runtime and recovery error behavior which are largely determined by the associated runtime and error behavior of the chosen support identification method. The main theoretical results of the paper are then shown by developing a sublinear-time support identification algorithm for general BOPB sets which is robust to arbitrary additive errors. Using this new support identification method to create a new CoSaMP variant then results in a new robust sublinear-time compressive sensing algorithm for BOPB-compressible functions of many variables.
Locally decodable codes are error-correcting codes that admit efficient decoding algorithms;any bit of the original message can be recovered by looking at only a small number of locations of a corrupted codeword. The ...
详细信息
ISBN:
(纸本)9781450306911
Locally decodable codes are error-correcting codes that admit efficient decoding algorithms;any bit of the original message can be recovered by looking at only a small number of locations of a corrupted codeword. The tradeoff between the rate of a code and the locality/efficiency of its decoding algorithms has been well studied, and it has widely been suspected that nontrivial locality must come at the price of low rate. A particular setting of potential interest in practice is codes of constant rate. For such codes, decoding algorithms with locality O(K-epsilon) were known only for codes of rate exp(1/epsilon), where k is the length of the message. Furthermore, for codes of rate > 1/2, no nontrivial locality has been achieved. In this paper we construct a new family of locally decodable codes that have very efficient local decoding algorithms, and at the same time have rate approaching 1. We show that for every epsilon > 0 and alpha > 0, for infinitely many k, there exists a code C which encodes messages of length k with rate 1 - alpha, and is locally decodable from a constant fraction of errors using O(k(epsilon)) queries and time. The high rate and local decodability are evident even in concrete settings (and not just in asymptotic behavior), giving hope that local decoding techniques may have practical implications. These codes, which we call multiplicity codes, are based on evaluating high degree multivariate polynomials and their derivatives. Multiplicity codes extend traditional multivariate polynomial based codes;they inherit the local-decodability of these codes, and at the same time achieve better tradeoffs and flexibility in their rate and distance.
The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other...
详细信息
ISBN:
(纸本)9781728149523
The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. A simple dynamic programming computes the edit distance between two strings of length n in O(n(2)) time, and a more sophisticated algorithm runs in time O(n + t(2)) when the edit distance is t [Landau, Myers and Schmidt, SICOMP 1998]. In pursuit of obtaining faster running time, the last couple of decades have seen a flurry of research on approximating edit distance, including polylogarithmic approximation in near-linear time [Andoni, Krauthgamer and Onak, FOCS 2010], and a constant-factor approximation in subquadratic time [Chakrabarty, Das, Goldenberg, Koucky and Saks, FOCS 2018]. We study sublinear-time algorithms for small edit distance, which was investigated extensively because of its numerous applications. Our main result is an algorithm for distinguishing whether the edit distance is at most t or at least t(2) (the quadratic gap problem) in time (O) over tilde (n/t + t(3)). This time bound is sublinear roughly for all tin [omega(1), o(n(1/3))], which was not known before. The best previous algorithms solve this problem in sublineartime only for t = omega(n(1/3)) [Andoni and Onak, STOC 2009]. Our algorithm is based on a new approach that adaptively switches between uniform sampling and reading contiguous blocks of the input strings. In contrast, all previous algorithms choose which coordinates to query non-adaptively. Moreover, it can be extended to solve the t vs t(2-epsilon) gap problem in time (O) over tilde (n/t(1-epsilon) + t(3)).
The min-sum k-clustering problem is to partition a metric space ( P, d) into k clusters C-1,..., C-k subset of P such that Sigma(k)(i=1) Sigma(p,q is an element of Ci) d(p, q) is minimized. We show the first efficient...
详细信息
The min-sum k-clustering problem is to partition a metric space ( P, d) into k clusters C-1,..., C-k subset of P such that Sigma(k)(i=1) Sigma(p,q is an element of Ci) d(p, q) is minimized. We show the first efficient construction of a coreset for this problem. Our coreset construction is based on a new adaptive sampling algorithm. With our construction of coresets we obtain two main algorithmic results. The first result is a sublinear-time ( 4 + epsilon)-approximation algorithm for the minsum k-clustering problem in metric spaces. The running time of this algorithm is O( n) for any constant k and epsilon, and it is o( n2) for all k = o( log n/ log log n). Since the full description size of the input is Theta(n(2)), this is sublinear in the input size. The fastest previously known o( log n)-factor approximation algorithm for k > 2 achieved a running time of Omega(nk), and no non-trivial o(n(2))-time algorithm was known before. Our second result is the first pass-efficient data streaming algorithm for min-sum k-clustering in the distance oracle model, i.e., an algorithm that uses poly( log n, k) space and makes 2 passes over the input point set, which arrives in form of a data stream in arbitrary order. It computes an implicit representation of a clustering of ( P, d) with cost at most a constant factor larger than that of an optimal partition. Using one further pass, we can assign each point to its corresponding cluster. To develop the coresets, we introduce the concept of alpha-preserving metric embeddings. Such an embedding satisfies properties that the distance between any pair of points does not decrease and the cost of an optimal solution for the considered problem on input ( P, d')is within a constant factor of the optimal solution on input ( P, d). In other words, the goal is to find a metric embedding into a ( structurally simpler) metric space that approximates the original metric up to a factor of a with respect to a given problem. We believe that this con
We consider the problem of testing a basic property of collections of distributions: having similar means. Namely, the algorithm should accept collections of distributions in which all distributions have means that do...
详细信息
We consider the problem of testing a basic property of collections of distributions: having similar means. Namely, the algorithm should accept collections of distributions in which all distributions have means that do not differ by more than some given parameter and should reject collections that are relatively far from having this property. By "far" we mean that it is necessary to modify the distributions in a relatively significant manner (according to some predetermined distance measure) so as to obtain the property. We study this problem in two models. In the first model (the query model) the algorithm may ask for samples from any distribution of its choice, and in the second model (the sampling model) the distributions from which it gets samples are selected randomly. We provide upper and lower bounds in both models. In particular, in the query model, the complexity of the problem is polynomial in 1/is an element of (where is an element of is the given distance parameter), while in the sampling model, the complexity grows roughly as m(1- poly(is an element of)), where m is the number of distributions.
Construction of error-correcting codes achieving a designated minimum distance parameter is a central problem in coding theory. In this work, we study a very simple construction of binary linear codes that correct a g...
详细信息
Construction of error-correcting codes achieving a designated minimum distance parameter is a central problem in coding theory. In this work, we study a very simple construction of binary linear codes that correct a given number of errors K. Moreover, we design a simple, nearly optimal syndrome decoder for the code as well. The running time of the decoder is only logarithmic in the block length of the code and nearly linear in the number of errors K. This decoder can be applied to exact for-all sparse recovery over any field, improving upon previous results with the same number of measurements. Furthermore, computation of the syndrome from a received word can be done in nearly linear time in the block length. We also demonstrate an application of these techniques in nonadaptive group testing and construct simple explicit measurement schemes with O(K2 log2 N) tests and O(K3 log2 N) recovery time for identifying up to K defectives in a population of size N.
暂无评论