We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes, run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds showing that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ77 yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to LZ77 to the number of distinct short substrings contained in it (its $\ell$-th subword complexity, for small $\ell$). In addition, we show that approximating the compressibility with respect to LZ77 is related to approximating the support size of a distribution.
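The RLE cost of a string is governed by its number of runs, and a run boundary is a purely local event (two adjacent symbols differ), so it can be estimated from a few random positions. The Python sketch below is an illustration under that observation, not the paper's algorithm: `estimate_runs` samples adjacent pairs, and `subword_complexity` computes the $\ell$-th subword complexity by brute force. All function names are hypothetical.

```python
import random


def exact_runs(s: str) -> int:
    """Number of maximal runs of equal symbols, i.e. the number of
    (symbol, length) pairs that run-length encoding emits for s."""
    if not s:
        return 0
    return 1 + sum(1 for i in range(len(s) - 1) if s[i] != s[i + 1])


def estimate_runs(s: str, samples: int, rng: random.Random) -> float:
    """Estimate exact_runs(s) by reading only `samples` random adjacent pairs.

    Position i is a run boundary iff s[i] != s[i + 1]. Scaling the sampled
    boundary fraction by n - 1 gives an unbiased estimate of the boundary
    count; by Hoeffding's bound the additive error is O((n - 1) / sqrt(samples))
    with constant probability.
    """
    n = len(s)
    if n <= 1:
        return float(n)
    hits = 0
    for _ in range(samples):
        i = rng.randrange(n - 1)  # uniform over the n - 1 adjacent pairs
        if s[i] != s[i + 1]:
            hits += 1
    return 1 + (n - 1) * hits / samples


def subword_complexity(s: str, ell: int) -> int:
    """Number of distinct length-ell substrings of s (brute force, O(n * ell))."""
    return len({s[i:i + ell] for i in range(len(s) - ell + 1)})
```

Note that this sampling guarantee is additive relative to $n$: it pins down the run count well when runs are plentiful, but cannot by itself give a multiplicative approximation for highly compressible strings with very few runs.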
We study the complexity of local graph-centrality estimation, with the goal of approximating the centrality score of a given target node while exploring only a sublinear number of nodes/arcs of the graph and performing a sublinear number of elementary operations. We develop a technique, which we apply to PageRank and Heat Kernel, for constructing a low-variance score estimator through a local exploration of the graph. We obtain an algorithm that, given any node in any graph of $n$ nodes and $m$ arcs, with probability $1-\delta$ computes a multiplicative $(1\pm\epsilon)$-approximation of its score by examining only $O(\min(n^{1/2}\Delta^{1/2}, n^{1/2}m^{1/4}))$ nodes/arcs, where $\Delta$ is the maximum outdegree of the graph and $\mathrm{poly}(\epsilon^{-1})$ and $\mathrm{polylog}(\delta^{-1})$ factors are omitted for readability. A similar bound holds for computational cost. We also prove a lower bound of $\Omega(\min(n^{1/2}\Delta^{1/2}, n^{1/3}m^{1/3}))$ for both query complexity and computational complexity. Moreover, in the jump-and-crawl graph-access model, our technique yields an $O(\min(n^{1/2}\Delta^{1/2}, n^{2/3}))$-query algorithm; we show that this algorithm is optimal up to a logarithmic factor (in fact, a sublogarithmic factor in the case of PageRank). These are the first algorithms with sublinear worst-case bounds for general directed graphs and any choice of the target node.
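To make the estimation target concrete, here is the textbook Monte Carlo estimator of PageRank, given as a global baseline and not the local technique of the paper. A walk starts at a uniformly random node and, at each step, continues with probability `alpha` along a uniformly random out-arc (jumping to a uniform node from a dangling node, a convention assumed here), otherwise it stops; its endpoint is then distributed exactly according to PageRank. The function name, the damping value 0.85, and the adjacency-list input are assumptions for illustration.

```python
import random
from typing import List


def mc_pagerank(out_arcs: List[List[int]], target: int, alpha: float = 0.85,
                walks: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the PageRank score of `target`.

    out_arcs[v] lists the heads of v's out-arcs. Each walk starts at a
    uniformly random node; at every step it continues with probability alpha
    (along a uniformly random out-arc, or to a uniformly random node if v is
    dangling) and stops otherwise. The endpoint is PageRank-distributed, so
    the fraction of walks ending at `target` is an unbiased estimate.
    """
    rng = random.Random(seed)
    n = len(out_arcs)
    hits = 0
    for _ in range(walks):
        v = rng.randrange(n)
        while rng.random() < alpha:
            nbrs = out_arcs[v]
            v = rng.choice(nbrs) if nbrs else rng.randrange(n)
        if v == target:
            hits += 1
    return hits / walks
```

Because the estimate is a Bernoulli average, a multiplicative $(1\pm\epsilon)$ guarantee needs on the order of $1/(\epsilon^2\,\pi(t))$ walks, where $\pi(t)$ is the target's score; for a typical score near $1/n$ this is linear in $n$, which is why a lower-variance, locally computed estimator is needed to obtain sublinear bounds.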
We consider standard $T$-interval dynamic networks, under the synchronous timing model and the broadcast CONGEST model. In a $T$-interval dynamic network, the set of nodes is always fixed and there are no node failures. The edges in the network are always undirected, but the set of edges in the topology may change arbitrarily from round to round, as determined by some adversary and subject to the following constraint: for every $T$ consecutive rounds, the topologies in those rounds must contain a common connected spanning subgraph. Let $H_r$ be the maximum (in terms of number of edges) such subgraph for rounds $r$ through $r+T-1$. We define the backbone diameter $d$ of a $T$-interval dynamic network to be the maximum diameter of all such $H_r$'s, for $r \ge 1$. We use $n$ to denote the number of nodes in the network. Within this context, we consider a range of fundamental distributed computing problems, including COUNT/MAX/MEDIAN/SUM/LEADERELECT/CONSENSUS/CONFIRMEDFLOOD. Existing algorithms for these problems all have time complexity of $\Omega(n)$ rounds, even for $T = \infty$ and even when $d$ is as small as $O(1)$. This paper presents a novel approach (framework) based on the idea of massively parallel aggregation. Following this approach, we develop a novel deterministic COUNT algorithm with $O(d^3\log^2 n)$ complexity, for $T$-interval dynamic networks with $T \ge c \cdot d^2\log^2 n$. Here $c$ is a (sufficiently large) constant independent of $d$, $n$, and $T$. To our knowledge, our algorithm is the very first such algorithm whose complexity does not contain a $\Theta(n)$ term. This paper further develops novel algorithms for solving MAX/MEDIAN/SUM/LEADERELECT/CONSENSUS/CONFIRMEDFLOOD, while incurring $O(d^3\,\mathrm{polylog}(n))$ complexity. Again, for all these problems, our algorithms are the first ones whose time complexity does not contain a $\Theta(n)$ term.
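Every common spanning subgraph of rounds $r,\dots,r+T-1$ is contained in the intersection of their edge sets, and that intersection is itself a common spanning subgraph; hence $H_r$ is exactly this intersection, and it is connected whenever any connected common spanning subgraph exists. The Python sketch below uses this observation to compute the backbone diameter $d$ of a finite prefix of rounds. The definition ranges over all $r \ge 1$, so restricting to finitely many windows, using 0-based round indices, and the function names are assumptions of the sketch.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def _normalize(edges: Iterable[Edge]) -> Set[Edge]:
    """Store each undirected edge once, as (smaller endpoint, larger endpoint)."""
    return {(min(u, v), max(u, v)) for u, v in edges}


def _bfs_distances(n: int, adj: List[List[int]], src: int) -> List[int]:
    """Hop distances from src; -1 marks unreachable nodes."""
    dist = [-1] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] < 0:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def backbone_diameter(n: int, rounds: List[Iterable[Edge]], T: int) -> int:
    """Maximum diameter of H_r over all length-T windows of a finite round list.

    H_r is the intersection of the edge sets of rounds r..r+T-1 (0-based).
    Raises ValueError if some window's H_r is not connected, i.e. the
    sequence violates the T-interval constraint.
    """
    if not 1 <= T <= len(rounds):
        raise ValueError("need 1 <= T <= number of rounds")
    edge_sets = [_normalize(E) for E in rounds]
    best = 0
    for r in range(len(edge_sets) - T + 1):
        common = set.intersection(*edge_sets[r:r + T])
        adj: List[List[int]] = [[] for _ in range(n)]
        for u, v in common:
            adj[u].append(v)
            adj[v].append(u)
        for src in range(n):  # all-pairs BFS gives the exact diameter
            dist = _bfs_distances(n, adj, src)
            if min(dist) < 0:
                raise ValueError(f"window starting at round {r} is not connected")
            best = max(best, max(dist))
    return best
```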
ISBN (print): 9781538642306
We study the complexity of local graph centrality estimation, with the goal of approximating the centrality score of a given target node while exploring only a sublinear number of nodes/arcs of the graph and performing a sublinear number of elementary operations. We develop a technique, which we apply to the PageRank and Heat Kernel centralities, for building a low-variance score estimator through a local exploration of the graph. We obtain an algorithm that, given any node in any graph of $m$ arcs, with probability $1-\delta$ computes a multiplicative $(1\pm\epsilon)$-approximation of its score by examining only $\tilde{O}(\min(m^{2/3}\Delta^{1/3}d^{-2/3}, m^{4/5}d^{-3/5}))$ nodes/arcs, where $\Delta$ and $d$ are respectively the maximum and average outdegree of the graph (omitting $\mathrm{poly}(\epsilon^{-1})$ and $\mathrm{polylog}(\delta^{-1})$ factors for readability). A similar bound holds for computational cost. We also prove a lower bound of $\Omega(\min(m^{1/2}\Delta^{1/2}d^{-1/2}, m^{2/3}d^{-1/3}))$ for both query complexity and computational complexity. Moreover, our technique yields an $\tilde{O}(n^{2/3})$-query algorithm for an $n$-node graph in the access model of Brautbar et al. [1], widely used in social network mining; we show that this algorithm is optimal up to a sublogarithmic factor. These are the first algorithms yielding worst-case sublinear bounds for general directed graphs and any choice of the target node.
ISBN (print): 9781450369350
We consider standard $T$-interval dynamic networks, under the synchronous timing model and the broadcast CONGEST model. In a $T$-interval dynamic network, the set of nodes is always fixed and there are no node failures. The edges in the network are always undirected, but the set of edges in the topology may change arbitrarily from round to round, as determined by some adversary and subject to the following constraint: for every $T$ consecutive rounds, the topologies in those rounds must contain a common connected spanning subgraph. Let $H_r$ be the maximum (in terms of number of edges) such subgraph for rounds $r$ through $r+T-1$. We define the backbone diameter $d$ of a $T$-interval dynamic network to be the maximum diameter of all such $H_r$'s, for $r \ge 1$. We use $n$ to denote the number of nodes in the network. Within this context, we consider a range of fundamental distributed computing problems, including COUNT/MAX/MEDIAN/SUM/LEADERELECT/CONSENSUS/CONFIRMEDFLOOD. Existing algorithms for these problems all have time complexity of $\Omega(n)$ rounds, even for $T = \infty$ and even when $d$ is as small as $O(1)$. This paper presents a novel $O(d^3\log^2 n)$ deterministic algorithm for computing COUNT, for $T$-interval dynamic networks with $T \ge c \cdot d^2\log^2 n$. Here $c$ is a (sufficiently large) constant independent of $d$, $n$, and $T$. To our knowledge, our algorithm is the very first such algorithm whose complexity does not contain a $\Theta(n)$ term. For $d = O(n^a)$ with constant $a < 1/3$, our deterministic algorithm has $o(n)$ complexity, which is better than all (both randomized and deterministic) existing COUNT algorithms in this setting. For $d = O(\mathrm{polylog}(n))$, our algorithm is exponentially faster. Following the framework of our COUNT algorithm, this paper further develops novel algorithms for solving MAX/MEDIAN/SUM/LEADERELECT/CONSENSUS/CONFIRMEDFLOOD, while incurring either $O(d^3\log^2 n)$ or $O(d^3\log^3 n)$ complexity. Again, for all these problems, our algorithms are the first ones
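The two claims about small backbone diameters follow directly from the $O(d^3\log^2 n)$ bound; the short calculation is spelled out below. The first line uses $1-3a>0$, and the same reasoning covers the $O(d^3\log^3 n)$ bounds, since $\log^3 n / n^{1-3a} \to 0$ as well. The second line shows the sense of "exponentially faster": written as powers of 2, a polylogarithmic running time has an exponent of order $\log\log n$, while linear time has an exponent of order $\log n$.

```latex
\begin{align*}
d = O(n^{a}),\ a < \tfrac{1}{3}
  &\;\Longrightarrow\; d^{3}\log^{2} n = O\bigl(n^{3a}\log^{2} n\bigr),
  \qquad \frac{n^{3a}\log^{2} n}{n} = \frac{\log^{2} n}{n^{1-3a}} \xrightarrow{\;n\to\infty\;} 0, \\
d = O(\mathrm{polylog}(n))
  &\;\Longrightarrow\; d^{3}\log^{2} n = \mathrm{polylog}(n) = 2^{O(\log\log n)},
  \qquad \text{compared with } n = 2^{\log_2 n}.
\end{align*}
```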
We consider semidefinite optimization in a saddle point formulation where the primal solution is in the spectrahedron and the dual solution is a distribution over affine functions. We present an approximation algorithm for this problem that runs in sublinear time in the size of the data. To the best of our knowledge, this is the first algorithm to achieve this. Our algorithm is also guaranteed to produce low-rank solutions. We further prove lower bounds on the running time of any algorithm for this problem, showing that certain terms in the running time of our algorithm cannot be further improved. Finally, we consider a non-affine version of the saddle point problem and give an algorithm that under certain assumptions runs in sublinear time.
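One reason low-rank solutions are natural in this formulation: for a fixed dual distribution $p$ over affine functions $X \mapsto A_i \bullet X - b_i$ (names hypothetical), the primal best response maximizes the linear function $C \bullet X$ with $C = \sum_i p_i A_i$ over the spectrahedron $\{X \succeq 0,\ \mathrm{tr}\,X = 1\}$. That maximum equals $\lambda_{\max}(C)$ and is attained at the rank-one matrix $vv^\top$ for a top unit eigenvector $v$. The pure-Python sketch below computes this best response with shifted power iteration; it illustrates the primal step only and is not the paper's sublinear algorithm, which must additionally avoid reading all of the data.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def combine(mats: Sequence[Matrix], p: Sequence[float]) -> Matrix:
    """C = sum_i p_i * A_i for symmetric d x d matrices A_i."""
    d = len(mats[0])
    return [[sum(pi * A[r][c] for pi, A in zip(p, mats)) for c in range(d)]
            for r in range(d)]


def top_eigenpair(C: Matrix, iters: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Approximate (lambda_max(C), unit eigenvector) for symmetric C.

    Shifting by s*I with s = (max absolute row sum) + 1, a Gershgorin bound,
    makes C + sI positive definite, so power iteration on C + sI converges to
    the eigenvector of the largest (not largest-magnitude) eigenvalue of C.
    """
    d = len(C)
    s = max(sum(abs(x) for x in row) for row in C) + 1.0
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(C[r][c] * v[c] for c in range(d)) + s * v[r] for r in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(v[r] * sum(C[r][c] * v[c] for c in range(d)) for r in range(d))  # Rayleigh quotient
    return lam, v


def best_response(mats: Sequence[Matrix], p: Sequence[float]) -> Tuple[float, Matrix]:
    """Rank-one X = v v^T maximizing (sum_i p_i A_i) . X over the spectrahedron."""
    lam, v = top_eigenpair(combine(mats, p))
    return lam, [[vi * vj for vj in v] for vi in v]
```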
ISBN (print): 9781450399135
Sublinear-time algorithms for approximating maximum matching size have long been studied. Much of the progress over the last two decades on this problem has been on the algorithmic side. For instance, an algorithm of [Behnezhad; FOCS'21] obtains a 1/2-approximation in $\tilde{O}(n)$ time for $n$-vertex graphs. A more recent algorithm by [Behnezhad, Roghani, Rubinstein, and Saberi; SODA'23] obtains a slightly-better-than-1/2 approximation in $O(n^{1+\epsilon})$ time (for arbitrarily small constant $\epsilon > 0$). On the lower bound side, [Parnas and Ron; TCS'07] showed 15 years ago that obtaining any constant approximation of maximum matching size requires $\Omega(n)$ time. Proving any lower bound super-linear in $n$, even for $(1-\epsilon)$-approximations, has remained elusive since then. In this paper, we prove the first lower bound super-linear in $n$ for this problem. We show that at least $n^{1.2-o(1)}$ queries in the adjacency list model are needed for obtaining a $(2/3+\Omega(1))$-approximation of the maximum matching size. This holds even if the graph is bipartite and is promised to have a matching of size $\Theta(n)$. Our lower bound argument builds on techniques such as correlation decay that, to our knowledge, have not been used before in proving sublinear-time lower bounds. We complement our lower bound by presenting two algorithms that run in strongly sublinear time of $n^{2-\Omega(1)}$. The first algorithm achieves a $(2/3-\epsilon)$-approximation (for any arbitrarily small constant $\epsilon > 0$); this significantly improves prior close-to-1/2 approximations. Our second algorithm obtains an even better approximation factor of $(2/3+\Omega(1))$ for bipartite graphs. This breaks the 2/3-approximation barrier that arises in various settings of the matching problem, and importantly shows that our $n^{1.2-o(1)}$ time lower bound for $(2/3+\Omega(1))$-approximations cannot be improved all the way to $n^{2-o(1)}$.
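A standard route to a 1/2-approximation in sublinear time, shown here only to illustrate the setting and not as either algorithm described above, is to simulate the random-order greedy maximal matching locally: an edge belongs to the matching if and only if no adjacent edge of smaller random rank does. Since a maximal matching has at least half as many edges as a maximum matching, sampling vertices and asking whether they are covered estimates the maximum matching size within a factor of 2, up to additive sampling error. The class and function names are hypothetical, the graph is given as adjacency lists, and the oracle uses plain Python recursion, so very large graphs would need an explicit stack.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


class RandomGreedyMatchingOracle:
    """Local oracle for the greedy maximal matching under random edge ranks.

    Edge e is in the matching iff no adjacent edge of lower rank is in it,
    which is exactly what processing edges in increasing rank order yields.
    Ranks are drawn lazily, so only the explored part of the graph is touched.
    """

    def __init__(self, adj: List[List[int]], seed: int = 0) -> None:
        self.adj = adj
        self.rng = random.Random(seed)
        self.rank: Dict[Edge, float] = {}
        self.memo: Dict[Edge, bool] = {}

    @staticmethod
    def _key(u: int, v: int) -> Edge:
        return (u, v) if u < v else (v, u)

    def _rank(self, e: Edge) -> float:
        if e not in self.rank:
            self.rank[e] = self.rng.random()
        return self.rank[e]

    def in_matching(self, u: int, v: int) -> bool:
        e = self._key(u, v)
        if e in self.memo:
            return self.memo[e]
        r = self._rank(e)
        # Edges sharing an endpoint with e, examined in increasing rank order;
        # recursion only goes to strictly lower ranks, so it terminates.
        nbrs = {self._key(a, b) for a in e for b in self.adj[a]} - {e}
        result = True
        for f in sorted(nbrs, key=self._rank):
            if self._rank(f) >= r:
                break
            if self.in_matching(*f):
                result = False
                break
        self.memo[e] = result
        return result

    def is_matched(self, v: int) -> bool:
        return any(self.in_matching(v, w) for w in self.adj[v])


def estimate_matching_size(adj: List[List[int]], samples: int, seed: int = 0) -> float:
    """Estimate |M| as n/2 times the fraction of sampled vertices covered by M."""
    oracle = RandomGreedyMatchingOracle(adj, seed)
    rng = random.Random(seed + 1)
    n = len(adj)
    covered = sum(oracle.is_matched(rng.randrange(n)) for _ in range(samples))
    return n * covered / (2 * samples)
```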
ISBN (print): 9781728196213
In this paper, we design new sublinear-time algorithms for solving the gap edit distance problem and for embedding edit distance into Hamming distance. For the gap edit distance problem, we give a greedy algorithm that distinguishes in time $\tilde{O}(n/k + k^2)$ between length-$n$ input strings with edit distance at most $k$ and those with edit distance more than $4k^2$. This is an improvement and a simplification upon the main result of [Goldenberg, Krauthgamer, Saha, FOCS 2019], where the $k$ vs. $\Theta(k^2)$ gap edit distance problem is solved in $\tilde{O}(n/k + k^3)$ time. We further generalize our result to solve the $k$ vs. $\alpha k$ gap edit distance problem in time $\tilde{O}(n/\alpha + k^2 + \frac{k}{\alpha}\sqrt{nk})$, strictly improving upon the previously known bound $\tilde{O}(n/\alpha + k^3)$. Finally, we show that if the input strings do not have long highly periodic substrings, then the gap edit distance problem can be solved in sublinear time within any factor $\alpha > 1$. Specifically, if the strings contain no substring of length $\ell$ with the shortest period of length at most $2k$, then the $k$ vs. $(1+\epsilon)k$ gap edit distance problem can be solved in time $\tilde{O}(\frac{n}{\epsilon^2 k} + k^2\ell)$. We further give the first sublinear-time algorithm for the probabilistic embedding of edit distance into Hamming distance. Our $\tilde{O}(n/p)$-time procedure yields an embedding with distortion $k^2 p$, where $k$ is the edit distance of the original strings. Specifically, the Hamming distance of the resultant strings is between $\frac{k-p+1}{p}$ and $k^2$ with good probability. This generalizes the linear-time embedding of [Chakraborty, Goldenberg, Koucky, STOC 2016], where the resultant Hamming distance is between $k$ and $k^2$. Our algorithm is based on a random walk over samples, which we believe will find other applications in sublinear-time algorithms.
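For reference, the exact question "is the edit distance at most $k$?" can be answered by the standard band-restricted dynamic program in $O(nk)$ time. This trivially solves every $k$ vs. $\alpha k$ gap instance (a gap algorithm must accept pairs with edit distance at most $k$, reject pairs with edit distance above $\alpha k$, and may answer either way in between), but it reads the entire input, which is exactly what the sublinear algorithms above avoid. The sketch is a textbook baseline, not the paper's greedy algorithm, and the function name is hypothetical.

```python
def edit_distance_at_most(a: str, b: str, k: int) -> bool:
    """Decide ED(a, b) <= k exactly with a banded DP in O((|a| + 1) * (2k + 1)) time.

    Any alignment of cost <= k stays within the band |i - j| <= k, because a
    cell (i, j) already costs at least |i - j|. Cells outside the band are
    therefore treated as "more than k", and all values are capped at k + 1.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return False
    cap = k + 1
    prev = {j: j for j in range(min(m, k) + 1)}  # row i = 0: dp[0][j] = j
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i  # delete all of a[:i]
                continue
            best = min(
                prev.get(j - 1, cap) + (a[i - 1] != b[j - 1]),  # match or substitute
                prev.get(j, cap) + 1,  # delete a[i - 1]
                cur.get(j - 1, cap) + 1,  # insert b[j - 1]
            )
            cur[j] = min(best, cap)
        prev = cur
    return prev.get(m, cap) <= k
```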
ISBN (print): 9781581136746
We initiate an investigation of sublinear algorithms for geometric problems in two and three dimensions. We give optimal algorithms for intersection detection of convex polygons and polyhedra, point location in two-dimensional Delaunay triangulations and Voronoi diagrams, and ray shooting in convex polyhedra, all of which run in time O(√n), where n is the size of the input. We also provide sublinear solutions for the approximate evaluation of the volume of a convex polytope and the length of the shortest path between two points on the boundary.
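A sampling-and-walking sketch shows how an $O(\sqrt{n})$ bound can arise when a convex polygon's vertices are stored in arbitrary order, each with pointers to its boundary neighbours. It solves a simpler task than those listed above, finding the lowest vertex: take the lowest of about $\sqrt{n}$ random vertices, then walk downhill along the boundary. Every vertex visited by the walk lies below the best sample, and in expectation only $O(\sqrt{n})$ vertices do. The input model, the general-position assumption (distinct $y$-coordinates), and all names are assumptions for illustration, not the paper's algorithms.

```python
import math
import random
from typing import List, Tuple


def lowest_vertex(ys: List[float], nxt: List[int], prv: List[int],
                  rng: random.Random) -> int:
    """Storage index of the lowest vertex of a convex polygon, expected O(sqrt(n)) time.

    nxt[i] / prv[i] are the storage indices of vertex i's boundary neighbours.
    With distinct y-coordinates, y is unimodal along the boundary of a convex
    polygon, so a local minimum of the walk is the global minimum.
    """
    n = len(ys)
    # Step 1: best of ~sqrt(n) uniform samples.
    v = min((rng.randrange(n) for _ in range(max(1, math.isqrt(n)))), key=ys.__getitem__)
    # Step 2: walk strictly downhill; y decreases at every step, so it terminates.
    while True:
        if ys[nxt[v]] < ys[v]:
            v = nxt[v]
        elif ys[prv[v]] < ys[v]:
            v = prv[v]
        else:
            return v


def shuffled_regular_polygon(n: int, rng: random.Random) -> Tuple[List[float], List[int], List[int]]:
    """Hypothetical test input: y-coordinates of a slightly rotated regular n-gon,
    stored in random order with boundary pointers (x is not needed here)."""
    order = list(range(n))
    rng.shuffle(order)  # order[p] = boundary position of the vertex stored in slot p
    slot = [0] * n
    for p, b in enumerate(order):
        slot[b] = p
    ys = [math.sin(2 * math.pi * order[p] / n + 0.1) for p in range(n)]
    nxt = [slot[(order[p] + 1) % n] for p in range(n)]
    prv = [slot[(order[p] - 1) % n] for p in range(n)]
    return ys, nxt, prv
```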
We provide a combinatorial characterization of all testable properties of k-uniform hypergraphs (k-graphs for short). Here, a k-graph property P is testable if there is a randomized algorithm which makes a bounded num...