In this paper, we present a new degree-based estimator for the size of maximum matching in bounded arboricity graphs. When the arboricity of the graph is bounded by alpha, the estimator gives a alpha + 2 factor approx...
详细信息
In this paper, we present a new degree-based estimator for the size of maximum matching in bounded arboricity graphs. When the arboricity of the graph is bounded by alpha, the estimator gives a alpha + 2 factor approximation of the matching size. For planar graphs, we show the estimator does better and returns a 3.5 approximation of the matching size. Using this estimator, we get new results for approximating the matching size of planar graphs in the streaming and distributed models of computation. In particular, in the vertex-arrival streams, we get a randomized O (root n/epsilon(2) log n) space algorithm for approximating the matching size of a planar graph on n vertices within (3.5+epsilon) factor. Similarly, we get a simultaneous protocol in the vertex-partition model for approximating the matching size within (3.5 + epsilon) factor using O (n(2/3)/epsilon(2) log n) communication from each player. In comparison with previous works, the estimator in this paper does not need to know the arboricity of the input graph and improves the approximation factor in the case of planar graphs.
Given a graph G = (V, E), for a vertex set S subset of V, let N(S) denote the set of vertices in V that have a neighbor in S. Extending the concept of binding number of graphs by Woodall (1973), for a vertex set X sub...
详细信息
Given a graph G = (V, E), for a vertex set S subset of V, let N(S) denote the set of vertices in V that have a neighbor in S. Extending the concept of binding number of graphs by Woodall (1973), for a vertex set X subset of V, we define the binding number of X, denoted by bind(X), as the maximum number b such that for every S subset of X where N(S) not equal V(G) it holds that vertical bar N(S)vertical bar >= b vertical bar S vertical bar. Given this definition, we prove that if a graph V(G) contains a subset X with bind (X) = 1/k where k is an integer, then G possesses a matching of size at least vertical bar X vertical bar/(k + 1). Using this statement, we derive tight bounds for the estimators of the matching size in planar graphs. These estimators are previously used in designing sublinear space algorithms for approximating the matching size in the datastream model of computation. In particular, we show that the number of locally superior vertices is a 3 factor approximation of the matching size in planar graphs. The previous analysis by Jowhari (2023) proved a 3.5 approximation factor. As another application, we show a simple variant of an estimator by Esfandiari et al. (2015) achieves 3factor approximation of the matching size in planar graphs. Namely, let s be the number of edges with both endpoints having degree at most 2 and let h be the number of vertices with degree at least 3. We prove that when the graph is planar, the size of matching is at least (s + h)/3. This result generalizes a known fact that every planar graph on n vertices with minimum degree 3 has a matching of size at least n/3. (c) 2024 Elsevier B.V. All rights reserved.
stream monitoring is fundamental in many datastream applications, such as financial data trackers, security, anomaly detection, and load balancing. In that respect, quantiles are of particular interest, as they often...
详细信息
stream monitoring is fundamental in many datastream applications, such as financial data trackers, security, anomaly detection, and load balancing. In that respect, quantiles are of particular interest, as they often capture the user's utility. For example, if a video connection has high tail (e.g., 99'th percentile) latency, the perceived quality will suffer, even if the average and median latencies are *** this work, we consider the problem of approximating the per-item quantiles. Elements in our stream are (ID, value) tuples, and we wish to track the quantiles for each ID. Existing quantile sketches are designed for a plain number stream (i.e., containing just a value). While one could allocate a separate sketch instance for each ID, this may require an infeasible amount of memory. Instead, we consider tracking the quantiles for the heavy hitters (most frequent items), which are often considered particularly important, without knowing them *** first present a couple of simple and effective algorithms that serve as baselines, a sampling approach and a sketching approach. Then, we present SQUAD, an algorithm that combines sampling and sketching while improving the asymptotic space complexity. Intuitively, SQUAD uses a background sampling process to capture the behaviour of the quantiles of an item before it is allocated with a sketch, thereby allowing us to use fewer samples and sketches. The algorithms are rigorously analyzed, and we demonstrate SQUAD's superiority using extensive~simulations on real-world traces.
The notion of L-p sampling, and corresponding algorithms known as L-p samplers, has found a wide range of applications in the design of data stream algorithms and beyond. In this survey, we present some of the core al...
详细信息
The notion of L-p sampling, and corresponding algorithms known as L-p samplers, has found a wide range of applications in the design of data stream algorithms and beyond. In this survey, we present some of the core algorithms to achieve this sampling distribution based on ideas from hashing, sampling, and sketching. We give results for the special cases of insertion-only inputs, lower bounds for the sampling problems, and ways to efficiently sample multiple elements. We describe a range of applications of L-p sampling, drawing on problems across the domain of computer science, from matrix and graph computations, as well as to geometric and vector streaming problems.
A fundamental problem in data management and analysis is to generate descriptions of the distribution of data. It is most common to give such descriptions in terms of the cumulative distribution, which is characterize...
详细信息
A fundamental problem in data management and analysis is to generate descriptions of the distribution of data. It is most common to give such descriptions in terms of the cumulative distribution, which is characterized by the quantiles of the data. The design and engineering of efficient methods to find these quantiles has attracted much study, especially in the case where the data are given incrementally, and we must compute the quantiles in an online, streaming fashion. While such algorithms have proved to be extremely useful in practice, there has been limited formal comparison of the competing methods, and no comprehensive study of their performance. In this paper, we remedy this deficit by providing a taxonomy of different methods and describe efficient implementations. In doing so, we propose new variants that have not been studied before, yet which outperform existing methods. To illustrate this, we provide detailed experimental comparisons demonstrating the trade-offs between space, time, and accuracy for quantile computation.
Given m distributed datastreams A1, . . . ,Am, we consider the problem of estimating the number of unique identifiers in streams defined by set expressions over A1, . . . ,Am. We identify a broad class of algorithms ...
详细信息
The pressing need for efficient compression schemes for XML documents has recently been focused on stack computation (Hariharan, S., & Shankar, P. in: Proceedings of the 2006 IEEE data compression conference, p. 4...
详细信息
The pressing need for efficient compression schemes for XML documents has recently been focused on stack computation (Hariharan, S., & Shankar, P. in: Proceedings of the 2006 IEEE data compression conference, p. 453, 2006;League, C., & Eng, K. in: Proceedings of the 2007 IEEE data compression conference, pp. 272-282, 2007), and in particular calls for a formulation of information-lossless stack or pushdown compressors that allows a formal analysis of their performance and a more ambitious use of the stack in XML compression, where so far it is mainly connected to parsing mechanisms. In this paper we introduce the model of pushdown compressor, based on pushdown transducers that compute a single injective function while keeping the widest generality regarding stack computation. We also consider online compression algorithms that use at most polylogarithmic space (plogon). These algorithms correspond to compressors in the datastream model. We compare the performance of these two families of compressors with each other and with the general purpose Lempel-Ziv algorithm. This comparison is made without any a priori assumption on the data's source and considering the asymptotic compression ratio for infinite sequences. We prove that in all cases they are incomparable.
We show the following transformation: any two-party protocol for outputting a (1 + epsilon)-approximation to f(x, y) = Sigma(n)(j=1)g(x(j), y(j)) with probability at least 2/3, for any non-negative efficienty computab...
详细信息
ISBN:
(纸本)9781450306911
We show the following transformation: any two-party protocol for outputting a (1 + epsilon)-approximation to f(x, y) = Sigma(n)(j=1)g(x(j), y(j)) with probability at least 2/3, for any non-negative efficienty computable function g, can be transformed into a two-party private approximation protocol with only a polylogarithmic factor loss in communication, computation, and round complexity. In general it is insufficient to use secure function evaluation or fully homomorphic encryption on a standard, non-private protocol for approximating f. This is because the approximation may reveal information about x and y that does not follow from f(x,y). Applying our transformation and variations of it, we obtain near-optimal private approximation protocols for a wide range of problems in the datastream literature for which previously nothing was known. We give near-optimal private approximation protocols for the l(p)-distance for every p >= 0, for the heavy hitters and importance sampling problems with respect to any l(p)-norm, for the max-dominance and other dominant l(p)-norms, for the distinct summation problem, for entropy, for cascaded frequency moments, for subspace approximation and block sampling, and for measuring independence of datasets. Using a result for datastreams, we obtain private approximation protocols with polylogarithmic communication for every non-decreasing and symmetric function g(x(j), y(j)) = h(x(j) - y(j)) with at most quadratic growth. If the original (non-private) protocol is a simultaneous protocol, e.g., a sketching algorithm, then our only cryptographic assumption is efficient symmetric computationally-private information retrieval;otherwise it is fully homomorphic encryption. For all but one of these problems, the original protocol is a sketching algorithm. Our protocols generalize straightforwardly to more than two parties.
We show there is a distribution over linear mappings R : l(1)(n) -> l(1)(O(d log d)) such that with arbitrarily large constant probability, for any fixed d-dimensional subspace L, for all x is an element of L we ha...
详细信息
ISBN:
(纸本)9781450306911
We show there is a distribution over linear mappings R : l(1)(n) -> l(1)(O(d log d)) such that with arbitrarily large constant probability, for any fixed d-dimensional subspace L, for all x is an element of L we have parallel to x parallel to(1) <= parallel to Rx parallel to(1) = O(d log d)parallel to x parallel to(1). This provides the first analogue of the ubiquitous subspace Johnson-Lindenstrauss embedding for the l(1)-norm. Importantly, the target dimension and distortion are independent of the ambient dimension n. We give several applications of this result. First, we give a faster algorithm for computing well-conditioned bases. Our algorithm is simple, avoiding the linear programming machinery required of previous algorithms. We also give faster algorithms for least absolute deviation regression and l(1)-norm best fit hyperplane problems, as well as the first single pass streaming algorithms with low space for these problems. These results are motivated by practical problems in image analysis, spam detection, and statistics, where the l(1)-norm is used in studies where outliers may be safely and effectively ignored. This is because the l(1)-norm is more robust to outliers than the l(2)-norm.
We give a space-optimal streaming algorithm with update time O(log(2)(1/epsilon) log log(1/epsilon)) for approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a datastream up to a f...
详细信息
ISBN:
(纸本)9781450306911
We give a space-optimal streaming algorithm with update time O(log(2)(1/epsilon) log log(1/epsilon)) for approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a datastream up to a factor of 1 +/- epsilon. This provides a nearly exponential improvement over the previous space-optimal algorithm of [Kane-Nelson-Woodruff, SODA 2010], which had update time Omega(1/epsilon(2)). When combined with the work of [Harvey-Nelson-Onak, FOCS 2008], we also obtain the first algorithm for entropy estimation in turnstile streams which simultaneously achieves near-optimal space and fast update time.
暂无评论