Submodular maximization is a significant area of interest in combinatorial optimization. It has various real-world applications. In recent years, streaming algorithms for submodular maximization have gained attention,...
详细信息
Submodular maximization is a significant area of interest in combinatorial optimization. It has various real-world applications. In recent years, streaming algorithms for submodular maximization have gained attention, allowing realtime processing of large data sets by examining each piece of data only once. However, most of the current state-of-the-art algorithms are only applicable to monotone submodular maximization. There are still significant gaps in the approximation ratios between monotone and non-monotone objective functions. In this paper, we propose a streaming algorithm framework for non-monotone submodular maximization and use this framework to design deterministic streaming algorithms for the d-knapsack constraint and the knapsack constraint. Our 1-pass streaming algorithm for the d-knapsack constraint has a 1/4(d+1)-& varepsilon;approximation ratio, using O((B) over tilde log (B) over tilde/& varepsilon;) memory, and O(log (B) over tilde & varepsilon;) query time per element, where (B) over tilde = min(n,b) is the maximum number of elements that the knapsack can store. As a special case of the d-knapsack constraint, we have the 1-pass streaming algorithm with a 1/8 - & varepsilon;approximation ratio to the knapsack constraint. To our knowledge, there is currently no streaming algorithm for this constraint when the objective function is non-monotone, even when d = 1. In addition, we propose a multi-pass streaming algorithm with 1/6 - & varepsilon;approximation, which stores O((B) over tilde )elements.
Estimating set similarity and detecting highly similar sets are fundamental problems in areas such as databases and machine learning. MinHash is a well-known technique for approximating Jaccard similarity of sets and ...
详细信息
Estimating set similarity and detecting highly similar sets are fundamental problems in areas such as databases and machine learning. MinHash is a well-known technique for approximating Jaccard similarity of sets and has been successfully used for many applications. Its two compressed versions, b-bit MinHash and Odd Sketch, can significantly reduce the memory usage of the MinHash, especially for estimating high similarities (i.e., similarities around 1). Although MinHash can be applied to static sets as well as streaming sets, of which elements are given in a streaming fashion, unfortunately, b-bit MinHash and Odd Sketch fail to deal with streaming data. To solve this problem, we previously designed a memory-efficient sketch method, MaxLogHash, to accurately estimate Jaccard similarities in streaming sets. Compared with MinHash, our method uses smaller sized registers (each register consists of less than 7 bits) to build a compact sketch for each set. In this paper, we further develop a faster method, MaxLogOPH++. Compared with MaxLogHash, MaxLogOPH++ reduces the time complexity for updating each coming element from O(k) to O(1) with a small additional memory. We conduct experiments on a variety of datasets, and experimental results demonstrate the efficiency and effectiveness of our methods.
We give the first treatment of the classic independent set problem in graphs and hypergraphs in the streaming setting. The objective is to find space-efficient algorithms that output independent sets that are "co...
详细信息
We give the first treatment of the classic independent set problem in graphs and hypergraphs in the streaming setting. The objective is to find space-efficient algorithms that output independent sets that are "combinatorially optimal", that is, with size guarantee in terms of the degree sequence alone. Our main result is a randomized algorithm that achieves this using space in bits that is linear in the number of vertices. We use this to examine assumptions about the streaming model, and advocate the study of output-efficient algorithms that measure space usage relative to the size of the output solution. In that sense, our main algorithm uses space linear in the output size. We also examine algorithms that use little or no space in addition to the bits storing the output. Our algorithms fall also into an online streaming model, where output-changes can go only in one direction. In particular a feasible solution must be maintained at all times, and items that are removed from the solution can never reenter. We obtain tight bounds on deterministic algorithms for independent sets in graphs in that model.
Subspace analysis (SA) is a widely used technique for coping with high-dimensional data and is becoming a fundamental step in the early treatment of many signal-processing tasks. However, traditional SA often requires...
详细信息
Subspace analysis (SA) is a widely used technique for coping with high-dimensional data and is becoming a fundamental step in the early treatment of many signal-processing tasks. However, traditional SA often requires a large amount of memory and computational resources, as it is equivalent to eigenspace determination. To address this issue, specialized streaming algorithms have been developed, allowing SA to be run on low-power devices, such as sensors or edge devices. Here, we present a classification and a comparison of these methods by providing a consistent description and highlighting their features and similarities. We also evaluate their performance in the task of subspace identification with a focus on computational complexity and memory footprint for different signal dimensions. Additionally, we test the implementation of these algorithms on common hardware platforms typically employed for sensors and edge devices.
We present (single-pass) streaming algorithms for maintaining extent measures of a stream S of n points in . We focus on designing streaming algorithms whose working space is polynomial in d (poly(d)) and sub-linear i...
详细信息
We present (single-pass) streaming algorithms for maintaining extent measures of a stream S of n points in . We focus on designing streaming algorithms whose working space is polynomial in d (poly(d)) and sub-linear in n. For the problems of computing diameter, width and minimum enclosing ball of S, we obtain lower bounds on the worst-case approximation ratio of any streaming algorithm that uses poly(d) space. On the positive side, we introduce the notion of blurred ball cover and use it for answering approximate farthest-point queries and maintaining approximate minimum enclosing ball and diameter of S. We describe a streaming algorithm for maintaining a blurred ball cover whose working space is linear in d and independent of n.
Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional onl...
详细信息
Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional online and offline cases, but have been less well-studied when the volume of the input is truly massive, and cannot even be read into memory. This is captured by the streaming model of computation, where the aim is to approximate the cost of the solution in one pass over the data, using small space. As a result, streaming algorithms produce concise input summaries that approximately preserve the optimum value. We design the first efficient streaming algorithms for these fundamental problems in combinatorial optimization. For Bin Packing, we provide a streaming asymptotic (1 + epsilon)-approximation with (O) over tilde (1/epsilon), where (O) over tilde hides logarithmic factors. Moreover, such a space bound is essentially optimal. Our algorithm implies a streaming (d + epsilon)-approximation for VECTOR BIN PACKING in d dimensions, running in space (O) over tilde (d/epsilon). For the related VECTOR SCHEDULING problem, we show how to construct an input summary in space (O) over tilde (d(2) . m/epsilon(2)) that preserves the optimum value up to a factor of 2 - 1/m + epsilon, where m is the number of identical machines.
The study of non-submodular maximization on the integer lattice is an important extension of submodular optimization. In this paper, streaming algorithms for maximizing non -negative monotone non-submodular functions ...
详细信息
The study of non-submodular maximization on the integer lattice is an important extension of submodular optimization. In this paper, streaming algorithms for maximizing non -negative monotone non-submodular functions with knapsack constraint on integer lattice are considered. We first design a two-pass streamingKnapsack algorithm combining with BinarySearch as a subroutine for this problem. By introducing the DR ratio gamma and the weak DR ratio gamma w of the non-submodular objective function, we obtain that the approximation ratio is min{gamma 2(1 - epsilon)/2 gamma +1, 1 - 1/gamma w2 gamma - epsilon}, the total memory complexity is O (K log K /epsilon), and the total query complexity for each element is O (log K log(K/epsilon 2)/epsilon). Then, we design a one-pass streaming algorithm by dynamically updating the maximal function value among unit vectors along with the currently arriving element. Finally, in order to decrease the memory complexity, we design an improved streamingKnapsack algorithm and reduce the memory complexity to O (K /epsilon 2).(c) 2022 Elsevier B.V. All rights reserved.
We consider a generalization of the Steiner tree problem, the Steiner forest problem, in the Euclidean plane: the input is a multiset X subset of R2, partitioned into k color classes C1,. .. , C k subset of X . The go...
详细信息
We consider a generalization of the Steiner tree problem, the Steiner forest problem, in the Euclidean plane: the input is a multiset X subset of R2, partitioned into k color classes C1,. .. , C k subset of X . The goal is to find a minimum-cost Euclidean graph G such that every color class C is connected in G . We study this Steiner forest problem in the streaming setting, where the stream consists of insertions and deletions of points to X . Each input point x is an element of X arrives with its color color(x) is an element of [k], and as usual for dynamic geometric streams, the input is restricted to the discrete grid {1, ... , Delta }2. We design a single-pass streaming algorithm that uses poly(k log Delta) space and time, and estimates the cost of an optimal Steiner forest solution within ratio arbitrarily close to the famous Euclidean Steiner ratio a 2 (currently 1.1547 <= a 2 <= 1.214). This approximation guarantee matches the state-of-the-art bound for streaming Steiner tree, i.e., when k = 1, and it is a major open question to improve the ratio to 1 + & even for this special case. Our approach relies on a novel combination of streaming techniques, like sampling and linear sketching, with the classical Arora-style dynamic-programming framework for geometric optimization problems, which usually requires large memory and so far has not been applied in the streaming setting. We complement our streaming algorithm for the Steiner forest problem with simple arguments showing that any finite multiplicative approximation requires Omega (k) bits of space.
We consider the problem of estimating the size of a maximum matching when the edges are revealed in a streaming fashion. When the input graph is planar, we present a simple and elegant streaming algorithm that, with h...
详细信息
We consider the problem of estimating the size of a maximum matching when the edges are revealed in a streaming fashion. When the input graph is planar, we present a simple and elegant streaming algorithm that, with high probability, estimates the size of a maximum matching within a constant factor using (O) over tilde (n(2/3)) space, where n is the number of vertices. The approach generalizes to the family of graphs that have bounded arboricity, which include graphs with an excluded constant-size minor. To the best of our knowledge, this is the first result for estimating the size of a maximum matching in the adversarial-order streaming model (as opposed to the random-order streaming model) in o(n) space. We circumvent the barriers inherent in the adversarial-order model by exploiting several structural properties of planar graphs, and more generally, graphs with bounded arboricity. We further reduce the required memory size to (O) over tilde(root n) for three restricted settings: (i) when the input graph is a forest;(ii) when we have 2-passes and the input graph has bounded arboricity;and (iii) when the edges arrive in random order and the input graph has bounded arboricity. Finally, we design a reduction from the Boolean Hidden Matching Problem to show that there is no randomized streaming algorithm that estimates the size of the maximum matching to within a factor better than 3/2 and uses only o(n(1/2)) bits of space. Using the same reduction, we show that there is no deterministic algorithm that computes this kind of estimate in o(n) bits of space. The lower bounds hold even for graphs that are collections of paths of constant length.
We study the following variant of the well-known line-simplification problem: we are getting a (possibly infinite) sequence of points p (0),p (1),p (2),aEuro broken vertical bar in the plane defining a polygonal path,...
详细信息
We study the following variant of the well-known line-simplification problem: we are getting a (possibly infinite) sequence of points p (0),p (1),p (2),aEuro broken vertical bar in the plane defining a polygonal path, and as we receive the points, we wish to maintain a simplification of the path seen so far. We study this problem in a streaming setting, where we only have a limited amount of storage, so that we cannot store all the points. We analyze the competitive ratio of our algorithms, allowing resource augmentation: we let our algorithm maintain a simplification with 2k (internal) points and compare the error of our simplification to the error of the optimal simplification with k points. We obtain the algorithms with O(1) competitive ratio for three cases: convex paths, where the error is measured using the Hausdorff distance (or Fr,chet distance), xy-monotone paths, where the error is measured using the Hausdorff distance (or Fr,chet distance), and general paths, where the error is measured using the Fr,chet distance. In the first case the algorithm needs O(k) additional storage, and in the latter two cases the algorithm needs O(k (2)) additional storage.
暂无评论