In the big data era, data often arrives as streams, and fast data stream analysis has recently attracted intensive research interest. Submodular optimization, with its diminishing-returns property, arises naturally in many streaming applications such as social network influence maximization. In practical settings, however, streaming data frequently carries noise that is small yet significant enough to affect the optimality of submodular optimization solutions. Following the framework of differential privacy (DP), this paper considers a streaming model with DP noise that is small by construction. Within this noisy streaming model, the paper addresses the general problem of submodular maximization under a cardinality constraint. The main theoretical result is a one-pass streaming algorithm with an approximation guarantee of $\frac{1}{(2 + (1 + 1/k)^2)(1 + 1/k)} - \delta$ for any $\delta > 0$. Finally, we implement the algorithm and evaluate it against several baseline methods. Numerical results support the practical performance of our algorithm across several real datasets. (c) 2022 Published by Elsevier B.V.
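To make the setting concrete, here is a minimal sketch of one-pass, threshold-based selection under a cardinality constraint, using set coverage as the submodular function. It is a generic thresholding rule, not the noise-aware algorithm of the paper; the function name `threshold_stream`, the fixed threshold `tau`, and the toy data are all assumptions made for illustration.

```python
def threshold_stream(stream, sets, k, tau):
    """One pass over `stream`; keep an element if its marginal coverage
    gain is at least `tau` and fewer than `k` elements are kept so far.

    `sets` maps each element to the items it covers (hypothetical data).
    Coverage is monotone submodular: gains can only shrink as we add more.
    """
    chosen, covered = [], set()
    for element in stream:
        if len(chosen) >= k:
            break  # cardinality constraint reached
        gain = len(sets[element] - covered)  # marginal gain of this element
        if gain >= tau:
            chosen.append(element)
            covered |= sets[element]
    return chosen, len(covered)


if __name__ == "__main__":
    toy_sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}, "d": {5, 6, 7, 8}}
    print(threshold_stream(["a", "b", "c", "d"], toy_sets, k=2, tau=2))
```

In practice the threshold is not known in advance; streaming algorithms of this family typically run several guesses of `tau` in parallel and return the best resulting set.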
Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency and correctness are analyzed using probabilistic tools under the assumption that the inputs and queries are independent of the internal randomness of the data structure. In this work, we consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we consider a data structure known as a "Bloom filter" and prove a tight connection between Bloom filters in this model and cryptography. A Bloom filter represents a set $S$ of elements approximately, using fewer bits than a precise representation. The price for succinctness is allowing for some errors: for any $x \in S$ it should always answer Yes, and for any $x \notin S$ it should answer Yes only with small probability. In the adversarial model, we consider both efficient adversaries (that run in polynomial time) and computationally unbounded adversaries that are only bounded in the number of queries they can make. For computationally bounded adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and only if one-way functions exist. For unbounded adversaries, we show that there exists a Bloom filter for sets of size $n$ and error $\varepsilon$ that is secure against $t$ queries and uses only $O(n \log(1/\varepsilon) + t)$ bits of memory. In comparison, $n \log(1/\varepsilon)$ is the best possible under a non-adaptive adversary.
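The Bloom filter contract described above (never a false negative, occasional false positives) can be sketched as follows. This is an ordinary Bloom filter whose hash positions depend on a secret key, which only gestures at the role of cryptography; it is not the construction from the paper and carries no proven security guarantee. The class name `KeyedBloomFilter` and its parameters are assumptions.

```python
import hashlib
import os


class KeyedBloomFilter:
    """Plain Bloom filter whose hash positions depend on a secret key."""

    def __init__(self, num_bits, num_hashes, key=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # A random secret key hides the hash functions from whoever queries.
        self.key = key if key is not None else os.urandom(32)
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive `num_hashes` positions from keyed BLAKE2b, one salt per hash.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, key=self.key, salt=i.to_bytes(2, "big"), digest_size=8
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        # True for every inserted item; may also be True for others.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


if __name__ == "__main__":
    bf = KeyedBloomFilter(num_bits=1024, num_hashes=4)
    bf.add(b"alice")
    print(b"alice" in bf)
```

Keying the hashes does not by itself defeat an adaptive adversary, who can still learn from the answers to earlier queries; that gap is what the adversarial model in the abstract is about.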
An address block is defined as a set of contiguous addresses between two points in an address space. Counting the number of distinct address blocks that have been accessed during a measurement period can provide useful information for cyber security, computer networks, and storage systems. However, this counting problem becomes challenging when addresses are accessed randomly, since adjacent addresses must be carefully identified and merged into one block. This study presents a new algorithm that can accurately estimate the number of distinct address blocks while monitoring each address access only once. The algorithm requires only three counters, arranged in a two-tier counting architecture, to keep the numbers of distinct addresses and of distinct one-bit truncated addresses. Both time and space complexities are significantly improved because only three counters are required for cardinality estimation instead of traditional hash table or tree data structures. Experimental results show that the new scheme saves more than 50% of memory space and runs two times faster than an existing tree-based algorithm; the relative error of estimation is less than 10%.
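One plausible reading of the three-counter idea can be checked with exact sets. The number of blocks equals the number of distinct addresses minus the number of adjacent pairs (a, a+1) that are both present. Truncating one bit (`a >> 1`) merges exactly the pairs that start at an even address, and truncating after adding one (`(a + 1) >> 1`) merges exactly the pairs that start at an odd address, so blocks = D_even + D_odd − D. The sketch below uses Python sets where the paper would use approximate cardinality counters; the function name and the precise definition of the truncated counters are assumptions, since the abstract does not spell them out.

```python
def count_blocks(addresses):
    """Count maximal runs of consecutive non-negative integer addresses.

    Three exact "counters" (sets here, cardinality estimators in a real
    streaming setting) are updated once per access; order and duplicates
    in the stream do not matter.
    """
    distinct = set()     # D: distinct addresses
    even_trunc = set()   # D_even: distinct values of a >> 1
    odd_trunc = set()    # D_odd: distinct values of (a + 1) >> 1
    for a in addresses:
        distinct.add(a)
        even_trunc.add(a >> 1)
        odd_trunc.add((a + 1) >> 1)
    # blocks = D - (#adjacent pairs) = D_even + D_odd - D
    return len(even_trunc) + len(odd_trunc) - len(distinct)


if __name__ == "__main__":
    # Blocks: [1..3], [7], [10..11]
    print(count_blocks([10, 1, 3, 2, 11, 7, 2]))
```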
In this paper, we study how to maximize a monotone non-submodular function under a cardinality constraint. Unlike previous streaming algorithms, this paper mainly considers the sliding window model. Based on the concept of the diminishing-return ratio $\gamma$, we propose a $(\frac{1}{3}\gamma^2 - \epsilon)$-approximation algorithm with memory $O(k \log^2(k\Phi^{1/\gamma})/\epsilon^2)$, where $\Phi$ is the ratio between the maximum and minimum values of any singleton element of the function $f$. Then, we improve the approximation ratio to $(\frac{1}{2}\gamma - \epsilon)$ through sub-windows, at the expense of some memory. Our results generalize the corresponding results for the submodular case.
We demonstrate how to evaluate stepwise hedge automata (SHAs) with subhedge projection while completely projecting irrelevant subhedges. Since this requires passing finite state information top-down, we introduce the notion of downward stepwise hedge automata. We use them to define in-memory and streaming evaluators with complete subhedge projection for SHAs. We then tune the evaluators so that they can decide on membership at the earliest time point. We apply our algorithms to the problem of answering regular XPath queries on XML streams. Our experiments show that complete subhedge projection of SHAs can indeed speed up earliest query answering on XML streams so that it becomes competitive with the best existing streaming tools for XPath queries.
ISBN: 9781450342100 (print)
In this paper we provide a framework to analyze the effect of uniform sampling on graph optimization problems. We apply this framework to a general class of graph optimization problems that we call heavy subgraph problems, and show that uniform sampling preserves a (1-ε)-approximate solution to these problems. This class contains many interesting problems such as densest subgraph, directed densest subgraph, densest bipartite subgraph, d-max cut, and d-sum-max clustering. As an immediate consequence of this result, one can use uniform sampling to solve these problems in streaming, turnstile, or Map-Reduce settings. Indeed, by characterizing heavy subgraph problems, our results address Open Problem 13 at the IITK Workshop on Algorithms for Data Streams in 2006 regarding the effects of subsampling, in the context of graph *** Bhattacharya et al. in STOC 2015 provide the first one-pass algorithm for the densest subgraph problem in the streaming model with additions and deletions to its edges, i.e., for dynamic graph streams. They present a (0.5-ε)-approximation algorithm using Õ(n) space, where factors of ε and log(n) are suppressed in the Õ notation. In this paper we improve the (0.5-ε)-approximation algorithm of Bhattacharya et al. by providing a (1-ε)-approximation algorithm using Õ(n) space.
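The sample-then-solve idea can be sketched as follows: keep each edge independently with probability `p`, solve densest subgraph on the sample, and rescale the density by `1/p`. The solver used here is the classic greedy peeling heuristic, which is only a 1/2-approximation; it is swapped in for brevity and is not the (1-ε) method of the paper. The function names, the choice of `p`, and the toy graph are assumptions.

```python
import random


def densest_by_peeling(edges):
    """Greedy peeling: repeatedly delete a minimum-degree vertex and
    remember the densest intermediate subgraph (edges / vertices).
    Assumes an undirected simple graph without self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_set = 0.0, set()
    while adj:
        density = m / len(adj)
        if density > best_density:
            best_density, best_set = density, set(adj)
        u = min(adj, key=lambda x: len(adj[x]))
        for v in adj.pop(u):   # remove u and its incident edges
            adj[v].discard(u)
            m -= 1
    return best_density, best_set


def sample_and_solve(edges, p, seed=0):
    """Uniformly sample edges with probability p, solve on the sample,
    and scale the density back up by 1/p."""
    rng = random.Random(seed)
    sampled = [e for e in edges if rng.random() < p]
    density, vertices = densest_by_peeling(sampled)
    return density / p, vertices


if __name__ == "__main__":
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    graph = k4 + [(3, 4), (4, 5)]
    print(sample_and_solve(graph, p=1.0))
```

With `p=1.0` the sample is the whole graph, which makes the example deterministic; smaller values of `p` trade accuracy for space.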
ISBN: 9783959771160 (print)
We study the relation between streaming algorithms and linear sketching algorithms, in the context of binary updates. We show that for inputs in $n$ dimensions, the existence of efficient streaming algorithms which can process $\Omega(n^2)$ updates implies efficient linear sketching algorithms with comparable cost. This improves upon the previous work of Li, Nguyen and Woodruff [23] and Ai, Hu, Li and Woodruff [3], which required a triple-exponential number of updates to achieve a similar result for updates over integers. We extend our results to updates modulo $p$ for integers $p \geq 2$, and to approximation instead of exact computation.
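For readers unfamiliar with the term, the following minimal sketch shows what a linear sketch under binary updates looks like: the sketch is a matrix-vector product over F_2, and each update that flips coordinate `i` simply XORs column `i` into the sketch. The random matrix, the function names, and the sizes are assumptions chosen for illustration; the paper's result concerns when such sketches must exist, not this particular matrix.

```python
import random


def make_rows(n, m, seed=0):
    """m random rows of an m-by-n matrix over F_2, stored as n-bit ints."""
    rng = random.Random(seed)
    return [rng.getrandbits(n) for _ in range(m)]


def sketch(rows, x):
    """Sketch of the bit vector x (an int): bit j is <rows[j], x> mod 2."""
    return [bin(r & x).count("1") % 2 for r in rows]


def stream_sketch(rows, updates):
    """Process a stream of coordinate flips without ever storing x.
    Flipping coordinate i XORs column i of the matrix into the sketch."""
    s = [0] * len(rows)
    for i in updates:
        for j, r in enumerate(rows):
            s[j] ^= (r >> i) & 1
    return s


if __name__ == "__main__":
    rows = make_rows(n=8, m=3)
    # Flips of coordinates 0, 3, 3, 5 leave x with bits 0 and 5 set.
    print(stream_sketch(rows, [0, 3, 3, 5]) == sketch(rows, 0b100001))
```

Linearity is what makes the sketch order-insensitive: the sketch of `x XOR y` is the XOR of the two sketches.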
ISBN: 9781605580050 (print)
Information on network host connectivity patterns is important for network monitoring and traffic engineering. In this paper, an efficient streaming algorithm is proposed to estimate cardinality distributions, including connectivity distributions, e.g., the percentage of hosts with any given number of distinct communicating peers or flows.
We ask the question: how can Web sites and data aggregators continually release updated statistics while preserving each individual user's privacy? Suppose we are given a stream of 0's and 1's. We propose a differentially private continual counter that outputs at every time step the approximate number of 1's seen thus far. Our counter construction has error that is only polylogarithmic in the number of time steps. We can extend the basic counter construction to allow Web sites to continually give top-k and hot items suggestions while preserving users' privacy.
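A common way to obtain polylogarithmic error for continual counting is the binary (tree) mechanism: noisy partial sums are kept over dyadic intervals, and each running count is assembled from at most log T of them. The sketch below follows that general idea under stated assumptions and is not claimed to match the paper's construction in its details; the function names, the noise scale `levels / epsilon`, and the injectable `noise` parameter are assumptions.

```python
import random


def laplace(scale, rng=random):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_counts(bits, epsilon, noise=laplace):
    """Return a noisy running count of 1's after every time step.

    Each stream item contributes to at most `levels` partial sums, so
    Laplace noise of scale levels/epsilon per partial sum is the usual
    calibration for epsilon-differential privacy (an assumption here).
    """
    levels = max(1, len(bits).bit_length())
    scale = levels / epsilon
    alpha = [0] * levels      # exact partial sums, one per dyadic level
    noisy = [0.0] * levels    # their noisy, releasable versions
    outputs = []
    for t, x in enumerate(bits, start=1):
        i = (t & -t).bit_length() - 1      # index of lowest set bit of t
        alpha[i] = sum(alpha[:i]) + x      # merge lower levels into level i
        for j in range(i):
            alpha[j], noisy[j] = 0, 0.0
        noisy[i] = alpha[i] + noise(scale)
        # The count at time t is the sum over the set bits of t.
        outputs.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return outputs


if __name__ == "__main__":
    print(private_counts([1, 0, 1, 1, 0, 1], epsilon=1.0))
```

Passing `noise=lambda scale: 0.0` disables the noise, which is useful for checking that the dyadic bookkeeping reproduces the exact prefix sums.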