This paper presents an improved algorithm for finding frequent elements over a sliding *** makes no assumption on the distribution of the input items' *** main structure consists of only a few maps and short *** using hash tables and other data structures to manage counters, the running time is dramatically *** thorough experiment and logical analysis, the running time is reduced to a constant.
ISBN: 9783959770699 (print)
We initiate a systematic study of linear sketching over F_2. For a given Boolean function treated as f: F_2^n → F_2, a randomized F_2-sketch is a distribution M over d × n matrices with elements over F_2 such that Mx suffices for computing f(x) with high probability. Such sketches for d ≪ n can be used to design small-space distributed and streaming *** by these applications we study a connection between F_2-sketching and a two-player one-way communication game for the corresponding XOR-function. We conjecture that F_2-sketching is optimal for this communication game. Our results confirm this conjecture for multiple important classes of functions: 1) low-degree F_2-polynomials, 2) functions with sparse Fourier spectrum, 3) most symmetric functions, 4) the recursive majority function. These results rely on a new structural theorem that shows that F_2-sketching is optimal (up to constant factors) for uniformly distributed ***, we show that (non-uniform) streaming algorithms that have to process random updates over F_2 can be constructed as F_2-sketches for the uniform distribution. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14), who show an analogous result for linear sketches over integers in the adversarial setting, our result does not require the stream length to be triply exponential in n and holds for streams of length Õ(n) constructed through uniformly random updates.
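To make the definition of a linear sketch over F_2 concrete, the following is a minimal sketch, not the paper's construction. It assumes a toy target function f(x) = x_0 ⊕ x_3 ⊕ x_5 (a degree-1 F_2-polynomial), for which one deterministic row already suffices; the extra random rows only illustrate the d × n shape. The names `f2_sketch`, `apply_flip`, and `S` are hypothetical. It also shows why linearity matters for streaming: a bit-flip update to x changes Mx by one column of M.

```python
import random

def f2_sketch(M, x):
    """Return Mx over F_2: each output bit is the parity of the
    coordinates of x selected by the corresponding row of M."""
    return [sum(m & b for m, b in zip(row, x)) % 2 for row in M]

def apply_flip(M, sketch, i):
    """Update the sketch in place after coordinate i of x is flipped.
    By linearity, M(x + e_i) = Mx + (column i of M) over F_2."""
    for r, row in enumerate(M):
        sketch[r] ^= row[i]

rng = random.Random(0)
n, d = 16, 3

# Assumed toy function: f(x) = parity of the coordinates in S.
S = {0, 3, 5}
M = [[1 if j in S else 0 for j in range(n)]]                        # row 0 computes f
M += [[rng.randint(0, 1) for _ in range(n)] for _ in range(d - 1)]  # illustrative random rows

# Process a stream of random bit flips while storing only d bits of sketch.
x = [0] * n            # kept here only so the result can be checked
sketch = f2_sketch(M, x)
for _ in range(50):
    i = rng.randrange(n)
    x[i] ^= 1
    apply_flip(M, sketch, i)

f_of_x = sketch[0]     # value of the assumed f, read off the sketch
```

The streaming algorithm never needs x itself, only the d sketch bits, which is the small-space use the abstract refers to when d ≪ n.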
This article focuses on computations on large graphs (e.g., the web-graph) where the edges of the graph are presented as a stream. The objective in the streaming model is to use a small amount of memory (preferably sub-linear in the number of nodes n) and a small number of passes. In the streaming model, we show how to perform several graph computations including estimating the probability distribution after a random walk of length l, the mixing time M, and other related quantities such as the conductance of the graph. By applying our algorithm for computing the probability distribution on the web-graph, we can estimate the PageRank p of any node up to an additive error of √(εp) + ε in Õ(√(M/α)) passes and Õ(min(nα + (1/ε)√(M/α) + (1/ε)Mα, αn√(Mα) + (1/ε)√(M/α))) space, for any α ∈ (0, 1]. Specifically, for ε = M/n, α = M^(−1/2), we can compute the approximate PageRank values in Õ(nM^(−1/4)) space and Õ(M^(3/4)) passes. In comparison, a standard implementation of the PageRank algorithm will take O(n) space and O(M) passes. We also give an approach to approximate the PageRank values in just Õ(1) passes, although this requires Õ(nM) space.
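As a check on the specialized bounds, the substitution ε = M/n, α = M^(−1/2) into the general pass and space expressions can be worked out as follows. This is only arithmetic on the formulas stated in the abstract, with polylogarithmic factors hidden by Õ ignored.

```latex
\begin{align*}
\text{passes:}\quad
  & \sqrt{M/\alpha} = \sqrt{M \cdot M^{1/2}} = M^{3/4}, \\
\text{space (first term of the min):}\quad
  & n\alpha + \tfrac{1}{\epsilon}\sqrt{M/\alpha} + \tfrac{1}{\epsilon} M\alpha
    = nM^{-1/2} + \tfrac{n}{M} M^{3/4} + \tfrac{n}{M} M^{1/2} \\
  & = nM^{-1/2} + nM^{-1/4} + nM^{-1/2} = O\!\left(nM^{-1/4}\right), \\
\text{space (second term of the min):}\quad
  & \alpha n \sqrt{M\alpha} + \tfrac{1}{\epsilon}\sqrt{M/\alpha}
    = nM^{-1/2} M^{1/4} + \tfrac{n}{M} M^{3/4}
    = 2nM^{-1/4}.
\end{align*}
```

Both branches of the minimum give nM^(−1/4) space, and the pass count M^(3/4) is below the O(M) passes of the standard implementation whenever M > 1.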
ISBN: 9798400708916 (print)
In this work, we consider a novel problem of maximizing monotone k-submodular functions under the individual knapsack constraint () over the ground set V, which has found numerous applications in machine learning, including data summarization and information propagation. We propose an approximation algorithm that has approximation ratio and takes O(nk log(n)/ϵ) queries, where ϵ is an input parameter. Alongside the theoretical analysis, we conduct extensive experiments on the proposed algorithm in applications such as Influence Maximization and Sensor Placement. The experimental results demonstrate that our algorithm gives solution quality competitive with state-of-the-art techniques while significantly reducing the number of required queries.
One of the issues raised in streaming data is concept drift detection. Concept drift arises from the natural tendency of real-world events to change over time. For example, in data received from credit card transactions, detecting when transactions rise suddenly can help identify fraud. Given the importance of concept drift in streaming data, this paper presents a solution for accurate and timely drift detection. The solution is based on ensemble learning and the "streaming ensemble algorithm" (SEA), one of the most commonly used stream algorithms. The approach uses a deep belief network as the base model in SEA. In the proposed method, the change in classification error on new data is used to detect concept drift. Analysis of the results shows that, compared with similar algorithms, the proposed method significantly reduces runtime and also improves the F-measure.
Given a large matrix A ∈ ℝ^(n×d), we consider the problem of computing a sketch matrix B ∈ ℝ^(ℓ×d) which is significantly smaller than A but still approximates it well. We consider the problem in the streaming model, where the algorithm can only make one pass over the input with limited working space, and we are interested in minimizing the covariance error ‖AᵀA − BᵀB‖₂. The popular Frequent Directions algorithm of Liberty (2013) and its variants achieve optimal space-error tradeoffs. However, whether the running time can be improved remains an unanswered question. In this paper, we almost settle the question by proving that the time complexity of this problem is equivalent to that of matrix multiplication up to lower order terms. Specifically, we provide new space-optimal algorithms with faster running times and also show that the running times of our algorithms can be improved if and only if the state-of-the-art running time of matrix multiplication can be improved significantly.
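For orientation, here is a minimal sketch of the baseline Frequent Directions procedure that the abstract refers to, not the faster algorithms the paper proposes. It assumes NumPy is available, that ℓ ≤ d, and uses the common variant that shrinks by the squared singular value at index ℓ/2; the function name `frequent_directions` and the test matrix are hypothetical.

```python
import numpy as np

def frequent_directions(rows, ell, d):
    """One-pass sketch B (ell x d) of the rows of A, assuming ell <= d.
    When B is full, shrink all squared singular values by the one at
    index ell // 2, which zeroes out at least the bottom half of B."""
    B = np.zeros((ell, d))
    next_free = 0                       # index of the next all-zero row of B
    for a in rows:
        if next_free == ell:            # sketch is full: shrink before inserting
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s_shrunk[:, None] * Vt  # rows ell // 2 and beyond are now zero
            next_free = ell // 2
        B[next_free] = a
        next_free += 1
    return B

# Small usage example on synthetic data (assumed, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
ell = 10
B = frequent_directions(A, ell, A.shape[1])

# Covariance error ||A^T A - B^T B||_2 compared with the 2 ||A||_F^2 / ell bound.
cov_err = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = 2 * np.linalg.norm(A, "fro") ** 2 / ell
```

Each shrink step costs one SVD of an ℓ × d matrix, which is the running-time bottleneck the abstract's question about faster algorithms is concerned with.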
We consider the problem of computing differentially private approximate histograms and heavy hitters in a stream of elements. In the non-private setting, this is often done using the sketch of Misra and Gries [Science of Computer Programming, 1982]. Chan, Li, Shi, and Xu [PETS 2012] describe a differentially private version of the Misra-Gries sketch, but the amount of noise it adds can be large and scales linearly with the size of the sketch; the more accurate the sketch is, the more noise this approach has to add. We present a better mechanism for releasing a Misra-Gries sketch under (ε, δ)-differential privacy. It adds noise with magnitude independent of the size of the sketch; in fact, the maximum error coming from the noise is the same as the best known in the private non-streaming setting, up to a constant factor. Our mechanism is simple and likely to be practical. We also give a simple post-processing step of the Misra-Gries sketch that does not increase the worst-case error guarantee. It is sufficient to add noise to this new sketch with less than twice the magnitude of the non-streaming setting. This improves on the previous result for ε-differential privacy, where the noise scales linearly with the size of the sketch. Finally, we consider a general setting where users can contribute multiple distinct elements. We present a new sketch with maximum error matching the Misra-Gries sketch. For many parameters in this setting our sketch can be released with less noise under (ε, δ)-differential privacy.
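The non-private Misra-Gries sketch that this abstract builds on can be written in a few lines. The following is a minimal sketch of that classical algorithm only; it does not include the paper's noise-adding mechanism or its post-processing step, and the function name `misra_gries` and the example stream are hypothetical.

```python
import random
from collections import Counter

def misra_gries(stream, k):
    """Classical Misra-Gries sketch with at most k counters.
    Each stored count underestimates the true frequency by at most
    n / (k + 1), where n is the stream length."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Assumed example stream: two heavy items plus 20 items that occur once.
stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(20)]
random.Random(0).shuffle(stream)

k = 4
sketch = misra_gries(stream, k)
true_counts = Counter(stream)
max_error = max(true_counts[item] - sketch.get(item, 0) for item in true_counts)
```

With n = 100 and k = 4 the error bound is 20, so both "a" (true count 50) and "b" (true count 30) are guaranteed to survive in the sketch regardless of the stream order.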