In this paper, we survey algorithms for sparse recovery problems that are based on sparse random matrices. Such matrices have several attractive properties: they support algorithms with low computational complexity, and they make it easy to perform incremental updates to signals. We discuss applications to several areas, including compressive sensing, data stream computing, and group testing.
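To make the low-complexity and incremental-update claims concrete, here is a minimal sketch of a CountSketch-style sparse measurement matrix, one common construction of this kind (the parameters, names, and construction below are our own illustration, not taken from the survey): each signal coordinate touches only d rows, so measuring costs O(nnz(x) * d) operations and updating one coordinate costs O(d).

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, d = 1000, 60, 3  # signal length, sketch length, nonzeros per column

# Sparse random matrix: each column has d entries equal to +/-1,
# placed at d distinct random rows (a CountSketch-style construction).
rows = np.array([rng.choice(m, size=d, replace=False) for _ in range(n)])
signs = rng.choice([-1.0, 1.0], size=(n, d))

def measure(x):
    """Compute y = Ax in O(nnz(x) * d) time, touching only nonzero coordinates."""
    y = np.zeros(m)
    for i in np.flatnonzero(x):
        y[rows[i]] += signs[i] * x[i]
    return y

def update(y, i, delta):
    """Incremental update after x[i] += delta: only d entries of y change."""
    y[rows[i]] += signs[i] * delta
    return y

x = np.zeros(n)
x[[7, 42, 301]] = [5.0, -2.0, 3.0]   # a 3-sparse signal
y = measure(x)
x[42] += 1.0
y = update(y, 42, 1.0)               # O(d) work instead of recomputing Ax
assert np.allclose(y, measure(x))
```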
We consider online algorithms, a model typically investigated with respect to the competitive ratio. In this paper, we explore two-way automata and one-way automata as models for online algorithms, focusing on quantum and classical online algorithms. We show that there are problems that can be solved more efficiently by two-way automata with quantum and classical states than by classical two-way automata in the case of sublogarithmic memory (resp., sublinear size), even if the classical automata receive advice bits. Additionally, we show that there are problems that can be solved more efficiently by one-way quantum automata than by classical one-way automata in the case of sublogarithmic memory (resp., sublinear size) and in the case of logarithmic memory (resp., linear size), even if the classical automata receive advice bits.
We present a low-constant approximation for the metric k-median problem on insertion-only streams using O(epsilon^{-3} k log n) space. In particular, we present a streaming (O(epsilon^{-3} k log n), 2+epsilon)-bicriterion solution that reports cluster weights. Running the offline approximation algorithm due to Byrka et al. (2015) on this bicriterion solution yields a (17.66 + epsilon)-approximation (Guha et al., 2003; Charikar et al., 2003; Braverman et al., 2011). Our result matches the best-known space requirements for streaming k-median clustering while significantly improving the approximation accuracy. We also provide a lower bound, showing that any polylog(n)-space streaming algorithm that maintains an (alpha, beta)-bicriterion solution must have beta >= 2. Our technique breaks the stream into segments defined by jumps in the optimal clustering cost, which increases monotonically as the stream progresses. By storing an accurate summary of recent segments of the stream and a lower-space summary of older segments, our algorithm maintains an (O(epsilon^{-3} k log n), 2 + epsilon)-bicriterion solution for the entirety of the stream. In addition to our main result, we introduce a novel construction that we call a candidate set. This is a collection of points that, with high probability, contains k points that yield a near-optimal k-median cost. We present an algorithm called monotone faraway sampling (MFS) for constructing a candidate set in a single pass over a data stream. We show that using this candidate set in tandem with a coreset speeds up the search for a solution set of k cluster centers upon termination of the data stream. While coresets of smaller asymptotic size are known, the comparative simplicity of MFS makes it appealing as a practical technique.
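The paper defines monotone faraway sampling precisely; purely as a rough illustration of the one-pass "faraway" idea, the sketch below uses the classic doubling trick from streaming k-center (Charikar et al.): admit a point when it is far from every stored candidate, and raise the distance threshold when the candidate set overflows. The threshold schedule and set-size budget here are illustrative assumptions, not the paper's MFS algorithm.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def faraway_sampling(stream, k, max_size=None):
    """One-pass candidate-set sketch (illustrative, not the paper's exact MFS):
    admit a point if it is farther than the current threshold r from every
    stored candidate; double r and re-filter when the set exceeds its budget."""
    if max_size is None:
        max_size = 8 * k          # illustrative budget, not from the paper
    cand, r = [], 1.0
    for p in stream:
        if all(dist(p, q) > r for q in cand):
            cand.append(p)
        while len(cand) > max_size:
            r *= 2.0
            kept = []
            for q in cand:        # keep a subset that is pairwise > r apart
                if all(dist(q, c) > r for c in kept):
                    kept.append(q)
            cand = kept
    return cand
```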
Consider a stream of d-dimensional rows (points in R^d) arriving sequentially. An epsilon-coreset is a positively weighted subset that approximates their sum of squared distances to any linear subspace of R^d, up to a 1 +/- epsilon factor. Unlike other data summarizations, such a coreset: (1) can be used to more quickly minimize any optimization function that uses this sum, such as regularized or constrained regression; (2) preserves input sparsity; (3) is easily interpretable; (4) avoids numerical errors; (5) applies to problems with constraints on the input, such as subspaces that are spanned by few input points. Our main result is the first algorithm that returns such an epsilon-coreset using finite and constant memory during the streaming, i.e., independent of n, the number of rows seen so far. The coreset consists of O(d log^2 d / epsilon^2) weighted rows, which is nearly optimal according to existing lower bounds of Omega(d/epsilon^2). We support our findings with experiments on the Wikipedia dataset, benchmarked against state-of-the-art algorithms.
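To make the guarantee concrete, here is how one might state and spot-check the epsilon-coreset property in code. Constructing the pair (C, w) is the paper's contribution, so this sketch only evaluates the weighted cost and tests the approximation on random subspaces; the function names and the testing protocol are our own.

```python
import numpy as np

def subspace_cost(X, V, w=None):
    """Weighted sum of squared distances of the rows of X to the column
    span of an orthonormal basis V, computed via
    sum_i w_i * (||x_i||^2 - ||V^T x_i||^2)."""
    if w is None:
        w = np.ones(len(X))
    sq = (X * X).sum(axis=1) - ((X @ V) ** 2).sum(axis=1)
    return float(w @ sq)

def is_eps_coreset(X, C, w, eps, trials=200, seed=0):
    """Empirically test the coreset guarantee on random subspaces:
    |cost(C, w, V) - cost(X, V)| <= eps * cost(X, V) for every V tried."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    for _ in range(trials):
        j = int(rng.integers(1, d))                   # subspace dimension
        V, _ = np.linalg.qr(rng.standard_normal((d, j)))
        full, approx = subspace_cost(X, V), subspace_cost(C, V, w)
        if abs(approx - full) > eps * full:
            return False
    return True
```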
Sorting is a classic problem and one to which many others reduce easily. In the streaming model, however, we are allowed only one pass over the input and sublinear memory, so in general we cannot sort. In this paper we show that, to determine the sorted order of a multiset s of size n containing sigma >= 2 distinct elements using one pass and o(n log sigma) bits of memory, it is generally necessary and sufficient that its entropy H = o(log sigma). Specifically, if s = s_1, ..., s_n and s_{i_1}, ..., s_{i_n} is the stable sort of s, then we can compute i_1, ..., i_n in one pass using O((H + 1)n) time and O(Hn) bits of memory, with a simple combination of classic techniques. On the other hand, in the worst case it takes that much memory to compute any sorted ordering of s in one pass.
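The simplest way to see the one-pass computation of the stable-sort permutation is to bucket positions by value and emit the buckets in key order. The sketch below does exactly that, but stores all n positions explicitly, i.e. O(n log n) bits; reaching the O(Hn)-bit bound requires the compression techniques described in the paper.

```python
from collections import defaultdict

def stable_sort_order(stream):
    """One pass: record, for each distinct value, the positions at which it
    occurs.  Emitting the position lists in increasing key order yields the
    stable-sort permutation i_1, ..., i_n.  (This simple version keeps all
    n positions, i.e. O(n log n) bits; the paper compresses this to O(Hn).)"""
    positions = defaultdict(list)
    for idx, value in enumerate(stream):
        positions[value].append(idx)
    order = []
    for value in sorted(positions):
        order.extend(positions[value])
    return order

# Stable sort of [3, 1, 3, 2, 1] is 1, 1, 2, 3, 3 at positions 1, 4, 3, 0, 2.
assert stable_sort_order([3, 1, 3, 2, 1]) == [1, 4, 3, 0, 2]
```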
Weighted sampling without replacement has proved to be a very important tool in designing new algorithms. Efraimidis and Spirakis [5] presented an algorithm for weighted sampling without replacement from data streams. Their algorithm works under the assumption of precise computations over the interval [0,1]. Cohen and Kaplan [3] used similar methods for their bottom-k sketches. Efraimidis and Spirakis pose as an open question whether using finite-precision arithmetic impacts the accuracy of their algorithm. In this paper we show a method to avoid this problem by providing a precise reduction from k-sampling without replacement to k-sampling with replacement. We call the resulting method Cascade Sampling.
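For reference, the Efraimidis-Spirakis scheme that this reduction concerns can be implemented in one pass with a min-heap: each item receives the key u^{1/w} for u drawn uniformly from [0,1], and the k items with the largest keys are kept. The sketch below is a standard rendering of that algorithm, and it visibly relies on exact real arithmetic over [0,1], which is the issue Cascade Sampling avoids; it is not Cascade Sampling itself.

```python
import heapq
import random

def weighted_sample_without_replacement(stream, k, rng=random):
    """Efraimidis-Spirakis one-pass weighted sampling without replacement:
    assign each (value, weight) pair the key u ** (1 / weight) with
    u ~ Uniform(0, 1) and keep the k items with the largest keys."""
    heap = []  # min-heap of (key, tiebreak, value)
    for i, (value, weight) in enumerate(stream):
        key = rng.random() ** (1.0 / weight)
        entry = (key, i, value)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)
    return [value for _, _, value in heap]

# Example: heavier items are sampled more often across repeated runs.
sample = weighted_sample_without_replacement([("a", 1.0), ("b", 10.0), ("c", 5.0)], k=2)
```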
Consider an incoming sequence of vectors, all belonging to an unknown subspace S, and each with many missing entries. In order to estimate S, it is common to partition the data into blocks and iteratively update the estimate of S with each new incoming measurement block. In this letter, we investigate a rather basic question: Is it possible to identify S by averaging the range of the partially observed incoming measurement blocks on the Grassmannian? We show that, in general, the span of the incoming blocks is in fact a biased estimator of S when data suffer from erasures, and we find an upper bound for this bias. We reach this conclusion by examining the defining optimization program for the Fréchet expectation on the Grassmannian, and with the aid of a sharp perturbation bound and standard large deviation results.
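A small simulation conveys the phenomenon. The setup below, including zero-filling the erased entries and the block sizes, is our own illustration; the letter's bias statement concerns the Fréchet mean of these per-block spans on the Grassmannian, which this sketch does not compute.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, obs_prob = 50, 2, 0.7

# Ground-truth subspace: orthonormal basis S for an r-dim subspace of R^n.
S, _ = np.linalg.qr(rng.standard_normal((n, r)))

def block_span(n_cols, p):
    """Span of a zero-filled, partially observed block of vectors from S."""
    X = S @ rng.standard_normal((r, n_cols))      # columns lie exactly in S
    mask = rng.random((n, n_cols)) < p            # random erasures
    U, _, _ = np.linalg.svd(np.where(mask, X, 0.0), full_matrices=False)
    return U[:, :r]

def chordal_dist(U, V):
    """Chordal distance between the subspaces spanned by U and V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.sqrt(max(U.shape[1] - (s ** 2).sum(), 0.0)))

# With erasures the per-block span deviates systematically from S;
# with full observation (p = 1) the distance is essentially zero.
print(np.mean([chordal_dist(block_span(2 * r, obs_prob), S) for _ in range(200)]))
print(np.mean([chordal_dist(block_span(2 * r, 1.0), S) for _ in range(20)]))
```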
Tools that generate informative and efficient statistical summaries of nodes' activities in a given network have become crucial for robust behavioral anomaly detection. Yet, addressing network abnormalities and threats should not be done at the expense of users' privacy. In this study we illustrate the use of SKETURE, a packet analysis tool leveraging a sketch-based architecture, in summarizing the behavior of nodes in a real campus network for a whole month, without breaching users' privacy. Moreover, we share some insights into this network that were compiled using SKETURE.
More and more applications require real-time processing of massive, dynamically generated, ordered data; order is an essential factor as it reflects recency or relevance. Semantic technologies risk being unable to meet the needs of such applications, as they are not equipped with the appropriate instruments for answering queries over massive, highly dynamic, ordered data sets. In this vision paper, we argue that some data management techniques should be exported to the context of semantic technologies, by integrating ordering with reasoning, and by using methods which are inspired by stream and rank-aware data management. We systematically explore the problem space, and point both to problems which have been successfully approached and to problems which still need fundamental research, in an attempt to stimulate and guide a paradigm shift in semantic technologies.
We study the problem of minimizing total completion time on parallel machines subject to varying processing capacity. In this paper, we develop an approximation scheme for the problem under the data stream model, where the input data is massive, cannot fit into memory, and thus can only be scanned a few times. Our algorithm can compute an approximate value of the optimal total completion time in one pass, and can output a schedule achieving that value in two passes.
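The paper's scheme handles varying processing capacity; purely to convey the generic one-pass idea of compressing a scheduling instance, here is a sketch for the classical identical-machines special case: round each processing time down to a power of (1 + epsilon), keep one counter per rounding class, and evaluate the SPT (shortest-processing-time-first) objective on the rounded instance. All details below are our assumptions, not the paper's algorithm.

```python
import math

def streaming_completion_time(stream, m, eps):
    """One-pass sketch (illustrative; the paper's model has varying capacity):
    round each processing time down to a power of (1 + eps) and keep one
    counter per class, then evaluate the SPT total completion time on the
    rounded instance.  Memory: O(log_{1+eps}(p_max / p_min)) counters."""
    counts = {}
    for p in stream:                       # single pass over the jobs
        c = int(math.floor(math.log(p, 1.0 + eps)))
        counts[c] = counts.get(c, 0) + 1

    # Post-processing on the compressed instance: in SPT order, the job at
    # ascending rank pos contributes ceil((n - pos + 1) / m) times its length.
    n = sum(counts.values())
    total, pos = 0.0, 1
    for c in sorted(counts):
        p = (1.0 + eps) ** c               # representative processing time
        for _ in range(counts[c]):
            total += math.ceil((n - pos + 1) / m) * p
            pos += 1
    return total
```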