ISBN (print): 9798400703836
We introduce a new notion of information complexity for multi-pass streaming problems and use it to resolve several important questions in data streams: (1) In the coin problem, one sees a stream of n i.i.d. uniformly random bits and would like to compute the majority with constant advantage. We show that any constant-pass algorithm must use Ω(log n) bits of memory, significantly extending an earlier Ω(log n)-bit lower bound for single-pass algorithms of Braverman-Garg-Woodruff (FOCS, 2020). This also gives the first Ω(log n)-bit lower bound for the problem of approximating a counter up to a constant factor in worst-case turnstile streams for more than one pass. (2) In the needle problem, one either sees a stream of n i.i.d. uniform samples from a domain [t], or there is a randomly chosen "needle" α ∈ [t] for which each item independently equals α with probability p and is otherwise uniformly random in [t]. The problem of distinguishing these two cases is central to understanding the space complexity of the frequency moment estimation problem in random-order streams. We show tight multi-pass space bounds for this problem for every p < 1/(√n log³ n), resolving an open question of Lovett and Zhang (FOCS, 2023); even for one pass our bounds are new. To show optimality, we improve both the lower and upper bounds of existing results. Our information complexity framework significantly extends the toolkit for proving multi-pass streaming lower bounds, and we give a number of additional streaming applications of our lower bound techniques, including multi-pass lower bounds for ℓ_p-norm estimation, ℓ_p point query and heavy hitters, and compressed sensing problems.
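To make the counter-approximation task concrete, here is a minimal sketch of the classical Morris counter, which approximates an insertion-only count up to a constant factor in roughly O(log log n) bits. This is an illustration of the problem the lower bound above concerns, not the paper's construction; the lower bound applies to the harder turnstile (insertion/deletion) setting.

import random

class MorrisCounter:
    def __init__(self):
        self.x = 0  # stores roughly log2 of the true count

    def increment(self):
        # increase the exponent with probability 2^(-x)
        if random.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        # 2^x - 1 is an unbiased estimator of the number of increments
        return 2.0 ** self.x - 1

c = MorrisCounter()
for _ in range(100000):
    c.increment()
print(c.estimate())  # a single counter is accurate up to constant factors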
In this paper, we explore a novel online learning setting, where the online learners are presented with "doubly-streaming" data. Namely, the data instances constantly streaming in are described by feature spaces that over-time evolve, with new features emerging and old features fading away. The main challenge of this problem lies in the fact that the newly emerging features are described by very few samples, resulting in weak learners that tend to make error predictions. A seemingly plausible idea to overcome the challenge is to establish a relationship between the old and new feature spaces, so that an online learner can leverage the knowledge learned from the old features to better the learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional feature spaces that entail very complex feature interplay. Specifically. a tradeoff between onlineness, which biases shallow learners, and expressiveness, which requires deep models, is inevitable. Motivated by this, we propose a novel paradigm, named Online Learning Deep models from Data of Double Streams ((OLDS)-S-3), where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature mapping relationship. A key trait of (OLDS)-S-3 is to treat the model capacityas a learnable semantics, aiming to yield optimal model depth and parameters jointly in accordance with the complexity and non-linearity of the input data streams in an online fashion. To ablate its efficacy and applicability, two variants of (OLDS)-S-3 are proposed namely, OLD-Linear that learns the relationship by a linear function;and OLD-FD learns that two consecutive feature spaces pre-and-post evolution with fixed deep depth. Besides, instead of re-starting the entire learning process from scratch, (OLDS)-S-3 learns multiple newly emerging feature spaces in a lifelong manner, retaining the knowledge from the learned and vanished feature space t
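The core idea of the OLD-Linear variant can be sketched as follows: during an overlap period in which both feature spaces are observed, learn a linear map from the new features to the old ones by online gradient descent, so that a learner trained on the old space can keep scoring instances after the old features vanish. This is a hypothetical minimal version under our own assumptions about shapes, data, and update rule, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
d_old, d_new, lr = 5, 8, 0.01
M = np.zeros((d_old, d_new))  # linear reconstructive mapping

# overlap phase: instances carry both representations (x_old, x_new)
for _ in range(2000):
    x_new = rng.normal(size=d_new)
    x_old = rng.normal(size=d_old)  # stand-in for the true old features
    err = M @ x_new - x_old
    M -= lr * np.outer(err, x_new)  # one SGD step on the squared loss

# after the old space vanishes: reconstruct old features from new ones
x_new = rng.normal(size=d_new)
x_old_hat = M @ x_new  # fed to the learner trained on the old space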
Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is an inherent limitation of this approach and is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketch) to solve the streaming connected components problem, and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: it can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result, GraphZeppelin vastly increases the scale of graphs that can be processed.
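The primitive behind sketch-based connected components (in the style of Ahn-Guha-McGregor) is a tiny recovery sketch per vertex, updated by XOR so that insertions and deletions are handled identically, and mergeable so that internal edges of a component cancel and only cut edges survive, driving Boruvka-style merges. The sketch below is a simplified 1-sparse illustration, not GraphZeppelin's CubeSketch; real ℓ0-sketches add a checksum and O(log n) subsampled levels to handle cuts of any size.

class OneSparseSketch:
    def __init__(self):
        self.parity = 0  # number of sketched edges, mod 2
        self.xor = 0     # XOR of encoded edge ids

    def update(self, eid):
        # the same XOR update serves insertion and deletion
        self.parity ^= 1
        self.xor ^= eid

    def merge(self, other):
        # union of two vertex sets: internal edges occur twice and cancel
        self.parity ^= other.parity
        self.xor ^= other.xor

    def recover(self):
        # valid when exactly one cut edge remains
        return self.xor if self.parity == 1 else None

def edge_id(u, v, n):
    u, v = min(u, v), max(u, v)
    return u * n + v  # injective encoding of an undirected edge

n = 4
s = [OneSparseSketch() for _ in range(n)]
for (u, v) in [(0, 1), (1, 2), (2, 3)]:
    for w in (u, v):
        s[w].update(edge_id(u, v, n))
for w in (2, 3):                      # delete edge (2, 3) from the stream
    s[w].update(edge_id(2, 3, n))
comp = OneSparseSketch()
comp.merge(s[0]); comp.merge(s[1])    # component {0, 1}
print(divmod(comp.recover(), n))      # -> (1, 2), the unique cut edge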
A constraint satisfaction problem (CSP), Max-CSP(F), is specified by a finite set of constraints F ⊆ {f : [q]^k → {0, 1}} for positive integers q and k. An instance of the problem on n variables is given by m applications of constraints from F to subsequences of the n variables, and the goal is to find an assignment to the variables that satisfies the maximum number of constraints. In the (γ, β)-approximation version of the problem, for parameters 0 ≤ β < γ ≤ 1, the goal is to distinguish instances where at least a γ fraction of the constraints can be satisfied from instances where at most a β fraction of the constraints can be satisfied. In this work, we consider the approximability of this problem in the context of sketching algorithms and give a dichotomy result. Specifically, for every family F and every β < γ, we show that either a linear sketching algorithm solves the problem in polylogarithmic space, or the problem is not solvable by any sketching algorithm in o(√n) space. In particular, we give non-trivial approximation algorithms using polylogarithmic space for infinitely many constraint satisfaction problems. We also extend previously known lower bounds for general streaming algorithms to a wide variety of problems, in particular the case q = k = 2, where we get a dichotomy, and the case when the satisfying assignments of the constraints of F support a distribution on [q]^k with uniform marginals. Prior to this work, other than sporadic examples, the only systematic classes of CSPs that were analyzed considered the setting of Boolean variables q = 2, binary constraints k = 2, and singleton families |F| = 1, and only considered the setting where constraints are placed on literals rather than variables. Our positive results show wide applicability of the bias-based algorithms used previously by [47] and [41], which we extend to include richer norm estimation algorithms, by giving a systematic way to di…
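To illustrate what a bias-based algorithm measures, consider Max-DICUT: each variable's bias is the number of constraints in which it appears positively (as an arc tail) minus the number in which it appears negatively (as an arc head), and the (γ, β)-gap is decided from the total normalized bias alone. The one-pass computation below keeps exact per-variable counters for clarity; the sketching algorithms referenced above instead estimate this quantity in polylogarithmic space with an ℓ1-norm sketch.

from collections import defaultdict

def total_normalized_bias(arc_stream, m):
    # arc_stream: the arcs (u, v) of a Max-DICUT instance, one pass
    bias = defaultdict(int)
    for (u, v) in arc_stream:
        bias[u] += 1   # u appears as the tail (positively)
        bias[v] -= 1   # v appears as the head (negatively)
    return sum(abs(b) for b in bias.values()) / (2 * m)

arcs = [(0, 1), (0, 2), (3, 1), (0, 3)]
print(total_normalized_bias(arcs, len(arcs)))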
Frequent Directions, as a deterministic matrix sketching technique, has been proposed for tackling low-rank approximation problems. This method has a high degree of accuracy and practicality, but incurs substantial computational cost for large-scale data. Several recent works on randomized versions of Frequent Directions greatly improve the computational efficiency, but unfortunately sacrifice some precision. To remedy this issue, this paper aims to find a more accurate projection subspace to further improve the efficiency and effectiveness of the existing Frequent Directions techniques. Specifically, by utilizing the power of Block Krylov Iteration and random projection techniques, this paper presents a fast and accurate Frequent Directions algorithm named r-BKIFD. Rigorous theoretical analysis shows that the proposed r-BKIFD has an error bound comparable to that of the original Frequent Directions, and that the approximation error can be made arbitrarily small when the number of iterations is chosen appropriately. Extensive experimental results on both synthetic and real data further demonstrate the superiority of r-BKIFD over several popular Frequent Directions algorithms in terms of both computational efficiency and accuracy.
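For context, here is a minimal implementation of the original deterministic Frequent Directions sketch that r-BKIFD accelerates; the randomized Block Krylov component of r-BKIFD is not reproduced here.

import numpy as np

def frequent_directions(A, ell):
    n, d = A.shape
    B = np.zeros((ell, d))
    for row in A:
        zero_rows = np.where(~B.any(axis=1))[0]
        if len(zero_rows) == 0:
            # sketch is full: shrink all directions, freeing rows
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt
            zero_rows = np.where(~B.any(axis=1))[0]
        B[zero_rows[0]] = row  # place the incoming row in a free slot
    return B

A = np.random.default_rng(0).normal(size=(500, 30))
B = frequent_directions(A, ell=10)
# FD guarantee: ||A^T A - B^T B||_2 <= ||A - A_k||_F^2 / (ell - k), k < ell
print(np.linalg.norm(A.T @ A - B.T @ B, 2))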
The number of triangles (hereafter denoted by Δ) is an important metric for analyzing massive graphs. It is also used to compute the clustering coefficient of networks. This paper proposes a new algorithm called PES (Priority Edge Sampling) to estimate the number of triangles in the streaming model, where the memory window must be minimized. PES combines edge sampling and reservoir sampling, and it consistently outperforms the state-of-the-art streaming algorithms. The results are verified extensively on 48 large real-world networks from different domains and with different structures. The performance ratio can be as large as 11. More importantly, the ratio grows almost exponentially with data size. This is especially important in the era of big data: while existing algorithms are tolerable for smaller datasets, our method is indispensable when sampling very large data. In addition to empirical comparisons, we also prove that the estimator is unbiased and derive its variance.
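The following is not the paper's PES, but a sketch of the standard building block it combines: a fixed-size reservoir of edges, with each triangle closed by an arriving edge reweighted by the inverse probability that both of its wedge edges are currently sampled (in the style of TRIEST). The reservoir size M is the memory budget.

import random
from collections import defaultdict

def estimate_triangles(edge_stream, M):
    sample = []                      # reservoir of at most M edges
    adj = defaultdict(set)           # adjacency of the sampled subgraph
    t, estimate = 0, 0.0
    for (u, v) in edge_stream:
        t += 1
        # 1 / Pr[both wedge edges are in the reservoir just before time t]
        if t - 1 <= M:
            w = 1.0
        else:
            w = ((t - 1) / M) * ((t - 2) / (M - 1))
        for c in adj[u] & adj[v]:    # triangles closed by (u, v)
            estimate += w
        # reservoir step: keep the new edge with probability min(1, M/t)
        if t <= M:
            sample.append((u, v))
            adj[u].add(v); adj[v].add(u)
        elif random.random() < M / t:
            i = random.randrange(M)
            x, y = sample[i]          # evict a uniformly random edge
            adj[x].discard(y); adj[y].discard(x)
            sample[i] = (u, v)
            adj[u].add(v); adj[v].add(u)
    return estimate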
ISBN (print): 9798400701276
We study the Set Cover problem in the one-pass edge-arrival streaming model. In this model, the input stream consists of a sequence of tuples (S, u), indicating that element u is contained in set S. This setting captures the streaming Dominating Set problem and is more general and harder to solve than the set-arrival setting of Set Cover, where entire sets arrive with all their elements one-by-one. We prove the following results (n is the size of the universe, m is the number of sets): (1) A work by [Khanna, Konrad, ITCS'22] on streaming Dominating Set implies a one-pass Õ(√n)-approximation algorithm with space Õ(m) for edge-arrival Set Cover in adversarially ordered streams. We show that this space bound is best possible up to poly-log factors, in that every α-approximation algorithm, for α = Ω(√n), requires space Ω̃(mn²/α⁴) in adversarially ordered streams, even if the algorithm is only required to output an α-approximation of the size of an optimal cover. (2) As our main result, we give a one-pass Õ(√n)-approximation algorithm with space Õ(m/√n) for edge-arrival Set Cover in random-order streams. This result, together with the lower bound mentioned above, establishes a strong separation between the adversarial and random order settings. (3) Finally, in adversarially ordered streams, we show that non-trivial algorithms with space o(m) can be achieved at the expense of approximation factors larger than Õ(√n). This is in contrast to the set-arrival setting, where space Õ(n) is enough for a Θ(√n)-approximation, and space Ω(n) is needed for an o(n/log n)-approximation. We give an α-approximation algorithm for one-pass edge-arrival Set Cover with space Õ(mn/α²), for every α = Ω̃(√n).
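For reference, the offline baseline these guarantees are measured against is the classical greedy algorithm below, which achieves an O(ln n)-approximation but needs the whole instance in memory; the streaming algorithms above must instead process (S, u) tuples in one pass under a space budget.

def greedy_set_cover(universe, sets):
    # sets: dict mapping a set id to the set of elements it covers
    uncovered = set(universe)
    cover = []
    while uncovered:
        # pick the set covering the most still-uncovered elements
        best = max(sets, key=lambda s: len(sets[s] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("instance is not coverable")
        cover.append(best)
        uncovered -= sets[best]
    return cover

sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}}
print(greedy_set_cover({1, 2, 3, 4, 5, 6}, sets))  # -> ['A', 'C']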
ISBN (print): 9798350318944
We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time, without sacrificing space, for a large number of data stream algorithms. For example: Andoni's F_p estimation algorithm for constant p > 2 (ICASSP, 2017) assumes a random oracle but achieves optimal space and constant update time; using HashPRG's time-space trade-off we eliminate the random oracle assumption while preserving the other properties. Previously no time-optimal derandomization was known. Using similar techniques, we give an algorithm for a relaxed version of ℓ_p sampling in a turnstile stream. Both of our algorithms use Õ(d^(1-2/p)) bits of space and have O(1) update time. For 0 < p < 2, the (1 ± ε)-approximate F_p estimation algorithm of Kane et al. (STOC, 2011) uses an optimal O(ε⁻² log d) bits of space but has an update time of O(log²(1/ε) log log(1/ε)). Using HashPRG, we show that if 1/√d ≤ ε ≤ 1/d^c for an arbitrarily small constant c > 0, then we can obtain a (1 ± ε)-approximate F_p estimation algorithm that uses the optimal O(ε⁻² log d) bits of space and has an update time of O(log d) in the Word RAM model, which is more than a quadratic improvement in the update time. We obtain similar improvements for entropy estimation, and for CountSketch with the fine-grained error analysis of Minton and Price (SODA, 2014): for derandomization, they suggested a direct application of Nisan's generator, yielding a logarithmic multiplicative space overhead; with HashPRG we obtain an efficient derandomization yielding the same asym…
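A minimal sketch of Nisan's recursive construction, the generator that HashPRG refines: a seed consists of one block x and k hash functions h_1, ..., h_k, and level i doubles the output via G_i(x) = G_{i-1}(x) || G_{i-1}(h_i(x)). The random affine maps modulo a prime below are stand-ins for pairwise-independent hash functions, and the parameters are illustrative only.

import random

P = (1 << 61) - 1  # Mersenne prime; blocks are integers in [0, P)

def make_hash():
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda x: (a * x + b) % P

def nisan(x, hashes):
    # expands the single seed block x into 2^len(hashes) blocks
    if not hashes:
        return [x]
    left = nisan(x, hashes[:-1])
    right = nisan(hashes[-1](x), hashes[:-1])
    return left + right

k = 4
hashes = [make_hash() for _ in range(k)]
out = nisan(random.randrange(P), hashes)
print(len(out), out[:3])  # 16 pseudorandom blocks from a short seed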
In the k-mismatch problem we are given a pattern of length n and a text, and must find all locations where the Hamming distance between the pattern and the text is at most k. A series of recent breakthroughs has resulted in an ultra-efficient streaming algorithm for this problem that requires only O(k log(n/k)) space and O(log(n/k)(√(k log k) + log³ n)) time per letter (Clifford, Kociumaka, Porat, SODA 2019). In this work, we consider a strictly harder problem called dictionary matching with k mismatches. In this problem, we are given a dictionary of d patterns, where the length of each pattern is at most n, and must find all substrings of the text that are within Hamming distance k from one of the patterns. We develop a streaming algorithm for this problem with O(kd log^k d polylog n) space and O(k log^k d polylog n + |output|) time per position of the text. The algorithm is randomised and outputs correct answers with high probability. On the lower bound side, we show that any streaming algorithm for dictionary matching with k mismatches requires Ω(kd) bits of space.
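The problem statement in executable form: a naive baseline that, for each text position, reports every dictionary pattern within Hamming distance k of the window ending there. Unlike the streaming algorithm above, this baseline stores the full text window; it is only meant to pin down what is being computed.

def dictionary_matching_k_mismatch(text, patterns, k):
    out = []
    for i in range(len(text)):
        for p in patterns:
            if i + 1 >= len(p):
                window = text[i + 1 - len(p): i + 1]
                # Hamming distance between the window and the pattern
                mism = sum(a != b for a, b in zip(window, p))
                if mism <= k:
                    out.append((i, p, mism))
    return out

print(dictionary_matching_k_mismatch("abracadabra", ["abr", "cad"], 1))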
ISBN (print): 9781450399135
We present a new approach for finding matchings in dense graphs by building on Szemerédi's celebrated Regularity Lemma. This allows us to obtain non-trivial, albeit slight, improvements over longstanding bounds for matchings in streaming and dynamic graphs. In particular, we establish the following results for n-vertex graphs: (i) A deterministic single-pass streaming algorithm that finds a (1 - o(1))-approximate matching in o(n²) bits of space. This constitutes the first single-pass algorithm for this problem in sublinear space that improves over the 1/2-approximation of the greedy algorithm. (ii) A randomized fully dynamic algorithm that with high probability maintains a (1 - o(1))-approximate matching in o(n) worst-case update time per edge insertion or deletion. The algorithm works even against an adaptive adversary. This is the first o(n) update-time dynamic algorithm with an approximation guarantee arbitrarily close to one. Given the use of the regularity lemma, the improvement obtained by our algorithms over trivial bounds is only by some (log* n)^Θ(1) factor. Nevertheless, in each case they show that the "right" answer to the problem is not what is dictated by the previous bounds. Finally, in the streaming model, we also present a randomized (1 - o(1))-approximation algorithm whose space can be upper bounded by the density of certain Ruzsa-Szemerédi (RS) graphs. While RS graphs have by now been used extensively to prove streaming lower bounds, ours is the first work to use them as an upper bound tool for designing improved streaming algorithms.
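For concreteness, the 1/2-approximation baseline that result (i) improves on is the greedy single-pass streaming matching: keep an arriving edge whenever both endpoints are still free, using O(n log n) bits of space.

def greedy_streaming_matching(edge_stream):
    matched = set()
    matching = []
    for (u, v) in edge_stream:
        # keep the edge only if it extends the current matching
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.add(u); matched.add(v)
    return matching

print(greedy_streaming_matching([(1, 2), (2, 3), (3, 4)]))  # [(1, 2), (3, 4)]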