ISBN (print): 9781728155661
Designing efficient algorithms for forecasting the state of the environment is one of the most important challenges in time series analysis and accurate prediction. With the exponential development of remote sensing and the availability of fast computing platforms, it has now become possible to make effective and efficient use of vulnerability indicators of forest stands.
ISBN (print): 9781728128221
With the large-scale adoption of Advanced Metering Infrastructure (AMI), power systems now have access to a wealth of information that can be exploited for better monitoring, management, and control. On the other hand, specific techniques are needed to face the challenges brought by this large amount of data (Big Data). Traditional load modeling methodologies do not exploit the data streams generated by AMI and provide only static load profiles. In this work, an adaptive streaming algorithm is described that models any load through a Markov chain. The proposed algorithm clusters the load curves with minimal computational effort, allowing real-time load modeling. The procedure's performance is evaluated by experimental validation and compared with two reference methodologies (Dynamical Clustering and k-Means) in terms of accuracy and computational time.
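To make "modeling a load through a Markov chain from a stream" concrete, here is a minimal sketch. It assumes load readings are discretized into fixed bins, and it maintains first-order transition counts incrementally. The class name `StreamingMarkovLoadModel` and the bin edges are hypothetical. The sketch does not reproduce the clustering step of the algorithm described above.

```python
from bisect import bisect_right
from collections import defaultdict


class StreamingMarkovLoadModel:
    """Hypothetical sketch: first-order Markov chain over discretized load levels,
    updated one sample at a time (not the clustering procedure of the abstract)."""

    def __init__(self, bin_edges):
        self.bin_edges = sorted(bin_edges)  # thresholds separating load states (assumed)
        # counts[i][j] = number of observed transitions from state i to state j
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev_state = None

    def state_of(self, load):
        # State index = number of edges <= load, so there are len(bin_edges) + 1 states.
        return bisect_right(self.bin_edges, load)

    def update(self, load):
        # Each sample touches one counter, so the model never stores the raw stream.
        state = self.state_of(load)
        if self.prev_state is not None:
            self.counts[self.prev_state][state] += 1
        self.prev_state = state

    def transition_prob(self, i, j):
        # Maximum-likelihood estimate of P(next = j | current = i); 0.0 if i never left.
        total = sum(self.counts[i].values())
        return self.counts[i][j] / total if total else 0.0
```

For example, feeding the readings 0.5, 1.5, 0.5, 1.5, 2.5 with edges [1.0, 2.0] visits states 0, 1, 0, 1, 2. That gives P(0→1) = 1.0 and splits state 1's transitions evenly between states 0 and 2.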
We present a new streaming algorithm for the k-MISMATCH problem, one of the most basic problems in pattern matching. Given a pattern and a text, the task is to find all substrings of the text that are at Hamming distance at most k from the pattern. Our algorithm is enhanced with an important new feature called ERROR CORRECTING, and its complexities for k = 1 and for general k are comparable to those of the k-MISMATCH solutions by Porat and Porat (FOCS 2009) and Clifford et al. (SODA 2016). In parallel to our research, an even more efficient algorithm for the k-MISMATCH problem with the ERROR CORRECTING feature was developed by Clifford et al. (SODA 2019). Using the new feature and recent work on streaming MULTIPLE PATTERN MATCHING, we develop a series of streaming algorithms for pattern matching on weighted strings, a commonly used representation of uncertain sequences in molecular biology. We also show that these algorithms are space-optimal up to polylog factors. A preliminary version of this work was published at the DCC 2017 conference [24]. © 2019 Elsevier Inc. All rights reserved.
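The problem definition can be pinned down with a brute-force reference. This sketch is not streaming: it stores the whole text and runs in O(nm) time. In the spirit of the ERROR CORRECTING feature, it also reports where each occurrence mismatches. The function name `k_mismatch_naive` is hypothetical.

```python
def k_mismatch_naive(pattern, text, k):
    """Brute-force reference for k-MISMATCH (hypothetical helper, not a streaming algorithm).

    Returns (start, mismatch_positions) for every alignment of the pattern whose
    Hamming distance to the aligned text window is at most k.
    """
    m = len(pattern)
    results = []
    for i in range(len(text) - m + 1):
        mismatches = []
        for j in range(m):
            if pattern[j] != text[i + j]:
                mismatches.append(j)
                if len(mismatches) > k:  # budget exceeded, abandon this alignment
                    break
        if len(mismatches) <= k:
            results.append((i, mismatches))
    return results
```

For instance, with pattern "abc", text "abdabcxbc", and k = 1, the occurrences start at 0, 3, and 6. Their mismatches fall at pattern offsets [2], [], and [0], respectively. A streaming algorithm must report the same information while using space polylogarithmic in the text length (times a function of k), rather than storing the text.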
ISBN (print): 9781450379557
Network administrators constantly monitor network traffic for congestion and attacks. They need to perform a large number of measurements on the traffic simultaneously to detect different types of anomalies, such as heavy hitters or super-spreaders. Existing techniques often focus on a single statistic (e.g., traffic volume) or traffic attribute (e.g., destination IP). However, performing numerous heterogeneous measurements within the constrained memory architecture of modern network devices is challenging, because only a limited number of memory accesses is allowed per packet. We propose BeauCoup, a system based on the coupon collector problem that supports multiple distinct-counting queries simultaneously while making only a small constant number of memory accesses per packet. We implement BeauCoup on commodity PISA programmable switches, satisfying the strict memory size and access constraints while using a moderate portion of other data-plane hardware resources. Evaluations show that BeauCoup achieves the same accuracy as other sketch-based or sampling-based solutions while using 4x fewer memory accesses.
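The coupon-collector idea behind distinct counting can be illustrated as follows. Each attribute value (say, a source IP) is hashed to at most one of m coupons, each with probability p. A key (say, a destination IP) triggers an alert once it has collected n distinct coupons. Because the mapping is a hash, repeated values never produce new coupons, so collecting coupons tracks the number of distinct values. This is a simplified model under stated assumptions: the names `coupon_of`, `CouponCollectorQuery`, and `expected_distinct_to_alert` are hypothetical, and BeauCoup's actual coupon configuration and per-packet memory-access scheme are not reproduced.

```python
import hashlib


def coupon_of(attr, m, p, salt="q0"):
    """Map an attribute value to one of m coupons (each with probability p) or None.

    Deterministic hashing means duplicates of the same value always map to the same
    coupon, which is what makes coupon collection count *distinct* values.
    """
    digest = hashlib.sha256((salt + attr).encode()).digest()
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # 53-bit uniform value in [0, 1)
    idx = int(u / p)
    return idx if idx < m else None


class CouponCollectorQuery:
    """Hypothetical per-key coupon collector for one distinct-counting query."""

    def __init__(self, m, p, n, salt="q0"):
        assert m * p <= 1 and 0 < n <= m
        self.m, self.p, self.n, self.salt = m, p, n, salt
        self.collected = {}  # key -> bitmap of coupons seen so far

    def update(self, key, attr):
        """Process one packet; return True once `key` holds at least n distinct coupons."""
        c = coupon_of(attr, self.m, self.p, self.salt)
        if c is not None:
            self.collected[key] = self.collected.get(key, 0) | (1 << c)
        return bin(self.collected.get(key, 0)).count("1") >= self.n


def expected_distinct_to_alert(m, p, n):
    # With i coupons already held, a fresh distinct value yields a new coupon w.p. (m - i) * p,
    # so the expected number of distinct values needed is the sum of the reciprocals.
    return sum(1.0 / ((m - i) * p) for i in range(n))
```

One design consequence is that (m, p, n) can be tuned so that `expected_distinct_to_alert(m, p, n)` approximates the desired distinct-count threshold. The per-key state is then a small bitmap rather than a set of values.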
ISBN (print): 9781450379489
Sketches have successfully provided accurate and fine-grained measurements (e.g., flow size and heavy hitters), which are imperative for network management. In particular, the Count-Min (CM) sketch is widely used in many applications due to its simple design and ease of implementation. There have been many efforts to build monitoring frameworks based on the Count-Min sketch. However, these frameworks either support only very specific measurement tasks or cannot be implemented on high-speed programmable hardware (PISA). In this work, we propose FCM, a framework designed to support generic network measurement with high accuracy. Our key contribution is FCM-Sketch, a data structure with a lightweight implementation on the emerging PISA programmable switches. FCM-Sketch can also be used as a substitute for CM-Sketch in applications that use CM-Sketch. We have implemented FCM-Sketch on a commodity programmable switch (Barefoot Tofino) using the P4 language. Our evaluation shows that FCM-Sketch can reduce the errors in many measurement tasks by 50% to 80% compared to CM-Sketch and other state-of-the-art approaches.
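For reference, here is a minimal sketch of the standard Count-Min sketch that FCM-Sketch is compared against and can substitute for. It is the CM baseline, not FCM-Sketch, whose internal structure is not described here. Per-row hashes are derived by salting SHA-256 with the row number, which is an illustrative choice; switch implementations use hardware hash units instead.

```python
import hashlib


class CountMinSketch:
    """Standard Count-Min sketch: depth rows of width counters, one hash per row."""

    def __init__(self, width, depth):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Row-specific hash, derived by salting SHA-256 with the row number (illustrative).
        h = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.width

    def add(self, key, count=1):
        # Increment one counter per row.
        for r in range(self.depth):
            self.table[r][self._index(r, key)] += count

    def estimate(self, key):
        # Hash collisions only ever add to a counter, so every row overestimates;
        # the minimum across rows is the tightest available estimate.
        return min(self.table[r][self._index(r, key)] for r in range(self.depth))
```

With non-negative updates, a width of about e/ε and a depth of about ln(1/δ) give the classic guarantee. With probability at least 1 − δ, each estimate exceeds the true count by at most ε·N, where N is the total count inserted. This one-sided overestimation error is what the accuracy comparisons above are measured against.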
This paper presents a systematic study of the space complexity of estimating the Schatten p-norms of an n × n matrix in the turnstile streaming model. Two kinds of space complexity, bit complexity and sketching dimension, are considered, as well as two sketching models, general linear sketching and bilinear sketching. When p is not an even integer, we show that any one-pass algorithm with constant success probability requires near-linear space in terms of bits. This lower bound holds even for sparse matrices, i.e., matrices with O(1) nonzero entries per row and per column. However, when p is an even integer, we give for sparse matrices an upper bound which, up to logarithmic factors, is the same as for estimating the p-th moment of an n-dimensional vector. These results considerably strengthen lower bounds in previous work for arbitrary (not necessarily sparse) matrices. Similar near-linear lower bounds are obtained for Ky Fan norms, SVD entropy, eigenvalue shrinkers, and M-estimators, many of which could, prior to this work, have been solvable in logarithmic space. The results for general linear sketches give separations between the sketching complexity of Schatten p-norms and that of the corresponding vector p-norms, and rule out a table-lookup nearest-neighbor search for p = 1, making progress on a question of Andoni. The results for bilinear sketches are tight for the rank problem and nearly tight for p ≥ 2; the latter is the first general subquadratic upper bound for sketching the Schatten norms.
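A standard identity helps explain why even p behaves differently. This is textbook linear algebra, not the paper's algorithm. With σ_i(A) the singular values of A, the Schatten p-norm is defined below. For even p, its p-th power is a trace of a matrix power, and hence a polynomial in the entries of A. The p = 4 case is written out as an example.

```latex
\[
\|A\|_{S_p} = \Big(\sum_{i=1}^{n} \sigma_i(A)^{p}\Big)^{1/p},
\qquad
\|A\|_{S_p}^{p} = \operatorname{tr}\!\big((A^{\top}A)^{p/2}\big) \quad (p \text{ even}),
\]
\[
\|A\|_{S_2}^{2} = \sum_{i,j} a_{ij}^{2},
\qquad
\|A\|_{S_4}^{4} = \operatorname{tr}\!\big((A^{\top}A)^{2}\big) = \sum_{i,j} \big((A^{\top}A)_{ij}\big)^{2}.
\]
```

The last equality uses the symmetry of B = AᵀA, since tr(B²) = Σ_{i,j} B_ij B_ji. For non-even p, no such polynomial form exists, which is consistent with the near-linear lower bounds stated above.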
ISBN (print): 9781450362016
Coresets are important tools to generate concise summaries of massive datasets for approximate analysis. A coreset is a small subset of points extracted from the original point set such that certain geometric properties are preserved with provable guarantees. This paper investigates the problem of maintaining a coreset to preserve the minimum enclosing ball (MEB) for a sliding window of points that are continuously updated in a data stream. Although the problem has been extensively studied in batch and append-only streaming settings, no efficient sliding-window solution is available yet. In this work, we first introduce an algorithm, called AOMEB, to build a coreset for MEB in an append-only stream. AOMEB improves the practical performance of the state-of-the-art algorithm while having the same approximation ratio. Furthermore, using AOMEB as a building block, we propose two novel algorithms, namely SWMEB and SWMEB+, to maintain coresets for MEB over the sliding window with constant approximation ratios. The proposed algorithms also support coresets for MEB in a reproducing kernel Hilbert space (RKHS). Finally, extensive experiments on real-world and synthetic datasets demonstrate that SWMEB and SWMEB+ achieve speedups of up to four orders of magnitude over the state-of-the-art batch algorithm while providing coresets for MEB with rather small errors compared to the optimal ones.
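As a baseline for what "maintaining an MEB in an append-only stream" means, here is a minimal sketch of the simplest one-pass rule. Whenever a new point falls outside the current ball, the ball is replaced by the smallest ball enclosing both the old ball and that point. This simple constant-factor approximation is a swapped-in illustration. It is not AOMEB, SWMEB, or SWMEB+: it keeps no coreset, does not support sliding windows, and does not handle kernels. The class name `StreamingBall` is hypothetical.

```python
import math


class StreamingBall:
    """Hypothetical append-only enclosing ball: grow to cover old ball + new point."""

    def __init__(self):
        self.center = None
        self.radius = 0.0

    def add(self, point):
        p = [float(x) for x in point]
        if self.center is None:
            self.center = p  # first point: a ball of radius 0
            return
        d = math.dist(self.center, p)
        if d <= self.radius:
            return  # already enclosed, nothing changes
        # The smallest ball containing the old ball and p has a diameter running from the
        # far side of the old ball, through the old center, to p.
        new_radius = (self.radius + d) / 2
        shift = (new_radius - self.radius) / d  # fraction of the way from center toward p
        self.center = [c + shift * (q - c) for c, q in zip(self.center, p)]
        self.radius = new_radius
```

Feeding (0, 0), (2, 0), (1, 0), (1, 3) yields center (1, 1) and radius 2, and every input point lies inside. Each update costs O(d) time and O(d) space. The coreset-based methods above trade more state for a tighter approximation and for the ability to expire points from a window.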
ISBN (print): 9783959771160
We study the relation between streaming algorithms and linear sketching algorithms in the context of binary updates. We show that for inputs in n dimensions, the existence of efficient streaming algorithms that can process Ω(n²) updates implies efficient linear sketching algorithms with comparable cost. This improves upon the previous work of Li, Nguyen and Woodruff [23] and Ai, Hu, Li and Woodruff [3], which required a triple-exponential number of updates to achieve a similar result for updates over integers. We extend our results to updates modulo p for integers p ≥ 2, and to approximation instead of exact computation.
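To make "linear sketch under binary updates" concrete, here is a minimal sketch over GF(2). The input x ∈ {0,1}ⁿ starts at zero, each update toggles one coordinate, and the sketch S·x mod 2 changes by exactly one column of S. The random k × n matrix S, stored column-wise as k-bit integers, and the class name `GF2LinearSketch` are assumptions for illustration, not constructions from the paper.

```python
import random


class GF2LinearSketch:
    """Hypothetical linear sketch S·x over GF(2) for x in {0,1}^n with toggle updates."""

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        # Column i of the k x n matrix S, packed into a k-bit integer.
        self.columns = [rng.getrandbits(k) for _ in range(n)]
        self.sketch = 0  # S·x mod 2 for the initial x = 0

    def toggle(self, i):
        # Binary update x_i ^= 1. By linearity mod 2, S(x + e_i) = Sx + S e_i,
        # and addition over GF(2) is XOR.
        self.sketch ^= self.columns[i]

    def sketch_of(self, x):
        # Direct computation of S·x mod 2: XOR of the columns where x_i = 1.
        acc = 0
        for col, bit in zip(self.columns, x):
            if bit & 1:
                acc ^= col
        return acc
```

The key property is that the state depends only on the final vector x, not on the order or history of updates. Toggling coordinates 3, 5, 3, 9 leaves the same sketch as toggling 9, 5. A general streaming algorithm has no such constraint, and the result above concerns when this extra freedom cannot help.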
ISBN (print): 9781538646588
In a social network, influence maximization is the problem of identifying a set of users with the maximum influence across the network. In this paper, a novel credit distribution (CD) based model, termed the multi-action CD (mCD) model, is introduced to quantify the influence ability of each user. Compared to existing models, the new model can work with practical datasets in which one type of action is recorded multiple times. Based on this model, influence maximization is formulated as a submodular maximization problem under a knapsack constraint, which is NP-hard. An efficient streaming algorithm is developed that achieves a (1/3 − ε) approximation of the optimum. Experiments conducted on a real Twitter dataset demonstrate that the mCD model achieves higher accuracy than the conventional CD model in estimating the total number of people who get influenced in a social network. Furthermore, compared to the greedy algorithm, the proposed single-pass streaming algorithm achieves similar influence maximization performance while running several orders of magnitude faster.
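The general shape of single-pass submodular maximization under a knapsack constraint can be illustrated with a density-threshold rule. Each arriving item is kept if it fits the remaining budget and its marginal gain per unit cost is at least a threshold τ. The best single affordable item is tracked as a fallback. This is a simplified, hypothetical sketch, not the paper's algorithm, and as written it carries no (1/3 − ε) guarantee. Algorithms with such guarantees typically run many thresholds in parallel, guessed from a geometric grid. Set coverage stands in for the mCD influence function (both are monotone submodular), and the input format and the name `threshold_stream_knapsack` are assumptions.

```python
def threshold_stream_knapsack(stream, budget, tau):
    """Hypothetical single-pass density-threshold selection under a knapsack budget.

    stream: iterable of (item_id, cost, covered_set); covered_set stands in for the
    users an item would influence, so f(S) = size of the union of covered sets.
    Returns (selected_items, f(selected_items)).
    """
    chosen, covered, spent = [], set(), 0.0
    best_single, best_value = None, 0
    for item, cost, cover in stream:
        # Fallback candidate: the best affordable single item seen so far.
        if cost <= budget and len(cover) > best_value:
            best_single, best_value = item, len(cover)
        gain = len(cover - covered)  # marginal coverage given the current selection
        if spent + cost <= budget and gain >= tau * cost:
            chosen.append(item)
            covered |= cover
            spent += cost
    if best_value > len(covered):
        return [best_single], best_value
    return chosen, len(covered)
```

The pass keeps only the current selection, its coverage, and one fallback item, which is why such algorithms can be far faster than the greedy algorithm. Greedy instead re-evaluates marginal gains of all remaining candidates in every round.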