Indexing very large collections of strings, such as those produced by the widespread next-generation sequencing technologies, heavily relies on the multi-string generalization of the Burrows-Wheeler Transform (BWT): the large memory requirements of in-memory approaches have stimulated recent developments of external-memory algorithms. The related problem of computing the Longest Common Prefix (LCP) array of a set of strings is instrumental to computing the suffix-prefix overlaps among strings, which is an essential step for many genome assembly algorithms. In a previous paper, we presented an in-memory divide-and-conquer method for building the BWT and LCP where we merge partial BWTs with a forward approach to sort suffixes. In this paper, we propose an alternative backward strategy to develop an external-memory method to simultaneously build the BWT and the LCP array on a collection of m strings of different lengths. Over a set of strings of constant length k, the algorithm has O(mkl) time and I/O volume, using O(k + m) main memory, where l is the maximum value in the LCP array. (C) 2020 Elsevier B.V. All rights reserved.
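To make the objects being computed concrete, here is a minimal in-memory Python sketch that builds the multi-string BWT and LCP array by naively sorting all suffixes. It only illustrates the definitions and is unrelated to the paper's external-memory method; using a shared '$' terminator with ties broken by string index is a simplifying assumption made here.

```python
# Naive in-memory multi-string BWT and LCP construction, for
# illustration only. Every string gets the terminator '$' (which
# sorts before letters in ASCII); ties between equal suffixes are
# broken by string index, simulating distinct end-markers.

def multistring_bwt_lcp(strings):
    suffixes = []
    for i, s in enumerate(strings):
        t = s + "$"
        for j in range(len(t)):
            suffixes.append((t[j:], i, j))
    suffixes.sort(key=lambda x: (x[0], x[1]))

    bwt, lcp = [], []
    prev = None
    for suf, i, j in suffixes:
        t = strings[i] + "$"
        bwt.append(t[j - 1])           # char preceding the suffix (wraps to '$')
        if prev is None:
            lcp.append(0)
        else:
            k = 0
            while k < min(len(prev), len(suf)) and prev[k] == suf[k]:
                k += 1
            lcp.append(k)              # longest common prefix with predecessor
        prev = suf
    return "".join(bwt), lcp

if __name__ == "__main__":
    print(multistring_bwt_lcp(["ACG", "ACT"]))
```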
Given an input stream S of size N, a phi-heavy hitter is an item that occurs at least phi N times in S. The problem of finding heavy hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = phi N-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false positives requires large space (Omega(N) words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O bandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device's random I/O throughput, i.e., approximately 100K observations per second.
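The reporting condition is easy to state in code. The sketch below is a naive exact detector, which is why it needs Omega(N) words in the worst case (one counter per distinct item); it illustrates the TED problem statement, not the paper's external-memory structures.

```python
# Exact timely event detection: report an item the moment its count
# reaches T = phi * N. Exactness forces a counter per distinct item,
# which is the Omega(N)-word space cost the abstract refers to.

def ted_exact(stream, N, phi):
    T = phi * N
    counts = {}
    for pos, item in enumerate(stream):
        counts[item] = counts.get(item, 0) + 1
        if counts[item] == T:        # exactly the T-th occurrence
            yield pos, item          # timely: reported immediately

if __name__ == "__main__":
    s = ["a", "b", "a", "c", "a", "b"]
    for pos, item in ted_exact(s, N=len(s), phi=0.5):
        print(f"{item!r} became a heavy hitter at position {pos}")
```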
ISBN (print): 9781450367356
Given an input stream S of size N, a phi-heavy hitter is an item that occurs at least phi N times in S. The problem of finding heavy hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = phi N-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false positives requires large space (Omega(N) words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O bandwidth (not latency) and support a tunable trade-off between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device's random I/O throughput, i.e., approximately 100K observations per second.
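For contrast with the exact detector sketched earlier, a classic in-RAM heavy-hitters algorithm such as Misra-Gries uses only a bounded number of counters but may return false positives: this is precisely the accuracy sacrifice the abstract mentions. The sketch below is the standard textbook algorithm, not code from the paper.

```python
# Misra-Gries summary with k-1 counters. No phi-heavy hitter (for
# phi >= 1/k) is ever lost from the candidate set, but the candidates
# may include non-heavy items: false positives unless a second
# verification pass is made over the stream.

def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters                  # candidate heavy hitters

if __name__ == "__main__":
    print(misra_gries(list("abababacccc"), k=3))
```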
ISBN (print): 9781450361842
Storage devices have complex performance profiles, including costs to initiate IOs (e.g., seek times in hard drives), parallelism and bank conflicts (in SSDs), costs to transfer data, and firmware-internal operations. The Disk-Access Machine (DAM) model simplifies reality by assuming that storage devices transfer data in blocks of size B and that all transfers have unit cost. Despite its simplifications, the DAM model is reasonably accurate. In fact, if B is set to the half-bandwidth point, where the latency and bandwidth of the hardware are equal, the DAM approximates the IO cost on any hardware to within a factor of 2. Furthermore, the DAM explains the popularity of B-trees in the 70s and the current popularity of B-epsilon-trees and log-structured merge trees. But it fails to explain why some B-trees use small nodes, whereas all B-epsilon-trees use large nodes. In a DAM, all IOs, and hence all nodes, are the same size. In this paper, we show that the affine and PDAM models, which are small refinements of the DAM model, yield a surprisingly large improvement in predictability without sacrificing ease of use. We present benchmarks on a large collection of storage devices showing that the affine and PDAM models give good approximations of the performance characteristics of hard drives and SSDs, respectively. We show that the affine model explains node-size choices in B-trees and B-epsilon-trees. Furthermore, the models predict that the B-tree is highly sensitive to variations in the node size whereas B-epsilon-trees are much less sensitive. These predictions are borne out empirically. Finally, we show that in both the affine and PDAM models, it pays to organize data structures to exploit varying IO size. In the affine model, B-epsilon-trees can be optimized so that all operations are simultaneously optimal, even up to lower-order terms. In the PDAM model, B-epsilon-trees (or B-trees) can be organized so that both sequential and concurrent workloads are handled efficiently.
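A rough numerical sketch of the DAM-vs-affine comparison: under the affine view, an IO of x bytes costs one setup (seek) plus a size-proportional transfer term, while the DAM charges one unit per size-B block. The constants below are hypothetical placeholders, not measurements from the paper; note that at the half-bandwidth point the two models differ by exactly a factor of 2, as the abstract states.

```python
# Compare the DAM's unit-cost-per-block accounting with an affine
# cost model: cost(x) = 1 setup + x / (seek_time * bandwidth).
# SEEK_S and BANDWIDTH are illustrative placeholders only.

SEEK_S = 0.010           # hypothetical setup time per IO (seconds)
BANDWIDTH = 100e6        # hypothetical transfer rate (bytes/second)
B = SEEK_S * BANDWIDTH   # half-bandwidth point: latency == transfer time

def dam_cost(x):
    # DAM: number of size-B blocks touched, each at unit cost.
    return -(-x // B)    # ceil(x / B)

def affine_cost(x):
    # Affine: one setup plus transfer time, measured in "seeks".
    return 1 + x / (SEEK_S * BANDWIDTH)

for x in (4096, B, 10 * B):
    print(f"x={x:>10.0f} bytes  DAM={dam_cost(x):4.0f}  "
          f"affine={affine_cost(x):6.2f}")
```

With these placeholder constants, an IO of exactly B bytes costs 1 in the DAM and 2 in the affine model, and the ratio only shrinks for larger IOs, which is the factor-of-2 approximation claim.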
We present an I/O-efficient algorithm for topologically sorting directed acyclic graphs, called IterTS. In the worst case, our algorithm is extremely inefficient and performs O(n · sort(m)) I/Os. However, our experiments show that IterTS achieves good performance in practice. To evaluate IterTS, we compared its running time to those of three competitors: PeelTS, an I/O-efficient implementation of the standard strategy of iteratively removing sources and sinks; ReachTS, an I/O-efficient implementation of a recent parallel divide-and-conquer algorithm based on reachability queries; and SeTS, a standard DFS-based topological sort built on top of a semi-external DFS algorithm. In our evaluation on various types of input graphs, IterTS consistently outperformed PeelTS and ReachTS, by at least an order of magnitude in most cases. SeTS outperformed IterTS on most graphs whose vertex sets fit in memory. However, IterTS often came close to the running time of SeTS on these inputs and, more importantly, SeTS was not able to process graphs whose vertex sets were beyond the size of main memory, while IterTS was able to process such inputs efficiently.
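For orientation, the in-memory form of the "iteratively remove sources" strategy that PeelTS adapts to external memory is Kahn's algorithm; a minimal sketch follows. The I/O-efficient variants evaluated in the paper are considerably more involved.

```python
# Kahn's algorithm: repeatedly remove vertices of in-degree zero.
# This is the internal-memory analogue of the PeelTS strategy.
from collections import deque

def topo_sort(n, edges):
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(u for u in range(n) if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("graph has a cycle; no topological order exists")
    return order

if __name__ == "__main__":
    print(topo_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))
```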
Given a geometric graph G = (S, E) in R^d with constant dilation t, and a positive constant epsilon, we show how to construct a (1 + epsilon)-spanner of G with O(|S|) edges using O(sort(|E|)) memory transfers in the cache-oblivious model of computation. The main building block of our algorithm, and of independent interest in itself, is a new cache-oblivious algorithm for constructing a well-separated pair decomposition, which builds such a data structure for a given point set S ⊂ R^d using O(sort(|S|)) memory transfers. (C) 2010 Elsevier B.V. All rights reserved.
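To pin down the notion of dilation used here: the dilation of a geometric graph is the maximum, over all point pairs, of the ratio between graph distance and Euclidean distance, so a t-spanner is exactly a graph with dilation at most t. The naive O(|S|^3) sketch below just evaluates this definition and is unrelated to the cache-oblivious construction.

```python
# Dilation of a geometric graph: max over pairs (p, q) of
# shortest-path distance in G divided by Euclidean distance.
import math

def dilation(points, edges):
    n = len(points)
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for u, v in edges:
        w = math.dist(points[u], points[v])
        d[u][v] = d[v][u] = min(d[u][v], w)
    for k in range(n):                  # Floyd-Warshall all-pairs paths
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return max(d[i][j] / math.dist(points[i], points[j])
               for i in range(n) for j in range(i + 1, n))

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    # Path 0-1-2: dilation is 2 / sqrt(2) ~ 1.414, attained by pair (0, 2).
    print(dilation(pts, [(0, 1), (1, 2)]))
```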
We present I/O-efficient algorithms for computing planar Steiner spanners for point sets and sets of polygonal obstacles in the plane. (c) 2007 Elsevier B.V. All rights reserved.
Graph data in modern scientific and engineering applications are often too large to fit in the computer's main memory. Input/output (I/O) complexity is a major research issue in this context. Minimizing the number of I/O operations is the main focus of current research on external-memory graph algorithms, whereas classical (internal-memory) graph algorithms were designed to minimize time complexity. In this paper, we propose an external-memory depth-first search algorithm for general grid graphs. The I/O complexity of the algorithm is O(sort(N) log_2(N/M)), where N = |V| + |E|, sort(N) = Theta((N/B) log_{M/B}(N/B)) is the sorting I/O complexity, M is the memory size, and B is the block size. The best known algorithm for this class of graphs is the standard (internal-memory) DFS algorithm with appropriate block (sub-grid) I/O access; its I/O complexity is O(N/sqrt(B)). Recently, the authors proposed an O(sort(N)) algorithm for solid grid graphs. (c) 2007 Elsevier B.V. All rights reserved.
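To get a feel for how the stated bound compares with the O(N/sqrt(B)) blocked internal-memory DFS, the small sketch below evaluates both expressions for hypothetical machine parameters (M, B, and the input sizes are illustrative choices, not values from the paper).

```python
# Evaluate the paper's bound sort(N) * log_2(N/M), with
# sort(N) = (N/B) * log_{M/B}(N/B), against N / sqrt(B),
# for hypothetical machine parameters.
import math

def sort_io(n, m, b):
    return (n / b) * math.log(n / b, m / b)

def grid_dfs_io(n, m, b):
    return sort_io(n, m, b) * math.log2(n / m)

M, B = 2**30, 2**16      # hypothetical memory and block sizes (in items)
for N in (2**32, 2**36, 2**40):
    print(f"N=2^{int(math.log2(N))}: "
          f"sort-based bound ~ {grid_dfs_io(N, M, B):.3e}, "
          f"N/sqrt(B) ~ {N / math.sqrt(B):.3e}")
```

For these parameters the sort-based bound is smaller by several orders of magnitude, which is the usual motivation for replacing unstructured I/O with sorting passes.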
We present an external-memory algorithm to compute a well-separated pair decomposition (WSPD) of a given point set S in R^d in O(sort(N)) I/Os, where N is the number of points in S and sort(N) denotes the I/O complexity of sorting N items. (Throughout this paper we assume that the dimension d is fixed.) As applications of the WSPD, we show how to compute a linear-size t-spanner for S within the same I/O bound and how to solve the K-nearest-neighbour and K-closest-pair problems in O(sort(KN)) and O(sort(N + K)) I/Os, respectively.
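As a reminder of the underlying definition: two point sets are s-well-separated when each fits in a ball of radius r and the balls lie at distance at least s·r apart. The sketch below checks this condition directly; using the bounding-box centre as the ball centre is a simplification (not the smallest enclosing ball), and the helper is unrelated to the paper's external-memory construction.

```python
# Check whether point sets A and B are s-well-separated: each set is
# enclosed in a ball of radius r and the gap between the two balls
# is at least s * r.
import math

def enclosing_ball(points):
    lo = [min(c) for c in zip(*points)]
    hi = [max(c) for c in zip(*points)]
    center = [(a + b) / 2 for a, b in zip(lo, hi)]
    radius = max(math.dist(center, p) for p in points)
    return center, radius

def well_separated(A, B, s):
    ca, ra = enclosing_ball(A)
    cb, rb = enclosing_ball(B)
    r = max(ra, rb)                      # common radius for both balls
    return math.dist(ca, cb) - 2 * r >= s * r

if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(10.0, 0.0), (11.0, 0.0)]
    print(well_separated(A, B, s=2.0))   # True: gap 9 >= 2 * 0.5
```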
In this paper, we propose an external-memory depth-first search algorithm for solid grid graphs, a subclass of grid graphs. The I/O complexity of the algorithm is O(sort(N)), where N = |V| + |E|, sort(N) = Theta((N/B) log_{M/B}(N/B)) is the sorting I/O complexity, M is the memory size, and B is the block size. Since grid graphs might be non-planar (if diagonal edges intersect), they are beyond the reach of existing planar depth-first search algorithms. The best known algorithm for this class of graphs is the standard (internal-memory) DFS algorithm with appropriate block (sub-grid) I/O access; its I/O complexity is O(N/sqrt(B)). (C) 2004 Elsevier B.V. All rights reserved.
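For context, the internal-memory baseline mentioned above is plain DFS over the grid cells; a minimal sketch follows. Restricting to axis-aligned neighbours is a simplifying assumption here, since the grid graphs in the paper may also include diagonal edges.

```python
# Internal-memory DFS over a grid graph given as a set of occupied
# cells; the baseline the paper improves on runs exactly this kind of
# traversal, but with sub-grid-blocked I/O access.

def grid_dfs(cells, start):
    cells = set(cells)
    stack, seen, order = [start], {start}, []
    while stack:
        x, y = stack.pop()
        order.append((x, y))
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return order                      # cells in DFS visit order

if __name__ == "__main__":
    print(grid_dfs([(0, 0), (1, 0), (0, 1), (1, 1)], start=(0, 0)))
```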