检索结果-内蒙古大学图书馆

A parallel subdomain by subdomain implementation of the implicitly restarted Arnoldi/Lanczos method

COMPUTATIONAL MECHANICS 2011年第5期48卷 563-577页

作者： Ainsworth, G. O., Jr. Ribeiro, F. L. B. Magluta, C. COPPE Univ Fed Rio de Janeiro Programa Engn Civil BR-21945970 Rio De Janeiro Brazil

This work presents a parallel implementation of the implicitly restarted Arnoldi/Lanczos method for the solution of eigenproblems approximated by the finite element method. The implicitly restarted Arnoldi/Lanczos uses a restart scheme in order to improve the convergence of the desired portion of the spectrum, addressing issues such as memory requirements and computational costs related to the generation and storage of the Krylov basis. The presented implementation is suitable for distributed memory architectures, especially PC clusters. In the parallel solution, a subdomain by subdomain approach was implemented and overlapping and non-overlapping mesh partitions were tested. compressed data structures in the formats CSRC and CSRC/CSR were used to store the coefficient matrices. The parallelization of numerical linear algebra operations present in both Krylov and implicitly restarted methods are discussed. Numerical examples are shown, in order to point out the efficiency and applicability of the proposed method.

关键词： Arnoldi/Lanczos eigensolvers Finite element analysis Parallel computing compressed data structures

来源：评论

学校读者我要写书评

暂无评论

Wee LCP

引用

INFORMATION PROCESSING LETTERS 2010年第8-9期110卷 317-320页

作者： Fischer, Johannes Karlsruhe Inst Technol D-76128 Karlsruhe Germany

We prove that longest common prefix (LCP) information can be stored in much less space than previously known. More precisely, we show that in the presence of the text and the suffix array, o(n) additional bits are sufficient to answer LCP-queries asymptotically in the same time that is needed to retrieve an entry from the suffix array. This yields the smallest compressed suffix tree with sub-logarithmic navigation time. (C) 2010 Elsevier B.V. All rights reserved.

关键词： data sructures Text indexing compressed data structures Suffix arrays Longest common prefixes

来源：评论

学校读者我要写书评

暂无评论

Practical approaches to reduce the space requirement of Lempel-Ziv-based compressed text indices

引用

ACM Journal of Experimental Algorithmics 2010年第1期15卷 1.1–1.54页

作者： Arroyuelo, Diego Navarro, Gonzalo Yahoo Research Chile Blanco Encalada 2120 Santiago Chile Department of Computer Science University of Chile Blanco Encalada 2120 Santiago Chile

Given a text T[1..n] over an alphabet of size σ, the full-text search problem consists in locating the occ occurrences of a given pattern P[1..m] in T. compressed full-text self-indices are space-efficient representations of the text that provide direct access to and indexed search on it. The LZ-index of Navarro is a compressed full-text self-index based on the LZ78 compression algorithm. This index requires about 5 times the size of the compressed text (in theory, 4nH_k(T) + o(nlogσ) bits of space, where H_k(T) is the k-th order empirical entropy of T). In practice, the average locating complexity of the LZ-index is O(σmlog_σ n + occ σ^ml2), where occ is the number of occurrences of P. It can extract text substrings of length l in O(l) time. This index outperforms competing schemes both to locate short patterns and to extract text snippets. However, the LZ-index can be up to 4 times larger than the smallest existing indices (which use nH_k(T) + o(n logσ) bits in theory), and it does not offer space/time tuning options. This limits its applicability. In this article, we study practical ways to reduce the space of the LZ-index. We obtain new LZ-index variants that require 2(1 + ε)nH_k(T) + o(nlogσ) bits of space, for any 0 < e < 1. They have an average locating time of O(1/ε(mlogn + occ σ^m/2)), while extracting takes O(l) time. We perform extensive experimentation and conclude that our schemes are able to reduce the space of the original LZ-index by a factor of 2/3, that is, around 3 times the compressed text size. Our schemes are able to extract about 1 to 2 MB of the text per second, being twice as fast as the most competitive alternatives. Pattern occurrences are located at a rate of up to 1 to 4 million per second. This constitutes the best space/time trade-off when indices are allowed to use 4 times the size of the compressed text or more. © 2010 ACM 1084-6654/2010/12-ART1.5 $10.00.

关键词： compressed data structures Indexed text search Lempel-Ziv compression

来源：评论

学校读者我要写书评

暂无评论

Faster entropy-bounded compressed suffix trees

引用

THEORETICAL COMPUTER SCIENCE 2009年第51期410卷 5354-5364页

作者： Fischer, Johannes Makinen, Veli Navarro, Gonzalo Univ Chile Dept Comp Sci Santiago Chile Univ Helsinki Dept Comp Sci FIN-00014 Helsinki Finland Univ Tubingen Ctr Bioinformat ZBIT D-72074 Tubingen Germany

Suffix trees are among the most important data structures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. A smaller suffix tree representation could fit in a faster memory, outweighing by far the theoretical slowdown brought by the space reduction. We present a novel compressed suffix tree, which is the first achieving at the same time sublogarithmic complexity for the operations, and space usage that asymptotically goes to zero as the entropy of the text does. The main ideas in our development are compressing the longest common prefix information, totally getting rid of the suffix tree topology, and expressing all the suffix tree operations using range minimum queries and a novel primitive called next/previous smaller value in a sequence. Our solutions to those operations are of independent interest. (C) 2009 Elsevier B.V. All rights reserved.

关键词： Suffix trees compressed data structures Range minimum queries

来源：评论

学校读者我要写书评

暂无评论

Rank/select on dynamic compressed sequences and applications

引用

THEORETICAL COMPUTER SCIENCE 2009年第43期410卷 4414-4422页

作者： Gonzalez, Rodrigo Navarro, Gonzalo Univ Chile Dept Comp Sci Santiago Chile

Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures managing text collections, structured text, binary relations, trees, graphs, and so on. We are interested in the case where the collections can be updated via insertions and deletions of symbols. Two current solutions stand out as the best in the tradeoff of space versus time (when considering all the operations). One solution, by Makinen and Navarro, achieves compressed space (i.e., nH(0) + o(n log sigma) bits) and O(log n log sigma) worst-case time for all the operations, where n is the sequence length, sigma is the alphabet size, and H-0 is the zero-order entropy of the sequence. The other solution, by Lee and Park, achieves O(log n(1 + log sigma/log log n)) amortized time and uncompressed space, i.e. n log(2) sigma + O(n) + o(n log sigma) bits. In this paper we show that the best of both worlds can be achieved. We combine the solutions to obtain nH(0) + o(n log sigma) bits of space and O(log n(1 + log sigma/log log n)) worst-case time for all the operations. Apart from the best current solution to the problem, we obtain several byproducts of independent interest applicable to partial sums, text indexes, suffix arrays, the Burrows-Wheeler transform, and others. (C) 2009 Elsevier B.V. All rights reserved.

关键词： compressed data structures Empirical entropy Text indexing

来源：评论

学校读者我要写书评

暂无评论

Storage and Retrieval of Individual Genomes

引用

13th Annual International Conference on Research Computational Molecular Biology

作者： Makinen, Veli Navarro, Gonzalo Siren, Jouni Valimaki, Niko Univ Helsinki Dept Comp Sci Helsinki Finland Univ Chile Dept Comp Sci Santiago Chile

ISBN: (纸本)9783642020070

A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the, differences can be expressed by lists of basic edit, operations. Flexible and efficient data analysis oil a such typically huge collection is plausible using suffix trees. However, suffix tree occupies O(N log sigma) bits, which very soon inhibits in-memory analyses. Recent, advances in full-text self-indexing reduce the space of suffix tree to O(N log sigma) bits, where sigma is the alphabet size. In practice, the space reduction is more than 10-fold, for example on suffix tree of Human Genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only oil the length n of the base sequence and the number 8 of variations ill its repeated copies. That is, the space reduction factor is no longer constant, but depends on N/n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in the sequencing technologies.

关键词： Comparative genomics full-text indexing suffix tree compressed data structures

来源：评论

学校读者我要写书评

暂无评论

Rank and select revisited and extended

引用

THEORETICAL COMPUTER SCIENCE 2007年第3期387卷 332-347页

作者： Makinen, Veli Navarro, Gonzalo Univ Chile Dept Comp Sci Ctr Web Res Santiago Chile Univ Helsinki Dept Comp Sci FIN-00014 Helsinki Finland

The deep connection between the Burrows-Wheeler transform (BWT) and the so-called rank and select data structures for symbol sequences is the basis of most successful approaches to compressed text indexing. Rank of a symbol at a given position equals the number of times the symbol appears in the corresponding prefix of the sequence. Select is the inverse, retrieving the positions of the symbol occurrences. It has been shown that improvements to rank/select algorithms, in combination with the BWT, turn into improved compressed text indexes. This paper is devoted to alternative implementations and extensions of rank and select data structures. First, we show that one can use gap encoding techniques to obtain constant time rank and select queries in essentially the same space as what is achieved by the best current direct solution (and sometimes less). Second, we extend symbol rank and select to substring rank and select, giving several space/time trade-offs for the problem. An application of these queries is in position-restricted substring searching, where one can specify the range in the text where the search is restricted to, and only occurrences residing in that range are to be reported. In addition, arbitrary occurrences are reported in text position order. Several byproducts of our results display connections with searchable partial sums, Chazelle's two-dimensional data structures, and Grossi et al.'s wavelet trees. (c) 2007 Elsevier B.V. All rights reserved.

关键词： succinct data structures compressed data structures gap encoding range searching position-restricted substring searching wavelet trees substring rank and select

来源：评论

学校读者我要写书评

暂无评论

compressed suffix arrays and suffix trees with applications to text indexing and string matching

引用

SIAM JOURNAL ON COMPUTING 2005年第2期35卷 378-407页

作者： Grossi, R Vitter, JS Univ Pisa Dipartimento Informat I-56127 Pisa Italy Purdue Univ Dept Comp Sci W Lafayette IN 47907 USA Duke Univ Durham NC 27706 USA

The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Sigma. The text T can be represented in n lg |Sigma| bits by encoding each symbol with lg |Sigma| bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require O( n lg n) additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need Omega(n) memory words, each of Omega(lg n) bits. These indexes are larger than the text itself by a multiplicative factor of Omega(lg| Sigma| n), which is significant when Sigma is of constant size, such as in ASCII or UNICODE. On the other hand, these indexes support fast searching, either in O(m lg |Sigma|) time or in O(m+ lg n) time, plus an output-sensitive cost O(occ) for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast O(m/lg(|Sigma|) n + lg(|Sigma|)(epsilon) n) search time in the worst case, for any constant 0 < epsilon <= 1, using at most (epsilon(-1) + O(1)) n lg |Sigma| bits of storage. Our result thus presents for the first time an efficient index whose size is provably linear in the size of the text in the worst case, and for many scenarios, the space is actually sublinear in practice. As a concrete example, the compressed su. x array for a typical 100 MB ASCII file can require 30 - 40 MB or less, while the raw su. x array requires 500 MB. Our theoretical bounds improve both time and space of previous indexing schemes. Listing the

关键词： compression text indexing text retrieval compressed data structures suffix arrays suffix trees string searching pattern matching

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：