This work presents a parallel implementation of the implicitly restarted Arnoldi/Lanczos method for the solution of eigenproblems approximated by the finite element method. The implicitly restarted Arnoldi/Lanczos use...
详细信息
This work presents a parallel implementation of the implicitly restarted Arnoldi/Lanczos method for the solution of eigenproblems approximated by the finite element method. The implicitly restarted Arnoldi/Lanczos uses a restart scheme in order to improve the convergence of the desired portion of the spectrum, addressing issues such as memory requirements and computational costs related to the generation and storage of the Krylov basis. The presented implementation is suitable for distributed memory architectures, especially PC clusters. In the parallel solution, a subdomain by subdomain approach was implemented and overlapping and non-overlapping mesh partitions were tested. compressed data structures in the formats CSRC and CSRC/CSR were used to store the coefficient matrices. The parallelization of numerical linear algebra operations present in both Krylov and implicitly restarted methods are discussed. Numerical examples are shown, in order to point out the efficiency and applicability of the proposed method.
We prove that longest common prefix (LCP) information can be stored in much less space than previously known. More precisely, we show that in the presence of the text and the suffix array, o(n) additional bits are suf...
详细信息
We prove that longest common prefix (LCP) information can be stored in much less space than previously known. More precisely, we show that in the presence of the text and the suffix array, o(n) additional bits are sufficient to answer LCP-queries asymptotically in the same time that is needed to retrieve an entry from the suffix array. This yields the smallest compressed suffix tree with sub-logarithmic navigation time. (C) 2010 Elsevier B.V. All rights reserved.
Given a text T[1..n] over an alphabet of size σ, the full-text search problem consists in locating the occ occurrences of a given pattern P[1..m] in T. compressed full-text self-indices are space-efficient representa...
详细信息
Suffix trees are among the most important datastructures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research st...
详细信息
Suffix trees are among the most important datastructures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. A smaller suffix tree representation could fit in a faster memory, outweighing by far the theoretical slowdown brought by the space reduction. We present a novel compressed suffix tree, which is the first achieving at the same time sublogarithmic complexity for the operations, and space usage that asymptotically goes to zero as the entropy of the text does. The main ideas in our development are compressing the longest common prefix information, totally getting rid of the suffix tree topology, and expressing all the suffix tree operations using range minimum queries and a novel primitive called next/previous smaller value in a sequence. Our solutions to those operations are of independent interest. (C) 2009 Elsevier B.V. All rights reserved.
Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures managing text collections, structured text, binary relations, trees, graphs, and so...
详细信息
Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures managing text collections, structured text, binary relations, trees, graphs, and so on. We are interested in the case where the collections can be updated via insertions and deletions of symbols. Two current solutions stand out as the best in the tradeoff of space versus time (when considering all the operations). One solution, by Makinen and Navarro, achieves compressed space (i.e., nH(0) + o(n log sigma) bits) and O(log n log sigma) worst-case time for all the operations, where n is the sequence length, sigma is the alphabet size, and H-0 is the zero-order entropy of the sequence. The other solution, by Lee and Park, achieves O(log n(1 + log sigma/log log n)) amortized time and uncompressed space, i.e. n log(2) sigma + O(n) + o(n log sigma) bits. In this paper we show that the best of both worlds can be achieved. We combine the solutions to obtain nH(0) + o(n log sigma) bits of space and O(log n(1 + log sigma/log log n)) worst-case time for all the operations. Apart from the best current solution to the problem, we obtain several byproducts of independent interest applicable to partial sums, text indexes, suffix arrays, the Burrows-Wheeler transform, and others. (C) 2009 Elsevier B.V. All rights reserved.
A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version con...
详细信息
ISBN:
(纸本)9783642020070
A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the, differences can be expressed by lists of basic edit, operations. Flexible and efficient data analysis oil a such typically huge collection is plausible using suffix trees. However, suffix tree occupies O(N log sigma) bits, which very soon inhibits in-memory analyses. Recent, advances in full-text self-indexing reduce the space of suffix tree to O(N log sigma) bits, where sigma is the alphabet size. In practice, the space reduction is more than 10-fold, for example on suffix tree of Human Genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only oil the length n of the base sequence and the number 8 of variations ill its repeated copies. That is, the space reduction factor is no longer constant, but depends on N/n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in the sequencing technologies.
The deep connection between the Burrows-Wheeler transform (BWT) and the so-called rank and select datastructures for symbol sequences is the basis of most successful approaches to compressed text indexing. Rank of a ...
详细信息
The deep connection between the Burrows-Wheeler transform (BWT) and the so-called rank and select datastructures for symbol sequences is the basis of most successful approaches to compressed text indexing. Rank of a symbol at a given position equals the number of times the symbol appears in the corresponding prefix of the sequence. Select is the inverse, retrieving the positions of the symbol occurrences. It has been shown that improvements to rank/select algorithms, in combination with the BWT, turn into improved compressed text indexes. This paper is devoted to alternative implementations and extensions of rank and select datastructures. First, we show that one can use gap encoding techniques to obtain constant time rank and select queries in essentially the same space as what is achieved by the best current direct solution (and sometimes less). Second, we extend symbol rank and select to substring rank and select, giving several space/time trade-offs for the problem. An application of these queries is in position-restricted substring searching, where one can specify the range in the text where the search is restricted to, and only occurrences residing in that range are to be reported. In addition, arbitrary occurrences are reported in text position order. Several byproducts of our results display connections with searchable partial sums, Chazelle's two-dimensional datastructures, and Grossi et al.'s wavelet trees. (c) 2007 Elsevier B.V. All rights reserved.
The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as...
详细信息
The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Sigma. The text T can be represented in n lg |Sigma| bits by encoding each symbol with lg |Sigma| bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require O( n lg n) additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need Omega(n) memory words, each of Omega(lg n) bits. These indexes are larger than the text itself by a multiplicative factor of Omega(lg| Sigma| n), which is significant when Sigma is of constant size, such as in ASCII or UNICODE. On the other hand, these indexes support fast searching, either in O(m lg |Sigma|) time or in O(m+ lg n) time, plus an output-sensitive cost O(occ) for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast O(m/lg(|Sigma|) n + lg(|Sigma|)(epsilon) n) search time in the worst case, for any constant 0 < epsilon <= 1, using at most (epsilon(-1) + O(1)) n lg |Sigma| bits of storage. Our result thus presents for the first time an efficient index whose size is provably linear in the size of the text in the worst case, and for many scenarios, the space is actually sublinear in practice. As a concrete example, the compressed su. x array for a typical 100 MB ASCII file can require 30 - 40 MB or less, while the raw su. x array requires 500 MB. Our theoretical bounds improve both time and space of previous indexing schemes. Listing the
暂无评论