This paper presents a taxonomy of some exact, right-to-left, string-matching algorithms. The taxonomy is based on results obtained by using logic program transformation over a naive and nondeterministic specification....
详细信息
ISBN:
(纸本)9783642119989
This paper presents a taxonomy of some exact, right-to-left, string-matching algorithms. The taxonomy is based on results obtained by using logic program transformation over a naive and nondeterministic specification. A derivation of the search part and some notes about the preprocessing part of each algorithm is presented. The derivations show several design decisions behind each algorithm, and allow us to organize the algorithms within a taxonomic tree, giving us a better understanding of these algorithms and possible mechanical procedures to derive them.
The problem of searching occurrences of a pattern P[0...m-1] in the text T[0...n-1] with m ≤ n, where the symbols of P and T are drawn from some alphabet Σ of size σ, is called exact string matching problem. The pr...
详细信息
The problem of searching occurrences of a pattern P[0...m-1] in the text T[0...n-1] with m ≤ n, where the symbols of P and T are drawn from some alphabet Σ of size σ, is called exact string matching problem. In the...
详细信息
ISBN:
(纸本)9781605587226
The problem of searching occurrences of a pattern P[0...m-1] in the text T[0...n-1] with m ≤ n, where the symbols of P and T are drawn from some alphabet Σ of size σ, is called exact string matching problem. In the present day, pattern matching is a powerful tool in locating nucleotide or amino acid sequence patterns in the biological sequence database. The problem of searching a set of patterns P0, P1, P2... Pr-1, r ≥ 1, in the given text T is called multi-pattern string matching problem. The multi-patterns string matching problem has been previously solved by efficient bit-parallel strings matching algorithms: shift-or and BNDM. Many other types of algorithms also exist for the same purpose, but bit-parallelism has been shown to be very efficient than the others. In this paper, we extend BNDM algorithm with q-gram (B. Durian et al., 2008) for multiple patterns, where each multi-patterns are any DNA patterns. We assume that each pattern is of equal size m and total length of pattern is less than or equal to word length (w) of computer used. Since BNDM algorithm has been shown to be faster than any other bit-parallel string matching algorithm (G. Navarro, 2000), therefore, we compare the performance of multi-patterns q-gram BNDM algorithm with existing BNDM algorithm for different value of q and number of patterns (r). Copyright 2010 ACM.
The paper proposes improving methods to advance the matching rate for multi-pattern string matching algorithm Wu-Manber, First, string abstract value matching method advances the precision of the first matching, and r...
详细信息
We present a new algorithm for exact multiple string matching. Our algorithm is based on filtration combining BNDM and q-grams. We have tested it with experiments and compared it with other algorithms, e.g. DFA, AC-BM...
详细信息
The Lecroq algorithm is the best current solutions of single pattern string matching for binary text. In this paper, we simplify Lecroq and propose a more practical variant which is named S-Lecroq. We analysis the com...
详细信息
We study parallel and distributed compressed indexes. Compressed indexes are a new and functional way to index text strings. They exploit the compressibility of the text, so that their size is a function of the compre...
详细信息
ISBN:
(纸本)9783642135088
We study parallel and distributed compressed indexes. Compressed indexes are a new and functional way to index text strings. They exploit the compressibility of the text, so that their size is a function of the compressed text size. Moreover, they support a considerable amount of functions, more than many classical indexes. We make use of this extended functionality to obtain, in a shared-memory parallel machine, near-optimal speedups for solving several stringology problems. We also show how to distribute compressed indexes across several machines.
The suffix tree is an extremely important data structure for stringology, with a wealth of applications in bioinformatics. Classical implementations require much space, which renders them useless for large problems. R...
详细信息
ISBN:
(纸本)9783642131929
The suffix tree is an extremely important data structure for stringology, with a wealth of applications in bioinformatics. Classical implementations require much space, which renders them useless for large problems. Recent research has yielded two implementations offering widely different space-time tradeoffs. However, each of them has practicality problems regarding either space or time requirements. In this paper we implement a recent theoretical proposal and show it yields an extremely interesting structure that lies in between, offering both practical times and affordable space. The implementation is by no means trivial and involves significant algorithm engineering.
With the advent of new sequencing technologies able to produce an enormous quantity of short gnomic sequences, new tools able to search for them inside a references sequence genome have emerged. Because of chemical re...
详细信息
ISBN:
(纸本)9783642130885
With the advent of new sequencing technologies able to produce an enormous quantity of short gnomic sequences, new tools able to search for them inside a references sequence genome have emerged. Because of chemical reading errors or of the variability between organisms, one is interested in finding not only exact occurrences, but also occurrences with up to k mismatches. The contribution of this paper is twofold. On one hand, we present a generalization of the classical Rabin-Karp string matching algorithm to solve the k-mismatch problem, with average complexity O(n+m). On the other hand, we show how to employ this idea in conjunction with an index over the text, allowing to search a pattern, with up to k mismatches, in time proportional to its length. This novel tool-rNA (randomized Numerical Aligner) outperforms available tools like SOAP2, BWA, and BOWTIE, processing up to 10 times more patterns per second on texts of (practically) significant lengths.
Given a d-dimensional array A with N entries, the Range Minimum Query (RMQ) asks for the minimum element within a contiguous subarray of A The 1D RMQ problem has been studied intensively because of its relevance to th...
详细信息
ISBN:
(纸本)9780898717013
Given a d-dimensional array A with N entries, the Range Minimum Query (RMQ) asks for the minimum element within a contiguous subarray of A The 1D RMQ problem has been studied intensively because of its relevance to the Nearest Common Ancestor problem and its important use in stringology. If constant-tune query answering is required, linear time and space preprocessing algorithms were known for the ID case, but not for the higher dimensional cases In this paper, we give the first linear-time preprocessing algorithm for arrays with fixed dimension, such that any range minimum query can be answered in constant time
暂无评论