We present bounds for the average case of the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore-Horspool (BMH) algorithm for random text. Experimental results in both random and English text suggest that the bou...
ISBN: 0898712513 (print)
We study Boyer-Moore-type string searching algorithms. First, we analyze Horspool's variant. The searching time is linear. An exact expression for the linearity constant is derived and is proven to be asymptotically 1/c, where c is the cardinality of the alphabet. We exhibit a stationary process and reduce the problem to a word enumeration. The same technique applies to other variants of the Boyer-Moore algorithm. We also study Boyer-Moore automata, a notion that we formalize. This approach appears to be faster than any other known algorithm in both the worst-case and average-case number of inspections. A lower bound on the maximal number of states of these automata is presented, and the concept of the potential of a transition is introduced to improve the worst- and average-case behaviour of these machines. We show that looking at the rightmost unknown character, as suggested by Knuth et al., is not necessarily optimal.
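For readers unfamiliar with Horspool's variant, the following is a minimal Python sketch of the search loop, instrumented to count character inspections (the kind of cost that a per-character linearity constant describes). The function name `horspool_search` and the inspection counter are assumptions made for this illustration and are not taken from the paper.

```python
def horspool_search(text, pattern):
    """Return (match positions, number of text characters inspected).

    Illustrative sketch of Horspool's variant of Boyer-Moore;
    names and the inspection counter are hypothetical.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0

    # Shift table: for each character in pattern[:-1], the distance from its
    # last occurrence to the end of the pattern. Characters that are absent
    # from the table shift the window by the full pattern length m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    matches = []
    inspections = 0
    pos = 0
    while pos <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0:
            inspections += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift according to the text character aligned with the last
        # pattern position, whether or not the window matched.
        pos += shift.get(text[pos + m - 1], m)
    return matches, inspections


if __name__ == "__main__":
    positions, cost = horspool_search("abracadabra", "abra")
    print(positions, cost)
```

Dividing the returned inspection count by the text length gives an empirical estimate of inspections per text character on a particular input.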
Authors: Saleh, Taj; Ergin, Fatma Corut; Malkawi, Malek; Alhajj, Reda
Affiliations: Department of Computer Engineering, Marmara University, Istanbul, Turkey; Department of Computer Engineering, Istanbul Medipol University, Istanbul, Turkey; Department of Computer Science, University of Calgary, Alberta, Canada; Department of Health Informatics, University of Southern Denmark, Odense, Denmark
String search algorithms play an important role in many research areas such as data mining and bioinformatics. While there exist a number of algorithms that handle the topic, we are exploring the Knuth-Morris-Pra...
The string-searching task can be classified as a classic information-processing task. Users either encounter the solution of this task while working with text processors or browsers, employing standard built-in tools,...
String matching is a technique for searching for a pattern in a text. It is the basic concept for extracting useful information from large volumes of text, and it is used in applications such as text processing, inf...
Many network security applications rely on string matching to detect intrusions, viruses, spam, and so on. Since software implementation may not keep pace with the high-speed demand, turning to hardware-based solution...
ISBN: 9783642009815 (print)
Many algorithms, e.g. in the field of string matching, are based on handling many counters, which can be performed in parallel, even on a sequential machine, using bit-parallelism. The recently presented technique of nested counters (Matryoshka counters) [1] is to handle small counters most of the time, and refer to larger counters periodically, when the small counters may get full, to prevent overflow. In this work, we present several non-trivial applications of Matryoshka counters in string matching algorithms, improving their worst- or average-case time complexities. The set of problems comprises (δ, α)-matching, matching with k insertions, episode matching, and matching under Levenshtein distance.
ISBN: 9783642037832 (print)
Index structures like the suffix tree or the suffix array are of utmost importance in stringology, most notably in exact string matching. In the last decade, research on compressed index structures has flourished because the main problem in many applications is the space consumption of the index. It is possible to simulate the matching of a pattern against a suffix tree on an enhanced suffix array by using range minimum queries or the so-called child table. In this paper, we show that the Super-Cartesian tree of the LCP-array (with which the suffix array is enhanced) very naturally explains the child table. More important, however, is the fact that the balanced parentheses representation of this tree constitutes a very natural compressed form of the child table, which allows locating all occ occurrences of a pattern P of length m in O(m log |Σ| + occ) time, where Σ is the underlying alphabet. Our compressed child table uses less space than previous solutions to the problem. An implementation is available.
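To make the role of the suffix array in exact matching concrete, here is a small Python sketch that locates all occurrences of a pattern by binary search over a plain, uncompressed suffix array. This is a deliberately simpler, swapped-in technique: it is not the compressed child table of the paper, it does not use the LCP-array, and it runs in roughly O(m log n + occ) time for a text of length n rather than the bound stated above. The function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order.

    Illustrative only; sorting whole suffixes is far slower than the
    linear-time constructions used in practice.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of pattern in text."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix with the given rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first rank whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes ranked in [first, lo) start with the pattern.
    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "ana"))
```

The binary search works because truncating lexicographically sorted suffixes to their first m characters keeps them in non-decreasing order, so the suffixes that start with the pattern occupy one contiguous range of ranks.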
String matching is a useful concept in pattern recognition that is constantly receiving attention from both theoretical and practical points of view. In this paper we propose a generalized version of the string matchi...
In standard string matching, each symbol matches only itself. In other string matching problems, e.g., the string matching with “don’t-cares” problem, a symbol may match several symbols. In general, an arbitrary ma...