This paper presents a boyer-moore-type algorithm for regular expression pattern matching, answering an open problem posed by Aho in 1980 (Pattern Matching in Strings, Academic Press, New York, 1980, p. 342). The new a...
详细信息
This paper presents a boyer-moore-type algorithm for regular expression pattern matching, answering an open problem posed by Aho in 1980 (Pattern Matching in Strings, Academic Press, New York, 1980, p. 342). The new algorithm handles patterns specified by regular expressions-a generalization of the boyer-moore and Commentz-Walter algorithms. Like the boyer-moore and Commentz-Walter algorithms, the new algorithm makes use of shift functions which can be precomputed and tabulated. The precomputation algorithms are derived, and it is shown that the required shift functions can be precomputed from Commentz-Walter's d(1) and d(2) shift functions. In certain cases, the boyer-moore (respectively Commentz-Walter) algorithm has greatly outperformed the Knuth-Morris-Pratt (respectively Aho-Corasick) algorithm (as discussed by Watson in his Ph.D. Thesis, Eindhoven University of Technology, September 1995, and in: N. Ziviani, R. Baeza-Yates, K. Guimaracs (Eds.), Proc. Third South American Workshop on String Processing, International Informatics Series, vol. 4, Carleton University Press, Recife, Brazil, 1996, pp. 280-294). In testing, the algorithm presented in this paper also frequently outperforms the regular expression generalization of the Aho-Corasick algorithm. (C) 2003 Elsevier B.V. All rights reserved.
Quantum leap matching is introduced as a generic pattern matching strategy for the single keyword exact pattern matching problem, that can be used on top of existing boyer-moore-style string matching algorithms. The c...
详细信息
ISBN:
(纸本)9788001057872
Quantum leap matching is introduced as a generic pattern matching strategy for the single keyword exact pattern matching problem, that can be used on top of existing boyer-moore-style string matching algorithms. The cost of the technique is minimal: an additional shift table (of one dimension, for shifts in the opposite direction to the parent algorithm's shifts), and the replacement of a simple table lookup assignment statement in the original algorithm with a similar conditional assignment. Together with each of the conventional shift table lookups, the additional shift table is typically also indexed on the text character that is at a distance of z away from the current sliding window. Under conditions that are identified, the returned values from the two shift tables allow a "quantum leap" of distance more than the length of the keyword for the next matching attempt. If the conditions are not met, then there is a fall back is to the traditional shift. Quick Search (by Sunday) is used as a case study to illustrate the technique. The performance of the derived "Quantum Leap Quick Search" algorithm is compared against Quick Search. When searching for shorter patterns over natural language and genomic texts, the technique improves on Quick Search's time for most values of z. Improvements are also sometimes seen for various values of z on larger patterns. Most interestingly, under best case conditions it performs, on average, at about three times faster than Quick Search.
暂无评论