版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:Univ Pretoria Dept Comp Sci ZA-0002 Pretoria South Africa Eindhoven Univ Technol Dept Comp Sci NL-5600 MB Eindhoven Netherlands
出 版 物:《SCIENCE OF COMPUTER PROGRAMMING》 (计算机程序设计科学)
年 卷 期:2003年第48卷第2-3期
页 面:99-117页
核心收录:
学科分类:08[工学] 0835[工学-软件工程] 0812[工学-计算机科学与技术(可授工学、理学学位)]
主 题:string pattern matching regular expressions Boyer-Moore algorithms Commentz-Walter algorithms algorithm generalizations
摘 要:This paper presents a Boyer-Moore-type algorithm for regular expression pattern matching, answering an open problem posed by Aho in 1980 (Pattern Matching in Strings, Academic Press, New York, 1980, p. 342). The new algorithm handles patterns specified by regular expressions-a generalization of the Boyer-Moore and Commentz-Walter algorithms. Like the Boyer-Moore and Commentz-Walter algorithms, the new algorithm makes use of shift functions which can be precomputed and tabulated. The precomputation algorithms are derived, and it is shown that the required shift functions can be precomputed from Commentz-Walter s d(1) and d(2) shift functions. In certain cases, the Boyer-Moore (respectively Commentz-Walter) algorithm has greatly outperformed the Knuth-Morris-Pratt (respectively Aho-Corasick) algorithm (as discussed by Watson in his Ph.D. Thesis, Eindhoven University of Technology, September 1995, and in: N. Ziviani, R. Baeza-Yates, K. Guimaracs (Eds.), Proc. Third South American Workshop on String Processing, International Informatics Series, vol. 4, Carleton University Press, Recife, Brazil, 1996, pp. 280-294). In testing, the algorithm presented in this paper also frequently outperforms the regular expression generalization of the Aho-Corasick algorithm. (C) 2003 Elsevier B.V. All rights reserved.