Large-scale, high-speed URL matching is a key operation in many network security systems and in surveillance applications for Wireless Sensor Networks. Classic string-matching algorithms are unsuitable for large-scale URL filtering because of their speed or memory consumption. This paper proposes an extended Wu-Manber algorithm (XWM) that exploits the encoding characteristics of URLs to greatly improve matching performance. It first adopts a pattern-string window selection method to optimize Wu-Manber's hashing process, and then combines hash tables with associative containers to optimize the string comparison process. Experimental results on 10 million real patterns show that XWM can run twice as fast as traditional algorithms, and that its advantage is greater when the shortest pattern string is longer.
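For orientation, the baseline that XWM extends can be sketched as follows. This is a minimal illustration of the classic Wu-Manber scheme (a shift table plus a hash table of candidate patterns), not the XWM algorithm itself: the abstract does not specify the window selection method or the associative-container layout, so neither is reproduced here. The function names, the block size of 2, and the sample URLs are assumptions made for illustration only.

```python
from collections import defaultdict


def build_tables(patterns, block=2):
    """Build the Wu-Manber SHIFT and HASH tables (illustrative names)."""
    m = min(len(p) for p in patterns)  # only the first m chars of each pattern are indexed
    if m < block:
        raise ValueError("shortest pattern must be at least `block` characters long")
    default_shift = m - block + 1      # safe skip for a block that occurs in no pattern
    shift = {}
    buckets = defaultdict(list)        # HASH table: last block of the m-prefix -> patterns
    for p in patterns:
        for end in range(block, m + 1):
            key = p[end - block:end]
            # Distance from this block to the end of the m-character window.
            shift[key] = min(shift.get(key, default_shift), m - end)
        buckets[p[m - block:m]].append(p)
    return m, default_shift, shift, buckets


def wu_manber_search(text, patterns, block=2):
    """Return (start_index, pattern) for every occurrence of any pattern in text."""
    m, default_shift, shift, buckets = build_tables(patterns, block)
    hits = []
    pos = m - 1                        # index of the last character of the current window
    while pos < len(text):
        key = text[pos - block + 1:pos + 1]
        step = shift.get(key, default_shift)
        if step == 0:
            # Candidate position: verify every pattern stored under this block.
            start = pos - m + 1
            for p in buckets.get(key, ()):
                if text.startswith(p, start):
                    hits.append((start, p))
            pos += 1
        else:
            pos += step
    return hits


if __name__ == "__main__":
    sample = "visit http://example.com/a and http://evil.test/x"
    print(wu_manber_search(sample, ["example.com", "evil.test"]))
```

Because the maximum skip is bounded by the shortest pattern length, longer minimum pattern lengths allow larger shifts. This is consistent with the abstract's remark that the advantage is greater when the shortest pattern string is longer.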
ISBN: (print) 9780769539232
Multi-string matching is one of the most important components of data mining tasks. New applications in many technology fields require high-performance string-matching algorithms. This paper first presents a new string searching approach based on a data structure called a prefix tree. The algorithm eliminates the functional overlap between the HASH table and the Prefix Function. We then make a small improvement to the prefix tree and present a second algorithm that is faster and uses less space. We demonstrate analytically that the two algorithms inherit the optimality and are very competitive in practice. In tests on both real-life and synthetic data, our algorithms are also efficient, and they are especially effective for varied string patterns and large alphabets.
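To make the underlying data structure concrete, the sketch below shows a generic prefix tree (trie) used for multi-string matching. It is not the paper's algorithm: the abstract does not describe how the prefix tree replaces the HASH table and the Prefix Function, so this example only illustrates how one tree can hold all patterns and be walked from each text position. The names `build_trie`, `trie_search`, and `END` are hypothetical, and patterns are assumed to be non-empty.

```python
END = None  # sentinel key marking "a pattern ends at this node"; never equal to a character


def build_trie(patterns):
    """Insert every pattern into a nested-dict prefix tree."""
    root = {}
    for p in patterns:
        node = root
        for ch in p:
            node = node.setdefault(ch, {})
        node[END] = p
    return root


def trie_search(text, patterns):
    """Return (start_index, pattern) for every occurrence of any pattern in text."""
    root = build_trie(patterns)
    hits = []
    for start in range(len(text)):
        node = root
        i = start
        while True:
            if END in node:
                # Some pattern ends exactly here; record it and keep walking,
                # because a longer pattern may share this prefix.
                hits.append((start, node[END]))
            if i >= len(text) or text[i] not in node:
                break
            node = node[text[i]]
            i += 1
    return hits


if __name__ == "__main__":
    print(trie_search("abbababababba", ["abab", "babab", "bba"]))
```

Patterns that share a prefix share a path in the tree, so a single walk from a given text position checks all of them at once.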