Large-scale high-speed URL matching is a key operation in many network security systems and surveillance applications in Wireless Sensor Networks. Classic string matching algorithms are unsuitable for large-scale URL ...
详细信息
Large-scale high-speed URL matching is a key operation in many network security systems and surveillance applications in Wireless Sensor Networks. Classic string matching algorithms are unsuitable for large-scale URL filtering due to speed or memory consumption. This paper proposes an extend wu-manber algorithm (XWM) which takes advantage of the encoding characteristics of the URL greatly to improve the matching performance of the algorithm. It first adopts the pattern string window selection method to optimize wu-manber's hash process, and then combines hash tables and associative containers to optimize the string comparison process. The experimental results on actual 10 million patterns show that XWM can achieve speeds that are twice as fast as traditional algorithms, especially when the shortest pattern string length is longer, it is more advantageous.
In this study, the authors propose a new multi-pattern matching algorithm, called BLAST (B-LAyered bad-character Shift Tables with a single-byte search unit), which considers space-time tradeoff in the context of shif...
详细信息
In this study, the authors propose a new multi-pattern matching algorithm, called BLAST (B-LAyered bad-character Shift Tables with a single-byte search unit), which considers space-time tradeoff in the context of shift values during the search. Here, the term bad character' is a character that causes a mismatch. While checking multiple bytes in scanning the text at a time, the BLAST algorithm overcomes the reduction of the average shift value in a typical search, which is caused by the dependency on the multi-byte search unit (MBSU) and the large frequency of the last character of the given patterns. From the theoretical analysis, the authors validate the correctness of the BLAST algorithm. Also, from the experimental results across different setups, the authors show that the BLAST algorithm provides the faster search time than the other algorithms. For example, the authors obtain an enhancement by as much as 212.41% on average for various numbers of attack patterns and attack traffic conditions compared with that of the modified wu-manber algorithm. In addition, it is shown that the BLAST algorithm drastically reduces the amount of memory required for constructing the shift table based on a MBSU from 64 KB to 1 KB.
A new algorithm based on the wu-manber algorithm for multiple string matching is presented in this paper. The algorithm eliminates the functional overlap of the table HASH and SHIFT, and computes the shift distances i...
详细信息
A new algorithm based on the wu-manber algorithm for multiple string matching is presented in this paper. The algorithm eliminates the functional overlap of the table HASH and SHIFT, and computes the shift distances in an aggressive manner. After each test, the algorithm examines the character next to the scan window to maximize the shift distance. This idea is consistent with that of the quick-search (QS) algorithm. Experimental results on four alphabets show that the new algorithm is more efficient than wu-manber and other recent algorithms, particularly on short pattern sets and large alphabet. (C) 2009 Elsevier B.V. All rights reserved.
In this study, to substantially improve the runtimes of exact and approximate string matching algorithms, we propose a tribrid parallel method for bit-parallel algorithms such as the Shift-Or andwu-manber algorithms. ...
详细信息
In this study, to substantially improve the runtimes of exact and approximate string matching algorithms, we propose a tribrid parallel method for bit-parallel algorithms such as the Shift-Or andwu-manber algorithms. Our underlying idea is to interpret bit-parallel algorithms as inclusive-scan operations, which allow these bit-parallel algorithms to run efficiently on a graphics processing unit (GPU);we achieve this speed-up here because inclusive-scan operations not only eliminate duplicate searches between threads but also realize a GPU-friendly memory access pattern that maximizes memory read/write throughput. To realize our ideas, we first define two binary operators and then present a proof regarding the associativity of these operators, which is necessary for the parallelization of the inclusive-scan operations. Finally, we integrate the inclusive-scan scheme into a previous segmentation-based scheme to maximize search throughput, identifying the best tradeoff point between synchronization cost and duplicate work. Through our experiments, we compared our proposed method with previous segmentation-based methods and indexing-based sequence aligners. For online string matching, our proposed method performed 6.7-16.7 times faster than previous methods, achieving a search throughput of up to 1.88 terabits per second (Tbps) on a GeForce GTX TITAN X GPU. We therefore conclude that our proposed method is quite effective for decreasing the runtimes of online string matching of short patterns.
Approximate pattern matching (APM) targets to find the occurrences of a pattern inside a subject text allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and i...
详细信息
Approximate pattern matching (APM) targets to find the occurrences of a pattern inside a subject text allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and information retrieval. Bit-parallel APM takes advantage of the intrinsic parallelism of bitwise operations inside a machine word. This approach typically encodes non-deterministic finite automaton (NFA) states or value differences between adjacent cells of a dynamic programming matrix in the form of bit arrays. wu-manber (WM) is a well-known bit-parallel APM algorithm, which simulates an NFA and gains parallel efficiency by performing multiple state updates within a machine word. An important parameter is the machine word size (e.g. 32 or 64 bits for CPUs). Due to increasing vector capabilities, efficient mapping of bit-parallel APM algorithms onto modern high performance computing architectures is an interesting research topic. Prominent examples are Xeon Phi coprocessors and CUDA-enabled GPUs, which provide words of size 512 bits (by means of vector registers) and 1024 bits (by means of warps), respectively. In this paper, we investigate mappings of the WM algorithm onto these two accelerator types. Both architectures are able to achieve around two orders-of-magnitude speedups compared to a single-threaded CPU implementation. Moreover, our tile-based implementation on a GeForce Titan graphics card runs up to 2.9 x faster than our implementation on an Intel Xeon Phi 5110P. Source code is available at http://***. (C) 2015 Elsevier B.V. All rights reserved.
Multi-pattern matching algorithms are broadly used in many fields of computer science. However, the performance of the existing algorithms seriously degrades with the increasing of the number of patterns. In this pape...
详细信息
ISBN:
(纸本)9781479974344
Multi-pattern matching algorithms are broadly used in many fields of computer science. However, the performance of the existing algorithms seriously degrades with the increasing of the number of patterns. In this paper, an improved multi-pattern matching algorithm based on the framework of the wu-manber (WM) algorithm is proposed to effectively deal with the large pattern sets. The WM algorithm is improved in two aspects. Firstly, the lengths of lists in the HASH table are balanced to reduce the number of candidate patterns;Secondly, a data structure called the "INDEX table" based on binary search is designed to reduce the time for finding candidate patterns. Experimental results show that our algorithm is efficient for large-scale pattern sets.
The exact string matching algorithms are among the important study topics in computer science due to their various applications in many fields such as medicine, bioinformatics, and biology. New algorithms have been de...
详细信息
The exact string matching algorithms are among the important study topics in computer science due to their various applications in many fields such as medicine, bioinformatics, and biology. New algorithms have been developed recently, and the string matching on the text has been accelerated. The string matching algorithms are divided into two parts, single and multiple. . The string matching algorithms are divided into two parts, single and multiple. The multiple exact string matching algorithms involve finding d number patterns (P) in a given text T. In this study, the wu-manber algorithm, one of the hash-based multiple exact string matching algorithms, is discussed. Although the wu-manber algorithm is effective, it has some limitations, such as hash collisions. In our study, a new approach has is proposed for these limitations. In the proposed approach, unlike the traditional wu-manber algorithm, the searching in the sequences is performed by q-gram hash comparison, using the hash function that removes hash collisions in DNA sequences. The proposed approach has been compared with the multiple exact string matching algorithms with the well-known algorithms in the literature on E. Coli and Human Chromosome1 datasets. As a result of the experimental studies, better results have been achieved in terms of performance metrics such as the average runtime, the average number of character and hash comparisons in the proposed approach compared to the wu-manber algorithm. Also, the proposed approach is shown to be more efficient than well-known algorithms, such as Aho Corasick (AC) and Commentz Walter (CW).
暂无评论