Given a pattern x of length m and a text y of length n, both over a totally ordered alphabet, the order-preserving pattern matching (OPPM) problem consists in finding all substrings of the text with the same relative ...
详细信息
Given a pattern x of length m and a text y of length n, both over a totally ordered alphabet, the order-preserving pattern matching (OPPM) problem consists in finding all substrings of the text with the same relative order as the pattern. The OPPM problem, which might be viewed as an approximate variant of the well-known exact pattern matching problem, has gained attention in recent years. This interesting problem finds applications in such diverse fields as time series analysis (as share prices on stock markets or weather data analysis) and musical melody matching, just to mention a few. In this paper we present two new filtering approaches which turn out to be much more effective in practice than the previously presented methods, reducing the number of false positives up to 99%. We also present a new efficient approach inspired by the well-known Skip Search algorithm for the exact string matching problem. It makes use of efficient SIMD SSE instructions for speeding up the searching phase. experimental results show that our proposed algorithms are up to twice faster than previous solutions. (C) 2018 Elsevier B.V. All rights reserved.
We present a scalable, high-performance solution to multidimensional recurrences that arise in adaptive statistical designs. Adaptive designs are an important class of learning algorithms for a stochastic environment,...
详细信息
We present a scalable, high-performance solution to multidimensional recurrences that arise in adaptive statistical designs. Adaptive designs are an important class of learning algorithms for a stochastic environment, and we focus on the problem of optimally assigning patients to treatments in clinical trials. While adaptive designs have significant ethical and cost advantages, they are rarely utilized because of the complexity of optimizing and analyzing them. Computational challenges include massive memory requirements, few calculations per memory access, and multiply-nested loops with dynamic indices. We analyze the effects of various parallelization options, and while standard approaches do not work well, with effort an efficient, highly scalable program can be developed. This allows us to solve problems thousands of times more complex than those solved previously, which helps make adaptive designs practical. Further, our work applies to many other problems involving neighbor recurrences, such as gen eralized string matching.
In this article we present two efficient variants of the BOM string matching algorithm which are more efficient and flexible than the original algorithm. We also present bit-parallel versions of them obtaining an effi...
详细信息
In this article we present two efficient variants of the BOM string matching algorithm which are more efficient and flexible than the original algorithm. We also present bit-parallel versions of them obtaining an efficient variant of the BNDM algorithm. Then we compare the newly presented algorithms with some of the most recent and effective string matching algorithms. It turns out that the new proposed variants are very flexible and achieve very good results, especially in the case of large alphabets.
The disjoint-set data structure is used to maintain a collection of non-overlapping sets of elements from a finite universe. algorithms that operate on this data structure are often referred to as UNION-FIND algorithm...
详细信息
ISBN:
(纸本)9783642131929
The disjoint-set data structure is used to maintain a collection of non-overlapping sets of elements from a finite universe. algorithms that operate on this data structure are often referred to as UNION-FIND algorithms. They are used in numerous practical applications and are also available in several software libraries. This paper presents an extensive experimental study comparing the time required to execute 55 variations of UNION-FIND algorithms. The study includes all the classical algorithms, several recently suggested enhancements, and also different combinations and optimizations of these. Our results clearly show that a somewhat forgotten simple algorithm developed by Rem in 1976 is the fastest, in spite of the fact that its worst-case time complexity is inferior to that of the commonly accepted "best" algorithms.
We present a set of new instances of the maximum weight independent set problem. These instances are derived from a real-world vehicle routing problem and are challenging to solve in part because of their large size. ...
详细信息
Given a pattern x of length m and a text y of length n, both over an ordered alphabet, the order-preserving pattern matching problem consists in finding all substrings of the text with the same relative order as the p...
详细信息
ISBN:
(数字)9783319411682
ISBN:
(纸本)9783319411682;9783319411675
Given a pattern x of length m and a text y of length n, both over an ordered alphabet, the order-preserving pattern matching problem consists in finding all substrings of the text with the same relative order as the pattern. The OPPM, which might be viewed as an approximate variant of the well known exact pattern matching problem, has gained attention in recent years. This interesting problem finds applications in a lot of fields as from time series analysis, like share prices on stock markets or weather data analysis, to musical melody matching. In this paper we present two new filtering approaches which turn out to be much more effective in practice than the previously presented methods by reducing the number of false positives up to 99%. From our experimental results it turns out that our proposed solutions are up to 2 times faster than the previous solutions.
We present a new efficient algorithm for the delta-approximate matching problem with a-bounded gaps. The delta-approximate matching problem, recently introduced in connection with applications in music retrieval, gene...
详细信息
ISBN:
(纸本)3540259201
We present a new efficient algorithm for the delta-approximate matching problem with a-bounded gaps. The delta-approximate matching problem, recently introduced in connection with applications in music retrieval, generalizes the exact string matching problem by relaxing the notion of matching. Here we consider the case in which matchings may contain bounded gaps. An extensive comparison with the only (to our knowledge) other solution existing in the literature for the same problem, due to Crochemore et al., indicates that our algorithm is more efficient, especially in the cases of large alphabets and long patterns. In addition, our algorithm computes the total number of approximate matchings for each position of the text, requiring only O(m alpha)-space, where m, is the length of the pattern.
Bidimensionality theory provides a general framework for developing subexponential fixed parameter algorithms for NP-hard problems. In this framework, to solve an optimization problem in a graph G, the branchwidth bw(...
详细信息
ISBN:
(数字)9783642255915
ISBN:
(纸本)9783642255908
Bidimensionality theory provides a general framework for developing subexponential fixed parameter algorithms for NP-hard problems. In this framework, to solve an optimization problem in a graph G, the branchwidth bw(G) is first computed or estimated. If bw(G) is small then the problem is solved by a branch-decomposition based algorithm which typically runs in polynomial time in the size of G but in exponential time in bw(G). Otherwise, a large bw(G) implies a large grid minor of G and the problem is computed or estimated based on the grid minor. A representative example of such algorithms is the one for the longest path problem in planar graphs. Although many subexponential fixed parameter algorithms have been developed based on bidimensionality theory, little is known on the practical performance of these algorithms. We report a computational study on the practical performance of a bidimensionality theory based algorithm for the longest path problem in planar graphs. The results show that the algorithm is practical for computing/estimating the longest path in a planar graph. The tools developed and data obtained in this study may be useful in other bidimensional algorithm studies.
Multiple exact string matching is one of the fundamental problems in computer science and finds applications in many other fields, among which computational biology and intrusion detection. It turns out that short pat...
详细信息
ISBN:
(纸本)9788001053300
Multiple exact string matching is one of the fundamental problems in computer science and finds applications in many other fields, among which computational biology and intrusion detection. It turns out that short patterns appear in many instances of such problems and, in most cases, sensibly affect the performances of the algorithms. Recent solutions in the field of string matching try to exploit the power of the word RAM model to speed-up the performances of classical algorithms. In this model an algorithm operates on words of length w, grouping blocks of characters, and arithmetic and logic operations on the words take one unit of time. This study presents a first preliminary attempt to develop a filter based exact multiple string matching algorithm for searching set of short patterns by taking benefit from Intel's SSE (streaming SIMD extensions) technology. Our experimental results on small, medium, and large alphabet text files show that the proposed algorithm is competitive in the case of short patterns against other efficient solutions, which are known to be among the fastest in practice.
We present a new efficient algorithm for exact matching in encoded DNA sequences and on binary strings. Our algorithm combines a multi-pattern version of the BNDM algorithm and a simplified version of the COMMENTZ-WAL...
详细信息
ISBN:
(纸本)9783642024405
We present a new efficient algorithm for exact matching in encoded DNA sequences and on binary strings. Our algorithm combines a multi-pattern version of the BNDM algorithm and a simplified version of the COMMENTZ-WALTER algorithm. We performed also experimental comparisons with the most efficient algorithms presented in the literature. experimental results show that the newly presented algorithm outperforms existing solutions in most cases.
暂无评论