Calculating the length l of a longest common subsequence (LCS) of two strings, A of length n and B of length m, is a classic research topic, with many known worst-case oriented results. We present three algorithms for...
详细信息
Calculating the length l of a longest common subsequence (LCS) of two strings, A of length n and B of length m, is a classic research topic, with many known worst-case oriented results. We present three algorithms for LCS length calculation with respectively O(mn 1g 1g n/ lg(2) n), O(mn/ lg(2) n + r) and O(n + r) time complexity, where the second one works for r = o(mn/(lg n lg lg n)), and the third one for r = Theta(mn/ lg(k) n), for a real constant 1 <= k <= 3, and l = O(n/(lg(k-1) n(lg lg n)(2))), where r is the number of matches in the dynamicprogramming matrix. We also describe conditions for a given problem sufficient to apply our techniques, with several concrete examples presented, namely the edit distance, the longest common transposition-invariant subsequence (LCTS) and the merged longest common subsequence (MerLCS) problems. (C) 2015 Elsevier B.V. All rights reserved.
Calculating the length of a longest common subsequence (LCS) of two strings, A of length n and B of length m, is a classic research topic, with many worstcase oriented results known. We present two algorithms for LCS ...
详细信息
We show how to chain maximal exact matches (MEMs) between a query string Q and a labeled directed acyclic graph (DAG) G = (V, E) to solve the longest common subsequence (LCS) problem between Q and G. We obtain our res...
详细信息
ISBN:
(纸本)9783031439797;9783031439803
We show how to chain maximal exact matches (MEMs) between a query string Q and a labeled directed acyclic graph (DAG) G = (V, E) to solve the longest common subsequence (LCS) problem between Q and G. We obtain our result via a new symmetric formulation of chaining in DAGs that we solve in O(m + n + k(2)|V| + |E| + kN log N) time, where m = |Q|, n is the total length of node labels, k is the minimum number of paths covering the nodes of G and N is the number of MEMs between Q and node labels, which we show encode full MEMs.
Given two strings of lengths m and n, with m <= n, the longest common subsequence problem consists of computing a common subsequence of maximum length by deleting symbols from both strings. While the O(mn) algorith...
详细信息
Given two strings of lengths m and n, with m <= n, the longest common subsequence problem consists of computing a common subsequence of maximum length by deleting symbols from both strings. While the O(mn) algorithm devised in 1974 is optimal in the most general setting, algorithms that depend on parameters other than m and n have been proposed since then. In the word RAM model, let w be the word size, s be the alphabet size, d be the number of dominant symbol matches between the strings, and p be the length of the longest common subsequence. Fast algorithms for this problem have complexities O(mn/ log n), O(mn/w), O(ns + min(p(n - p), pm)), O(n logs + d log log min(d, mn/d)), O(ns+min(ds, pm)), and O(ns+s!2s+d logs). In this work, we present an O(n(s+log & lowast;n)+ min(d logs, pm)) algorithm when s is an element of O(w), and also an O(n(s + log & lowast;n) + d) algorithm when s <= w which uses bitwise instructions that became recently available in modern processors.
Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols ...
详细信息
Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of the major concepts known from LCS research (dynamicprogramming, sparse dynamic programming, tabulation). Our algorithms make use of a linear-time and linear-space preprocessing finding the occurrences of all the substrings of length k from one sequence in the other sequence. (C) 2014 Elsevier B.V. All rights reserved.
Given a set of weighted hyper-rectangles in a k-dimensional space, the chaining problem is to identify a set of colinear and non-overlapping hyper-rectangles of total maximal weight. This problem is used in a number o...
详细信息
ISBN:
(纸本)9783642156458
Given a set of weighted hyper-rectangles in a k-dimensional space, the chaining problem is to identify a set of colinear and non-overlapping hyper-rectangles of total maximal weight. This problem is used in a number of applications in bioinformatics, string processing, and VLSI design. In this paper, we present parallel versions of the chaining algorithm for bioinformatics applications, running on multi-core and computer cluster architectures. Furthermore, we present experimental results of our implementations on both architectures.
The LCS of two rooted, ordered, and labeled trees F and G is the largest forest that can be obtained from both trees by deleting nodes. We present algorithms for computing tree LCS which exploit the sparsity inherent ...
详细信息
The LCS of two rooted, ordered, and labeled trees F and G is the largest forest that can be obtained from both trees by deleting nodes. We present algorithms for computing tree LCS which exploit the sparsity inherent to the tree LCS problem. Assuming G is smaller than F, our first algorithm runs in time O(r . height(F) . height(G) . lg lg vertical bar G vertical bar), where r is the number of pairs (upsilon is an element of F, omega is an element of G) such that upsilon and omega) have the same label. Our second algorithm runs in time O(Lr lg r . lg lg vertical bar G vertical bar), where L is the size of the LCS of F and G. For this algorithm we present a novel three-dimensional alignment graph. Our third algorithm is intended for the constrained variant of the problem in which only nodes with zero or one children can be deleted. For this case we obtain an O(rh lg lg vertical bar G vertical bar) time algorithm, where h = height(F) + height(G). (C) 2009 Elsevier B.V. All rights reserved.
We develop efficient dynamicprogramming algorithms for pattern matching with general gaps and character classes. We consider patterns of the form p(0) g(a(0),b(0))p(1)g(a(1),b(1)) ... p(m-1), where p(i) subset of Sig...
详细信息
We develop efficient dynamicprogramming algorithms for pattern matching with general gaps and character classes. We consider patterns of the form p(0) g(a(0),b(0))p(1)g(a(1),b(1)) ... p(m-1), where p(i) subset of Sigma, Sigma is some finite alphabet, and g(a(i) ,b(i)) denotes a gap of length a(i) ...b(i) between symbols p(i) and p(i+1). The text symbol t(j) matches p(i) iff t(j) is an element of p(i) . Moreover, we require that if p(i) matches t(j) , then p(i+1) should match one of the text symbols t(j+ai+1) ... t(j+bi+1). Either or both of a(i) and b (i) can be negative. We also consider transposition invariant matching, i.e., the matching condition becomes t(j) is an element of p(i) + tau, for some constant tau determined by the algorithms. We give algorithms that have efficient average and worst case running times. The algorithms have important applications in music information retrieval and computational biology. We give experimental results showing that the algorithms work well in practice.
We propose new algorithms for (delta, gamma, alpha)-matching. In this string matching problem we axe given a pattern P = p(0)p(1)...p(m-1) and a text T = t(0)t(1)...t(n-1) over some integer alphabet Sigma = {0...sigma...
详细信息
We propose new algorithms for (delta, gamma, alpha)-matching. In this string matching problem we axe given a pattern P = p(0)p(1)...p(m-1) and a text T = t(0)t(1)...t(n-1) over some integer alphabet Sigma = {0...sigma - 1}. The pattern symbol p(i) delta-matches the text symbol t(j) iff vertical bar p(i) - t(j)vertical bar <= delta. The pattern P (delta, gamma)-matches some text substring t(j)... t(j+m-1) iff for all i it holds that vertical bar p(i) - t(j+1)vertical bar <= delta and Sigma vertical bar p(i) - t(j+i)vertical bar <= gamma. Finally, in (delta, gamma, alpha)-matching we also permit at most alpha-symbol gaps between each matching text symbol. The only known previous algorithm runs in O(nm) time. We give several algorithms that improve the average case up to O(n) for small alpha, and the worst case to O(min{nm, vertical bar M vertical bar alpha}) or O(nm log(gamma)/w), where M = {(i, j) vertical bar vertical bar p(i) - t(j)vertical bar <= delta} and w is the number of bits in a machine word. The proposed algorithms can be easily modified to solve several other related problems, we explicitly consider e.g. character classes (instead of delta-matching), (Delta-limited) k-mismatches (instead of gamma-matching) and more general gaps, including negative ones. These find important applications in computational biology. We conclude with experimental results showing that the algorithms are very efficient in practice.
A peak is a pair of real values (x, y), where x is the time when peak of height y is registered. In the peak alignment problem, we are given two sequences of peaks, and our task is to align the sequences allowing some...
详细信息
A peak is a pair of real values (x, y), where x is the time when peak of height y is registered. In the peak alignment problem, we are given two sequences of peaks, and our task is to align the sequences allowing some basic edit operations on the peaks. We study an instance of the peak alignment problem that arises in the analysis of Mass Spectrometry data in Systems Biology. There the measurement technique guarantees that two peaks (x, y), (x', y') can only be considered the same if x is close enough toe, and y is close enough to y'. We review some methods to do alignment under such restrictions on matches. (C) 2007 Elsevier B.V. All rights reserved.
暂无评论