Given strings $A = a_1 a_2 \cdots a_m$ and $B = b_1 b_2 \cdots b_n$ over an alphabet $\Sigma \subseteq U$, where $U$ is some numerical universe closed under addition and subtraction, and a distance function $d(A, B)$ that gives the score of the best (partial) matching of $A$ and $B$, the transposition invariant distance is $\min_{t \in U} \{ d(A + t, B) \}$, where $A + t = (a_1 + t)(a_2 + t) \cdots (a_m + t)$. We study the problem of computing the transposition invariant distance for various distance (and similarity) functions $d$, including Hamming distance, longest common subsequence (LCS), Levenshtein distance, and their versions where the exact matching condition is replaced by an approximate one. For all these problems we give algorithms whose time complexities are close to the known upper bounds without transposition invariance, and for some we achieve these upper bounds. In particular, we show how sparse dynamic programming can be used to solve transposition invariant problems, and its connection with multidimensional range-minimum search. As a byproduct, we give improved sparse dynamic programming algorithms to compute LCS and Levenshtein distance. (c) 2004 Elsevier Inc. All rights reserved.
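The link between transposition invariance and sparse dynamic programming can be illustrated with a minimal Python sketch for the LCS case. It is not the paper's algorithm; it only assumes the observation that each pair $(i, j)$ is a match for exactly one transposition $t = b_j - a_i$, so the match sets over all $t$ have total size $mn$. Because LCS is a similarity, the sketch maximizes over $t$. The function names and the Hunt-Szymanski-style chaining step are illustrative choices.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_from_matches(matches):
    """Length of the longest chain of matches strictly increasing in i and j."""
    # Sort by i ascending and j descending, so two matches sharing the same i
    # can never both appear in a strictly increasing sequence of j values.
    tails = []  # tails[k] = smallest final j of any chain of length k + 1
    for _, j in sorted(matches, key=lambda m: (m[0], -m[1])):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
        else:
            tails[k] = j
    return len(tails)


def transposition_invariant_lcs(a, b):
    """Return (best LCS length, transposition t) for numeric sequences a, b."""
    # Group the m * n index pairs by the transposition that makes them match.
    by_t = defaultdict(list)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            by_t[y - x].append((i, j))

    best_len, best_t = 0, 0
    for t, matches in by_t.items():
        length = lcs_from_matches(matches)
        if length > best_len:
            best_len, best_t = length, t
    return best_len, best_t
```

For example, `transposition_invariant_lcs([1, 2, 3], [11, 12, 13])` aligns all three symbols under the transposition `t = 10`.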
We present a sparse dynamic programming algorithm that, given two strings $s$ and $t$, a gap penalty $l$, and an integer $p$, computes the value of the gap-weighted length-$p$ subsequences kernel. The algorithm works in time $O(p\,|M| \log |t|)$, where $M = \{(i, j) \mid s_i = t_j\}$ is the set of matches of characters in the two sequences. The algorithm is easily adapted to handle bounded length subsequences and different gap-penalty schemes, including penalizing by the total length of gaps and the number of gaps as well as incorporating character-specific match/gap penalties. The new algorithm is empirically evaluated against a full dynamic programming approach and a trie-based algorithm both on synthetic and newswire article data. Based on the experiments, the full dynamic programming approach is the fastest on short strings, and on long strings if the alphabet is small. On large alphabets, the new sparse dynamic programming algorithm is the most efficient. On medium-sized alphabets the trie-based approach is best if the maximum number of allowed gaps is strongly restricted.
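For orientation, the following Python sketch shows a full (dense) dynamic programming computation of a gap-weighted subsequences kernel, that is, the kind of baseline the sparse algorithm is compared against, not the sparse algorithm itself. It assumes the common textbook definition in which each common subsequence of length $p$ contributes the decay factor raised to the sum of its spans in the two strings. The function name and the parameter `lam` (standing in for the gap penalty) are illustrative.

```python
def gap_weighted_kernel(s, t, p, lam):
    """Dense O(p * |s| * |t|) gap-weighted length-p subsequences kernel.

    Assumed definition: every pair of index sequences spelling the same
    length-p subsequence in s and t contributes lam ** (span_s + span_t).
    """
    n, m = len(s), len(t)
    if p < 1 or min(n, m) < p:
        return 0.0

    # kappa[i][j]: total weight of common subsequences of the current length
    # whose last characters are exactly s[i-1] and t[j-1] (1-based tables).
    kappa = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                kappa[i][j] = lam * lam

    for _ in range(p - 1):
        # dp[i][j]: decayed sum of kappa over all (i', j') with i' <= i, j' <= j.
        dp = [[0.0] * (m + 1) for _ in range(n + 1)]
        nxt = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                dp[i][j] = (kappa[i][j]
                            + lam * dp[i - 1][j]
                            + lam * dp[i][j - 1]
                            - lam * lam * dp[i - 1][j - 1])
                if s[i - 1] == t[j - 1]:
                    # Extend every shorter subsequence ending strictly before (i, j).
                    nxt[i][j] = lam * lam * dp[i - 1][j - 1]
        kappa = nxt

    return sum(map(sum, kappa))
```

The inner loops visit every cell of the table, whereas only cells in the match set $M$ can hold nonzero `kappa` values; this is the redundancy a sparse formulation avoids.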
Given a pattern string $P = p_1 p_2 \cdots p_m$ and $K$ parallel text strings $T = \{T^k = t^k_1 \cdots t^k_n \mid 1 \le k \le K\}$ over an integer alphabet $\Sigma$, our task is to find the smallest integer $\kappa > 0$ such that $P$ can be split into $\kappa$ pieces $P = P^1 \cdots P^\kappa$, where each $P^i$ has an occurrence in some text track $T^{k_i}$ and these partial occurrences retain the order. We study some variations of this minimum splitting problem, such as splittings with limited gaps and transposition invariance, and show how to use sparse dynamic programming to solve the variations efficiently. In particular, we show that the minimum splitting problem can be interpreted as a shortest path problem on line segments. (C) 2004 Elsevier B.V. All rights reserved.
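To make the splitting idea concrete, here is a brute-force Python sketch for a deliberately simplified variant that may differ from the paper's exact definition: it assumes that all pieces sit at consecutive, aligned text positions (no gaps, no transposition), so each pattern position is read from exactly one track. Under that assumption a greedy longest-run choice is optimal for each alignment. The function name and return convention are illustrative, and no sparse dynamic programming is used.

```python
def min_splitting_no_gaps(pattern, tracks):
    """Return (kappa, start) with the fewest pieces, or None if no alignment fits.

    Simplifying assumption: the whole pattern is aligned at text position
    `start`, and each piece is read from a single track at those positions.
    """
    if not tracks:
        return None
    m = len(pattern)
    n = min(len(track) for track in tracks)

    best = None
    for start in range(n - m + 1):
        pieces, i = 0, 0
        while i < m:
            # Farthest pattern position reachable from i within one track.
            reach = i
            for track in tracks:
                k = i
                while k < m and track[start + k] == pattern[k]:
                    k += 1
                reach = max(reach, k)
            if reach == i:  # no track matches pattern[i] at this alignment
                pieces = None
                break
            pieces += 1
            i = reach
        if pieces is not None and (best is None or pieces < best[0]):
            best = (pieces, start)
    return best
```

For instance, with tracks `[9, 1, 2, 7, 7, 9]` and `[9, 8, 8, 3, 4, 9]`, the pattern `[1, 2, 3, 4]` splits into two pieces at alignment 1, the first read from the first track and the second from the second.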
Constructing evolutionary trees for species sets is a fundamental problem in biology. Unfortunately, there is no single agreed-upon method for this task, and many methods are in use. Current practice dictates that trees be constructed using different methods and that the resulting trees should be compared for consensus. It has become necessary to automate this process as the number of species under consideration has grown. We study one formalization of the problem: the maximum agreement-subtree (MAST) problem. The MAST problem is as follows: given a set $A$ and two rooted trees $T_0$ and $T_1$ leaf-labeled by the elements of $A$, find a maximum-cardinality subset $B$ of $A$ such that the topological restrictions of $T_0$ and $T_1$ to $B$ are isomorphic. In this paper, we will show that this problem reduces to unary weighted bipartite matching (UWBM) with an $O(n^{1+o(1)})$ additive overhead. We also show that UWBM reduces linearly to MAST. Thus our algorithm is optimal unless UWBM can be solved in near linear time. The overall running time of our algorithm is $O(n^{1.5} \log n)$, improving on the previous best algorithm, which runs in $O(n^2)$. We also derive an $O(n c^{\sqrt{\log n}})$-time algorithm for the case of bounded degrees, whereas the previously best algorithm runs in $O(n^2)$, as in the unbounded case.
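The MAST definition can be made concrete with a small Python sketch of the classic memoized recursion for the special case of rooted binary trees. This is a simple baseline rather than the matching-based algorithm of the paper, and it makes several assumptions: trees are encoded as nested 2-tuples, leaves are non-tuple labels (here strings), and each label occurs at most once per tree. The names `leaves` and `mast_size` are illustrative.

```python
from functools import lru_cache


def leaves(tree):
    """Set of leaf labels below a tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


@lru_cache(maxsize=None)
def mast_size(u, v):
    """Size of a maximum agreement subtree of rooted binary trees u and v."""
    if not isinstance(u, tuple):
        return 1 if u in leaves(v) else 0
    if not isinstance(v, tuple):
        return 1 if v in leaves(u) else 0

    (ul, ur), (vl, vr) = u, v
    return max(
        # Drop one child subtree of either root.
        mast_size(ul, v), mast_size(ur, v),
        mast_size(u, vl), mast_size(u, vr),
        # Match the two roots and pair up their children in either order.
        mast_size(ul, vl) + mast_size(ur, vr),
        mast_size(ul, vr) + mast_size(ur, vl),
    )
```

For trees of unbounded degree, the step that pairs children becomes a weighted bipartite matching between the two child sets, which is where the connection to UWBM in the abstract arises.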
In the constructive programming community it is commonplace to see formal developments of textbook algorithms. In the algorithm design community, on the other hand, it may be well known that the textbook solution to a problem is not the most efficient possible. However, in presenting the more efficient solution, the algorithm designer will usually omit some of the implementation details, thus creating an algorithm gap between the abstract algorithm and its concrete implementation. This is in contrast to the formal development, which usually proceeds all the way to the complete concrete implementation of the less efficient solution. We claim that the algorithm designer is forced to omit some of the details by the relative expressive poverty of the Pascal-like languages typically used to present the solution. The greater expressiveness provided by a functional language would allow the whole story to be told in a reasonable amount of space. In this paper we use a functional language to present the development of a sophisticated algorithm all the way to the final code. We hope to bridge the algorithm gap between abstract and concrete implementations, and thereby facilitate communication between the constructive programming and algorithm design communities. (C) 1999 Elsevier Science B.V. All rights reserved.
Here we report a new method for large-scale sequence *** it, we could get an approximate but sufficiently accurate global or local alignment *** the method, sparse dynamic programming was used to refine the alignment space; thus, computational time is *** also used a hashing technique to search for short gap-free alignment fragments, which are the basic ensemble of sparse dynamic *** examples have been aligned by the program using our method.
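The hashing step for short gap-free fragments can be pictured with a minimal Python sketch. It assumes the common seed-finding approach of indexing every length-`k` substring of one sequence in a hash table and scanning the other sequence against it; the abstract does not spell out its exact technique, so the function name, the fixed fragment length `k`, and the output format are all hypothetical.

```python
from collections import defaultdict


def find_seeds(a, b, k):
    """Return sorted (i, j) pairs where a[i:i+k] == b[j:j+k] (exact, gap-free)."""
    if k <= 0:
        raise ValueError("k must be positive")

    # Hash table from each k-mer of `a` to all of its start positions.
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)

    # Look up each k-mer of `b`; every hit is a candidate alignment fragment.
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            seeds.append((i, j))
    return sorted(seeds)
```

The resulting seed pairs form the sparse set of cells over which a sparse dynamic programming step could then chain fragments, instead of filling the full alignment matrix.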