A classical measure of similarity between strings is the length of the longest common subsequence (LCS) of the two given strings. The search for efficient algorithms for finding the LCS has been going on for more than three decades. To date, all known algorithms may take quadratic time (shaved by logarithmic factors) to find a large LCS. In this paper, the problem of approximating the LCS is studied, focusing on the hard inputs for this problem, namely, approximating an LCS of near-linear size in strings over a relatively large alphabet (of size at least n^epsilon for some constant epsilon > 0, where n is the length of the string). We show that any given string over a relatively large alphabet can be embedded into a locally non-repetitive string. This embedding has a negligible additive distortion for strings that are not too dissimilar in terms of the edit distance. We also show that the LCS can be efficiently approximated in locally non-repetitive strings. Our new method (the embedding together with the approximation algorithm) gives a strictly sub-quadratic time algorithm (i.e., of complexity O(n^(2-epsilon)) for some constant epsilon) which can find common subsequences of linear (and near-linear) size that cannot be detected efficiently by the existing tools. (C) 2011 Elsevier Inc. All rights reserved.
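For reference, the quadratic-time baseline the paper improves on for this parameter regime is the textbook LCS dynamic program; a minimal sketch follows (this is the classical algorithm, not the paper's sub-quadratic approximation):

```python
# Textbook O(|a|*|b|) dynamic program for LCS length -- the quadratic
# baseline referred to above, NOT the paper's approximation algorithm.
def lcs_length(a: str, b: str) -> int:
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)   # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[len(b)]

assert lcs_length("ABCBDAB", "BDCABA") == 4   # e.g. "BCBA"
```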
Suffix trees are by far the most important data structure in stringology, with a myriad of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require Theta(n log n) bits of space for a string of size n. This is considerably more than the n log_2(sigma) bits needed for the string itself, where sigma is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus Theta(n) extra bits. This is already spectacular, but the linear extra bits are still unsatisfactory when sigma is small, as in DNA sequences. In this article, we introduce the first compressed suffix tree representation that breaks this Theta(n)-bit space barrier. The Fully Compressed Suffix Tree (FCST) representation requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time. This includes extracting arbitrary text substrings, so the FCST replaces the text using almost the same space as the compressed text. An essential ingredient of FCSTs is the lowest common ancestor (LCA) operation. We reveal important connections between LCAs and suffix tree navigation. We also describe how to make FCSTs dynamic, that is, support updates to the text. The dynamic FCST supports several further operations; in particular, it can build the static FCST within optimal space and polylogarithmic time per symbol. Our theoretical results are also validated experimentally, showing that FCSTs are very effective in practice as well.
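The LCA connection can be illustrated with plain, uncompressed structures: the string depth of the LCA of two suffix-tree leaves equals the longest common prefix of the two suffixes, which is a range minimum over the LCP array. A sketch using a deliberately naive suffix array construction (the FCST itself is far more involved):

```python
# Illustration only: lcp(suffix i, suffix j) = string depth of their
# suffix-tree LCA = min over an LCP-array range. Naive SA construction.
def suffix_array(s: str) -> list[int]:
    return sorted(range(len(s)), key=lambda i: s[i:])  # O(n^2 log n) demo build

def lcp_array(s: str, sa: list[int]) -> list[int]:
    # Kasai's algorithm: lcp[r] = |common prefix of suffixes sa[r-1], sa[r]|
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp, h = [0] * n, 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            h = max(h - 1, 0)
        else:
            h = 0
    return lcp

def lca_string_depth(s, sa, lcp, rank, i, j):
    # String depth of the suffix-tree LCA of the suffixes starting at i and j.
    if i == j:
        return len(s) - i
    lo, hi = sorted((rank[i], rank[j]))
    return min(lcp[lo + 1:hi + 1])

s = "mississippi"
sa = suffix_array(s)
rank = {i: r for r, i in enumerate(sa)}
assert lca_string_depth(s, sa, lcp_array(s, sa), rank, 1, 4) == 4  # shared "issi"
```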
ISBN (print): 9783642212536
In the present paper, we introduce and study the problem of computing, for any given finite set of words, a shuffle word with a minimum so-called scope coincidence degree. The scope coincidence degree is the maximum number of different symbols that parenthesise any position in the shuffle word. This problem is motivated by an application of a new automaton model and can be regarded as the problem of scheduling shared memory accesses of some parallel processes in a way that minimises the number of memory cells required. We investigate the complexity of this problem and show that it can be solved in polynomial time.
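To make the measure concrete, here is a small sketch based on our reading of the definition above (not code from the paper): a symbol parenthesises a position when that position lies between the symbol's first and last occurrence, and the degree is the maximum number of such symbols over all positions.

```python
# Scope coincidence degree of a word, under the assumption that symbol x
# "parenthesises" position p iff first(x) <= p <= last(x). Our own reading
# of the definition, for illustration only.
def scope_coincidence_degree(w: str) -> int:
    first, last = {}, {}
    for p, x in enumerate(w):
        first.setdefault(x, p)
        last[x] = p
    return max(sum(first[x] <= p <= last[x] for x in first)
               for p in range(len(w)))

# "aabb" keeps the two scopes disjoint (degree 1, one memory cell suffices),
# while "abab" forces them to overlap (degree 2).
assert scope_coincidence_degree("aabb") == 1
assert scope_coincidence_degree("abab") == 2
```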
ISBN (print): 9781457716133
This paper provides a deterministic algorithm for finding a longest open reading frame (ORF) among all alternative splicings of a given DNA sequence. Finding protein-encoding regions is a fundamental problem in genomic DNA sequence analysis, and long ORFs generally provide good predictions of such regions. Although the number of splice variants is exponential in the number of optionally spliced regions, we are able, in many cases, to obtain quadratic or even linear performance. This efficiency is achieved by limiting the size of the search space for potential ORFs: by properly pruning the search space we can reduce the number of frames considered at any one time while guaranteeing that a longest open reading frame must be among the considered frames.
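As a point of reference, finding the longest ORF in one fixed, already-spliced sequence is a simple linear scan over the three reading frames; the paper's contribution is doing this across exponentially many splice variants. A minimal baseline sketch, assuming the standard ATG start and TAA/TAG/TGA stop codons on the forward strand only:

```python
# Baseline longest-ORF scan for a SINGLE fixed sequence (forward strand,
# three frames). The paper's algorithm handles all alternative splicings,
# which this sketch does not attempt.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq: str) -> str:
    best_start, best_len = -1, 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i                       # open a frame at the first ATG
            elif codon in STOP_CODONS and start is not None:
                if i + 3 - start > best_len:    # close it at the next stop
                    best_start, best_len = start, i + 3 - start
                start = None
    return seq[best_start:best_start + best_len] if best_start >= 0 else ""

assert longest_orf("CCATGAAATGA") == "ATGAAATGA"
```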
It is shown how to compute the lexicographically maximum suffix of a string of n >= 2 characters over a totally ordered alphabet using at most (4/3) n - 5/3 three-way character comparisons. The best previous bound, which has stood unchallenged for more than 25 years, is (3/2) n - O(1) comparisons. We also prove an interesting property of an algorithm for computing the maximum suffix both with respect to a total order < and with respect to its inverse order >. (C) 2011 Elsevier B.V. All rights reserved.
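For context, the standard linear-time maximum-suffix scan can be sketched as follows; note that this textbook version makes up to roughly 2n three-way comparisons, so it does not attain the (4/3)n - 5/3 bound of the paper:

```python
# Classical maximum-suffix computation (textbook version; up to ~2n
# three-way comparisons, NOT the paper's (4/3)n - 5/3 algorithm).
def max_suffix_start(s: str) -> int:
    n, i, j, k = len(s), 0, 1, 0   # i: best start, j: candidate, k: matched length
    while j + k < n:
        a, b = s[i + k], s[j + k]  # one three-way comparison per iteration
        if a == b:
            k += 1
        elif a > b:
            j += k + 1             # candidate loses; skip past the mismatch
            k = 0
        else:
            i = max(i + k + 1, j)  # current best loses; promote a new start
            j = i + 1
            k = 0
    return i

assert "banana"[max_suffix_start("banana"):] == "nana"
```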
We present a filtering-based algorithm for the k-mismatch pattern matching problem with don't cares. Given a text t of length n and a pattern p of length m with don't care symbols in either p or t (but not both), and a bound k, our algorithm finds all the places where the pattern matches the text with at most k mismatches. The algorithm is deterministic and runs in Theta(n m^(1/3) k^(1/3) log^(2/3) m) time. (C) 2010 Elsevier B.V. All rights reserved.
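The problem statement is easy to pin down with a naive quadratic checker, useful as a correctness oracle (the wildcard symbol '?' below is an arbitrary choice of ours); the paper's filtering algorithm is what beats this O(nm) bound:

```python
# Naive O(nm) oracle for k-mismatch matching with don't cares.
# '?' as the wildcard is an assumption made for this sketch.
def k_mismatch_positions(t: str, p: str, k: int, wc: str = "?") -> list[int]:
    n, m, out = len(t), len(p), []
    for j in range(n - m + 1):
        mismatches = 0
        for i in range(m):
            if wc not in (p[i], t[j + i]) and p[i] != t[j + i]:
                mismatches += 1
                if mismatches > k:
                    break
        if mismatches <= k:
            out.append(j)
    return out

assert k_mismatch_positions("abcabd", "a?d", 1) == [0, 3]
```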
We present solutions for the k-mismatch pattern matching problem with don't cares. Given a text t of length n and a pattern p of length m with don't care symbols and a bound k, our algorithms find all the places where the pattern matches the text with at most k mismatches. We first give a Theta(n(k + log m log k) log n) time randomised algorithm which finds the correct answer with high probability. We then present a new deterministic Theta(n k^2 log^2 m) time solution that uses tools originally developed for group testing. Taking our derandomisation approach further, we develop an approach based on k-selectors that runs in Theta(n k polylog m) time. Further, in each case the location of the mismatches at each alignment is also given at no extra cost. (C) 2009 Elsevier Inc. All rights reserved.
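A classical building block behind this line of work is exact matching with don't cares via convolutions: encode the wildcard as 0, and alignment j is a match iff the non-negative sum over i of p_i * t_{i+j} * (p_i - t_{i+j})^2 vanishes. A sketch of that building block (not the paper's k-mismatch algorithms); np.convolve keeps the arithmetic exact, while FFT-based convolution would give the fast O(n log m) bound:

```python
import numpy as np

# Exact matching with don't cares via three convolutions. Wildcard = 0,
# other symbols = positive integer codes. Alignment j matches iff
#   sum_i p_i * t_{i+j} * (p_i - t_{i+j})^2 == 0.
def exact_wildcard_matches(t, p):
    t = np.asarray(t, dtype=np.int64)
    p = np.asarray(p, dtype=np.int64)
    n, m = len(t), len(p)

    def corr(x, y):  # corr[j] = sum_i y[i] * x[i + j], for j = 0 .. n - m
        return np.convolve(x, y[::-1])[m - 1:n]

    scores = corr(t, p**3) - 2 * corr(t**2, p**2) + corr(t**3, p)
    return np.flatnonzero(scores == 0)

# 'a'->1, 'b'->2, wildcard->0: text "abab", pattern "a?" matches at 0 and 2.
assert list(exact_wildcard_matches([1, 2, 1, 2], [1, 0])) == [0, 2]
```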
The Longest Common Subsequence (LCS) of two strings A, B is a well-studied problem with a wide range of applications. When each symbol of the input strings is assigned a positive weight, the problem becomes the Heaviest Common Subsequence (HCS) problem. In this paper we consider a different version of weighted LCS on Position Weight Matrices (PWM). The Position Weight Matrix was introduced as a tool to handle a set of sequences that are not identical, yet have many local similarities. Such a weighted sequence is a 'statistical image' of this set, where we are given the probability of every symbol's occurrence at every text location. We consider two possible definitions of LCS on PWM. For the first, we solve the LCS problem of z sequences in time O(z n^(z+1)). For the second, we consider the log-probability version of the problem, prove NP-hardness and provide an approximation algorithm. (C) 2010 Elsevier B.V. All rights reserved.
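The HCS variant mentioned above differs from plain LCS only in the match case of the recurrence, which adds the symbol's weight instead of 1. A minimal sketch of that variant (the PWM versions studied in the paper are different and more involved):

```python
# Heaviest Common Subsequence: LCS where a match on symbol c scores w(c)
# instead of 1. Minimal DP sketch, not the paper's PWM algorithms.
def hcs(a: str, b: str, w: dict[str, float]) -> float:
    prev = [0.0] * (len(b) + 1)
    for ca in a:
        cur = [0.0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + w[ca])   # weighted match case
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# With unit weights, HCS degenerates to the LCS length.
assert hcs("ABCBDAB", "BDCABA", {c: 1 for c in "ABCD"}) == 4
```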
ISBN (print): 9783642130724
It is shown how to compute the lexicographically maximum suffix of a string of n >= 2 characters over a totally ordered alphabet using at most (4/3)n - 5/3 three-way character comparisons. The best previous bound, which has stood unchallenged for more than 25 years, is (3/2)n - O(1) comparisons. We also prove an interesting property of an algorithm for computing the maximum suffix both with respect to a total order < and with respect to its inverse order >.
ISBN (print): 9788001044032
The Parikh vector of a string s over a finite ordered alphabet Sigma = {a_1, ..., a_sigma} is defined as the vector of multiplicities of the characters, i.e. p(s) = (p_1, ..., p_sigma), where p_i = |{j | s_j = a_i}|. A Parikh vector q occurs in s if s has a substring t with p(t) = q. The problem of searching for a query q in a text s of length n can be solved simply and optimally with a sliding-window approach in O(n) time. We present two new algorithms for the case where the text is fixed and many queries arrive over time. The first algorithm finds all occurrences of a given Parikh vector in a text (over a fixed alphabet of size sigma >= 2) and appears to have a sublinear expected time complexity. The second algorithm only decides whether a given Parikh vector appears in a binary text; it iteratively constructs a linear-size data structure which then allows answering queries in constant time, for many queries even during the construction phase.
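The O(n) sliding-window baseline mentioned in the abstract is easy to state: grow the window on the right, shrink it from the left whenever some character count exceeds its budget in q, and report when the window length equals the total size of q. A sketch:

```python
from collections import Counter

# O(n) sliding-window search for one occurrence of a Parikh vector q
# (given as a Counter) in s -- the simple baseline the abstract refers to.
def find_parikh_occurrence(s: str, q: Counter):
    target = sum(q.values())
    window, l = Counter(), 0
    for r, c in enumerate(s):
        window[c] += 1
        while window[c] > q[c]:        # also evicts characters not in q
            window[s[l]] -= 1
            l += 1
        if r - l + 1 == target:
            return l                   # s[l : l + target] realises q
    return None

assert find_parikh_occurrence("abcab", Counter("ba")) == 0
assert find_parikh_occurrence("aab", Counter("bb")) is None
```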