It is shown how to compute the lexicographically maximum suffix of a string of n >= 2 characters over a totally ordered alphabet using at most (4/3) n - 5/3 three-way character comparisons. The best previous bound,...
详细信息
It is shown how to compute the lexicographically maximum suffix of a string of n >= 2 characters over a totally ordered alphabet using at most (4/3) n - 5/3 three-way character comparisons. The best previous bound, which has stood unchallenged for more than 25 years, is (3/2) n - O(1) comparisons. We also prove an interesting property of an algorithm for computing the maximum suffix both with respect to a total order < and with respect to its inverse order >. (C) 2011 Elsevier B.V. All rights reserved.
The palindromic tree (a.k.a. eertree) for a string S of length n is a tree-like data structure that represents the set of all distinct palindromic substrings of S, using O(n) space [Rubinchik and Shur, 2018]. It is kn...
详细信息
The palindromic tree (a.k.a. eertree) for a string S of length n is a tree-like data structure that represents the set of all distinct palindromic substrings of S, using O(n) space [Rubinchik and Shur, 2018]. It is known that, when S is over an alphabet of size sigma and is given in an online manner, then the palindromic tree of S can be constructed in O(n log sigma) time with O(n) space. In this paper, we consider the sliding window version of the problem: For a sliding window of length at most d, we present two versions of an algorithm which maintains the palindromic tree of size O(d) for every sliding window S[i..j] over S, where 1 <= j - i + <= d. The first version works in O(n log sigma') time with O(d) space where sigma' <= d is the maximum number of distinct characters in the windows, and the second one works in O(n + d sigma) time with (d + 2)sigma + O(d) space. We also show how our algorithms can be applied to efficient computation of minimal unique palindromic substrings (MUPS) and minimal absent palindromic words (MAPW) for a sliding window. (C) 2021 The Author(s). Published by Elsevier B.V.
We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length n. The 40-year-old quadratic-time dynamic programming algorithm has recently been shown to be near-optimal by ...
详细信息
We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length n. The 40-year-old quadratic-time dynamic programming algorithm has recently been shown to be near-optimal by Abboud, Backurs, and Vassilevska Williams [FOCS'15] and Bringmann and Kunnemann [FOCS'15] assuming the Strong Exponential Time Hypothesis. This has led the community to look for subquadratic approximation algorithms for the problem. Yet, unlike the edit distance problem for which a constant-factor approximation in almost-linear time is known, very little progress has been made on LCS, making it a notoriously difficult problem also in the realm of approximation. For the general setting, only a naive O(n(epsilon/2))-approximation algorithm with running time ($) over tilde (n(2-epsilon)) has been known, for any constant 0 < epsilon <= 1. Recently, a breakthrough result by Hajiaghayi, Seddighin, Seddighin, and Sun [SODA'19] provided a linear-time algorithm that yields a O(n(0.497956))-approximation in expectation;improving upon the naive O(root n)-approximation for the first time. In this paper, we provide an algorithm that in time O(n(2-epsilon)) computes an <($)over tilde>(n(2 epsilon/5))-approximation with high probability, for any 0 < epsilon <= 1. Our result (1) gives an <($)over tilde>(n(0.4))-approximation in linear time, improving upon the bound of Hajiaghayi, Seddighin, Seddighin, and Sun, (2) provides an algorithm whose approximation scales with any subquadratic running time O(n(2-epsilon)), improving upon the naive bound of O(n(epsilon/2)) for any epsilon, and (3) instead of only in expectation, succeeds with high probability.
We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber an...
详细信息
We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber and Myers that has numerous applications in pattern matching, string processing, and computational biology. Though the suffix tree of a string can be constructed in linear time and the sorted order of suffixes derived from it, a direct algorithm for suffix sorting is of great interest due to the space requirements of suffix trees. Our result is one of the first linear time suffix array construction algorithms, which improve upon the previously known O(n log n) time direct algorithms for suffix sorting. It can also be used to derive a different linear time construction algorithm for suffix trees. Apart from being simple and applicable for alphabets not necessarily of fixed size, this method of constructing suffix trees is more space efficient. (C) 2004 Elsevier B.V. All rights reserved.
We give an O(nlogn)-time, O(n)-space algorithm for factoring a string into the minimum number of palindromic substrings. Thatis, given a string S[1..n], in O(nlogn) time our algorithm returns the minimum number of pal...
详细信息
We give an O(nlogn)-time, O(n)-space algorithm for factoring a string into the minimum number of palindromic substrings. Thatis, given a string S[1..n], in O(nlogn) time our algorithm returns the minimum number of palindromes S-1,..., S-l such that S = S-1 ,...,S-l Wealso show that the time complexity is O(n) on average and Omega(n logn) in the worst case. The last result is based on a characterization of the palindromic structure of Zimin words. (C) 2014 Published by Elsevier B.V.
We revisit the following version of the Gapped string Indexing problem, where the goal is to preprocess a text T[1..n] to enable efficient reporting of all occ occurrences of a gapped pattern P=P-1[alpha..beta]P-2 in ...
详细信息
We revisit the following version of the Gapped string Indexing problem, where the goal is to preprocess a text T[1..n] to enable efficient reporting of all occ occurrences of a gapped pattern P=P-1[alpha..beta]P-2 in T. An occurrence of P in T is defined as a pair (i,j) where substrings T[i..i+|P-1|) and T[j..j+|P-2|) match P-1 and P-2, respectively, with a gap j-(i+|P-1|) lying within the interval [alpha..beta]. This problem has significant applications in computational biology and text mining. A hardness result on this problem suggests that any index with polylogarithmic query time must occupy near quadratic space. In a recent study [STACS 2024], Bille et al. presented a sub-quadratic space index using space O(n(2-delta/3)), where 0 <=delta <= 1 is a parameter fixed at the time of index construction. Its query time is O(|P-1|+|P-2|+n(delta)center dot(1+occ)), which is sub-linear per occurrence when delta<1. We show how to achieve a gap-sensitive query time of O(|P-1|+|P-2|+n(delta)center dot(1+occ(1-delta)) + Sigma(g is an element of[alpha..beta)]occ(g)center dot g(delta)) using the same space, where occ(g) denotes the number of occurrences with gap g. This is faster when there are many occurrences with small gaps.
The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those g...
详细信息
The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper, we survey the 20-year history of this idea and discuss its relation to kernelization in parameterized complexity.
The shortest unique substring (SUS) problem is an active line of research in the field of string algorithms and has several applications in bioinformatics and information retrieval. The initial version of the problem ...
详细信息
The shortest unique substring (SUS) problem is an active line of research in the field of string algorithms and has several applications in bioinformatics and information retrieval. The initial version of the problem was proposed by Pei et al. [ICDE'13]. Over the years, many variants and extensions have been pursued, which include positional-SUS, interval-SUS, approximate-SUS, palindromic-SUS, range-SUS, etc. In this article, we highlight some of the key results and summarize the recent developments in this area.
We present two modifications of Duval's algorithm for computing the Lyndon factorization of a string. One of the algorithms has been designed for strings containing runs of the smallest character. It works best fo...
详细信息
We present two modifications of Duval's algorithm for computing the Lyndon factorization of a string. One of the algorithms has been designed for strings containing runs of the smallest character. It works best for small alphabets and it is able to skip a significant number of characters of the string. Moreover, it can be engineered to have linear time complexity in the worst case. When there is a run-length encoded string R of length rho, the other algorithm computes the Lyndon factorization of R in O (rho) time and in constant space. It is shown by experimental results that the new variations are faster than Duval's original algorithm in many scenarios.
A fundamental problem on strings in the realm of approximate string matching is pattern matching with mismatches: Given a text t, a pattern p, and a number k, determine whether some substring of t has Hamming distance...
详细信息
A fundamental problem on strings in the realm of approximate string matching is pattern matching with mismatches: Given a text t, a pattern p, and a number k, determine whether some substring of t has Hamming distance at most k to p;such a substring is called a k-match. As real-world texts often come in compressed form, we study the case of searching for a small pattern p in a text t that is compressed by a straight-line program. This grammar compression is popular in the string community, since it is mathematically elegant and unifies many practically relevant compression schemes such as the Lempel-Ziv family, dictionary methods, and others. We denote by m the length of p and by n the compressed size of t. While exact pattern matching, that is, the case k = 0, is known to be solvable in near-linear time O(n + m) [Jez TALG'15], despite considerable interest in the string community, the fastest known algorithm for pattern matching with mismatches runs in time O(n√m poly(k)) [Gawrychowski, Straszak ISAAC'13], which is far from linear even for very small k. In this paper, we obtain an algorithm for pattern matching with mismatches running in time O((n + m) poly(k)). This is near-linear in the input size for any constant (or slightly superconstant) k. We obtain analogous running time for counting and enumerating all k-matches. Our algorithm is based on a new structural insight for approximate pattern matching, essentially showing that either the number of k-matches is very small or both text and pattern must be almost periodic. While intuitive and simple for exact matches, such a characterization is surprising when allowing k mismatches.
暂无评论