检索结果-内蒙古大学图书馆

Finding the maximum suffix with fewer comparisons

JOURNAL OF DISCRETE algorithms 2011年第3期9卷 279-286页

作者： Franceschini, Gianni Hagerup, Torben Univ Roma La Sapienza Dipartimento Informat Rome Italy Univ Augsburg Inst Informat D-86135 Augsburg Germany

It is shown how to compute the lexicographically maximum suffix of a string of n >= 2 characters over a totally ordered alphabet using at most (4/3) n - 5/3 three-way character comparisons. The best previous bound, which has stood unchallenged for more than 25 years, is (3/2) n - O(1) comparisons. We also prove an interesting property of an algorithm for computing the maximum suffix both with respect to a total order < and with respect to its inverse order >. (C) 2011 Elsevier B.V. All rights reserved.

关键词： string algorithms Maximum suffix Character comparisons

来源：评论

学校读者我要写书评

暂无评论

Palindromic trees for a sliding window and its applications

引用

INFORMATION PROCESSING LETTERS 2022年 173卷

作者： Mieno, Takuya Watanabe, Kiichi Nakashima, Yuto Inenaga, Shunsuke Bannai, Hideo Takeda, Masayuki Kyushu Univ Dept Informat Fukuoka Japan Japan Soc Promot Sci Tokyo Japan Japan Sci & Technol Agcy PRESTO Kawaguchi Saitama Japan Tokyo Med & Dent Univ M&D Data Sci Ctr Bunkyo City Japan

The palindromic tree (a.k.a. eertree) for a string S of length n is a tree-like data structure that represents the set of all distinct palindromic substrings of S, using O(n) space [Rubinchik and Shur, 2018]. It is known that, when S is over an alphabet of size sigma and is given in an online manner, then the palindromic tree of S can be constructed in O(n log sigma) time with O(n) space. In this paper, we consider the sliding window version of the problem: For a sliding window of length at most d, we present two versions of an algorithm which maintains the palindromic tree of size O(d) for every sliding window S[i..j] over S, where 1 <= j - i + <= d. The first version works in O(n log sigma') time with O(d) space where sigma' <= d is the maximum number of distinct characters in the windows, and the second one works in O(n + d sigma) time with (d + 2)sigma + O(d) space. We also show how our algorithms can be applied to efficient computation of minimal unique palindromic substrings (MUPS) and minimal absent palindromic words (MAPW) for a sliding window. (C) 2021 The Author(s). Published by Elsevier B.V.

关键词： string algorithms Data structures Palindromes Sliding window

来源：评论

学校读者我要写书评

暂无评论

A Linear-Time n^0.4-Approximation for Longest Common Subsequence

引用

ACM TRANSACTIONS ON algorithms 2023年第1期19卷 9-9页

作者： Bringmann, Karl Cohen-Addad, Vincent Das, Debarati Saarland Univ Saarland Informat Campus E1 3 D-66123 Saarbrucken Germany Max Planck Inst Informat Saarland Informat Campus E1 3 D-66123 Saarbrucken Germany Sorbonne Univ LIP6 CNRS UPMC Univ Paris 06 Paris France Univ Copenhagen Basic AlgorithmRes Copenhagen BARC Copenhagen Denmark

We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length n. The 40-year-old quadratic-time dynamic programming algorithm has recently been shown to be near-optimal by Abboud, Backurs, and Vassilevska Williams [FOCS'15] and Bringmann and Kunnemann [FOCS'15] assuming the Strong Exponential Time Hypothesis. This has led the community to look for subquadratic approximation algorithms for the problem. Yet, unlike the edit distance problem for which a constant-factor approximation in almost-linear time is known, very little progress has been made on LCS, making it a notoriously difficult problem also in the realm of approximation. For the general setting, only a naive O(n(epsilon/2))-approximation algorithm with running time ($) over tilde (n(2-epsilon)) has been known, for any constant 0 < epsilon <= 1. Recently, a breakthrough result by Hajiaghayi, Seddighin, Seddighin, and Sun [SODA'19] provided a linear-time algorithm that yields a O(n(0.497956))-approximation in expectation;improving upon the naive O(root n)-approximation for the first time. In this paper, we provide an algorithm that in time O(n(2-epsilon)) computes an <($)over tilde>(n(2 epsilon/5))-approximation with high probability, for any 0 < epsilon <= 1. Our result (1) gives an <($)over tilde>(n(0.4))-approximation in linear time, improving upon the bound of Hajiaghayi, Seddighin, Seddighin, and Sun, (2) provides an algorithm whose approximation scales with any subquadratic running time O(n(2-epsilon)), improving upon the naive bound of O(n(epsilon/2)) for any epsilon, and (3) instead of only in expectation, succeeds with high probability.

关键词： Longest common subsequence string algorithms approximation algorithms

来源：评论

学校读者我要写书评

暂无评论

Space efficient linear time construction of suffix arrays

引用

JOURNAL OF DISCRETE algorithms 2005年第2-4期3卷 143-156页

作者： Ko, Pang Aluru, Srinivas Iowa State Univ Dept Elect & Comp Engn Ames IA 50011 USA Iowa State Univ Laurence H Baker Ctr Bioinformat & Biol Stat Ames IA 50011 USA

We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber and Myers that has numerous applications in pattern matching, string processing, and computational biology. Though the suffix tree of a string can be constructed in linear time and the sorted order of suffixes derived from it, a direct algorithm for suffix sorting is of great interest due to the space requirements of suffix trees. Our result is one of the first linear time suffix array construction algorithms, which improve upon the previously known O(n log n) time direct algorithms for suffix sorting. It can also be used to derive a different linear time construction algorithm for suffix trees. Apart from being simple and applicable for alphabets not necessarily of fixed size, this method of constructing suffix trees is more space efficient. (C) 2004 Elsevier B.V. All rights reserved.

关键词： Computational biology Pattern matching string algorithms Suffix array Suffix sorting

来源：评论

学校读者我要写书评

暂无评论

A subquadratic algorithm for minimum palindromic factorization

引用

JOURNAL OF DISCRETE algorithms 2014年第0期28卷 41-48页

作者： Fici, Gabriele Gagie, Travis Karkkainen, Juha Kempa, Dominik Univ Palermo Dipartimento Matemat & Informat Palermo Italy Univ Helsinki Dept Comp Sci Helsinki Finland

We give an O(nlogn)-time, O(n)-space algorithm for factoring a string into the minimum number of palindromic substrings. Thatis, given a string S[1..n], in O(nlogn) time our algorithm returns the minimum number of palindromes S-1,..., S-l such that S = S-1 ,...,S-l Wealso show that the time complexity is O(n) on average and Omega(n logn) in the worst case. The last result is based on a characterization of the palindromic structure of Zimin words. (C) 2014 Published by Elsevier B.V.

关键词： string algorithms Palindromes Factorization

来源：评论

学校读者我要写书评

暂无评论

Text Indexing for Faster Gapped Pattern Matching

引用

algorithms 2024年第12期17卷 537-537页

作者： Hossen, Md Helal Gibney, Daniel Thankachan, Sharma V. Univ Texas Dallas Dept Comp Sci Richardson TX 75080 USA North Carolina State Univ Coll Engn Dept Comp Sci Raleigh NC 27695 USA

We revisit the following version of the Gapped string Indexing problem, where the goal is to preprocess a text T[1..n] to enable efficient reporting of all occ occurrences of a gapped pattern P=P-1[alpha..beta]P-2 in T. An occurrence of P in T is defined as a pair (i,j) where substrings T[i..i+|P-1|) and T[j..j+|P-2|) match P-1 and P-2, respectively, with a gap j-(i+|P-1|) lying within the interval [alpha..beta]. This problem has significant applications in computational biology and text mining. A hardness result on this problem suggests that any index with polylogarithmic query time must occupy near quadratic space. In a recent study [STACS 2024], Bille et al. presented a sub-quadratic space index using space O(n(2-delta/3)), where 0 <=delta <= 1 is a parameter fixed at the time of index construction. Its query time is O(|P-1|+|P-2|+n(delta)center dot(1+occ)), which is sub-linear per occurrence when delta<1. We show how to achieve a gap-sensitive query time of O(|P-1|+|P-2|+n(delta)center dot(1+occ(1-delta)) + Sigma(g is an element of[alpha..beta)]occ(g)center dot g(delta)) using the same space, where occ(g) denotes the number of occurrences with gap g. This is faster when there are many occurrences with small gaps.

关键词： text indexing string algorithms gapped pattern matching

来源：评论

学校读者我要写书评

暂无评论

Searching and indexing genomic databases via kernelization

引用

FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY 2015年第0期3卷 12-12页

作者： Gagie, Travis Puglisi, Simon J. Univ Helsinki HIIT Helsinki Finland Univ Helsinki Dept Comp Sci POB 68Gustaf Hallstromin Katu 2b FI-00014 Helsinki Finland

The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper, we survey the 20-year history of this idea and discuss its relation to kernelization in parameterized complexity.

关键词： approximate pattern matching data compression genomic databases indexing kernelization random-access reading string algorithms

来源：评论

学校读者我要写书评

暂无评论

A Survey on Shortest Unique Substring Queries

引用

algorithms 2020年第9期13卷 224页

作者： Abedin, Paniz Kulekci, M. Oguzhan Thankachan, Shama, V Univ Cent Florida Dept Comp Sci Orlando FL 32816 USA Istanbul Tech Univ Informat Inst TR-34469 Istanbul Turkey

The shortest unique substring (SUS) problem is an active line of research in the field of string algorithms and has several applications in bioinformatics and information retrieval. The initial version of the problem was proposed by Pei et al. [ICDE'13]. Over the years, many variants and extensions have been pursued, which include positional-SUS, interval-SUS, approximate-SUS, palindromic-SUS, range-SUS, etc. In this article, we highlight some of the key results and summarize the recent developments in this area.

关键词： string algorithms shortest unique substring repeats compact data structures

来源：评论

学校读者我要写书评

暂无评论

Lyndon Factorization algorithms for Small Alphabets and Run-Length Encoded strings

引用

algorithms 2019年第6期12卷 124-124页

作者： Ghuman, Sukhpal Singh Giaquinta, Emanuele Tarhio, Jorma Sheridan Coll Fac Appl Sci & Technol 7899 McLaughlin Rd Brampton ON L6Y 5H9 Canada F Secure Corp POB 24 FI-00181 Helsinki Finland Aalto Univ Dept Comp Sci POB 15400 FI-00076 Aalto Finland

We present two modifications of Duval's algorithm for computing the Lyndon factorization of a string. One of the algorithms has been designed for strings containing runs of the smallest character. It works best for small alphabets and it is able to skip a significant number of characters of the string. Moreover, it can be engineered to have linear time complexity in the worst case. When there is a run-length encoded string R of length rho, the other algorithm computes the Lyndon factorization of R in O (rho) time and in constant space. It is shown by experimental results that the new variations are faster than Duval's original algorithm in many scenarios.

关键词： Lyndon factorization string algorithms run-length encoding

来源：评论

学校读者我要写书评

暂无评论

Few Matches or Almost Periodicity: Faster Pattern Matching with Mismatches in Compressed Texts 19

Few Matches or Almost Periodicity: Faster Pattern Matching w...

引用

Annual ACM-SIAM Symposium on Discrete algorithms

作者： Karl Bringmann Marvin Kiinnemann Philip Wellnitz Saarland Informatics Campus (SIC) Saarbrücken Germany

A fundamental problem on strings in the realm of approximate string matching is pattern matching with mismatches: Given a text t, a pattern p, and a number k, determine whether some substring of t has Hamming distance at most k to p;such a substring is called a k-match. As real-world texts often come in compressed form, we study the case of searching for a small pattern p in a text t that is compressed by a straight-line program. This grammar compression is popular in the string community, since it is mathematically elegant and unifies many practically relevant compression schemes such as the Lempel-Ziv family, dictionary methods, and others. We denote by m the length of p and by n the compressed size of t. While exact pattern matching, that is, the case k = 0, is known to be solvable in near-linear time O(n + m) [Jez TALG'15], despite considerable interest in the string community, the fastest known algorithm for pattern matching with mismatches runs in time O(n√m poly(k)) [Gawrychowski, Straszak ISAAC'13], which is far from linear even for very small k. In this paper, we obtain an algorithm for pattern matching with mismatches running in time O((n + m) poly(k)). This is near-linear in the input size for any constant (or slightly superconstant) k. We obtain analogous running time for counting and enumerating all k-matches. Our algorithm is based on a new structural insight for approximate pattern matching, essentially showing that either the number of k-matches is very small or both text and pattern must be almost periodic. While intuitive and simple for exact matches, such a characterization is surprising when allowing k mismatches.

关键词： stringology string algorithms Approx. pattern matching Grammar compression

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：