Search results - Inner Mongolia University Library

Faster space-efficient STR-IC-LCS computation

THEORETICAL COMPUTER SCIENCE, 2024, Vol. 1003

Authors: Yonemoto, Yuki; Nakashima, Yuto; Inenaga, Shunsuke; Bannai, Hideo

Affiliations: Kyushu Univ, Dept Informat Sci & Technol, Fukuoka, Japan; Kyushu Univ, Dept Informat, Fukuoka, Japan; Tokyo Med & Dent Univ, M&D Data Sci Ctr, Tokyo, Japan

One of the most fundamental methods for comparing two given strings A and B is the longest common subsequence (LCS), where the task is to find (the length of) an LCS of A and B. In this paper, we deal with the STR-IC-LCS problem, which is one of the constrained LCS problems proposed by Chen and Chao [J. Comb. Optim, 2011]. A string Z is said to be an STR-IC-LCS of three given strings A, B, and P if Z is a longest string satisfying that (1) Z includes P as a substring and (2) Z is a common subsequence of A and B. We present three efficient algorithms for this problem. First, we begin with a space-efficient solution which computes the length of an STR-IC-LCS in O(n^2) time and O((e + 1)(n - e + 1)) space, where e is the length of an LCS of A and B, both of length n. When e = O(1) or n - e = O(1), this algorithm uses only linear O(n) space. Second, we present a faster algorithm that works in O(nr / log r + n(n - e + 1)) time, where r is the length of P, while retaining the O((e + 1)(n - e + 1)) space efficiency. Third, we give an alternative algorithm that runs in O(nr / log r + n(n - e' + 1)) time with O((e' + 1)(n - e' + 1)) space, where e' denotes the STR-IC-LCS length for input strings A, B, and P.

Keywords: string algorithms; Constrained longest common subsequence; Dynamic programming
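To make the problem statement in the abstract above concrete, the following is a minimal sketch of a straightforward dynamic program for the STR-IC-LCS length. It is not one of the space-efficient algorithms of the paper: it stores two full LCS tables and therefore uses quadratic space. The function names (`lcs_table`, `greedy_end`, `str_ic_lcs_length`) are illustrative assumptions, and returning `None` when no valid string exists is a choice made for this sketch.

```python
def lcs_table(a, b):
    """Return t where t[i][j] is the LCS length of a[:i] and b[:j]."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t


def greedy_end(s, start, p):
    """Smallest e such that p is a subsequence of s[start:e], else None."""
    k = 0
    for e in range(start, len(s)):
        if s[e] == p[k]:
            k += 1
            if k == len(p):
                return e + 1
    return None


def str_ic_lcs_length(a, b, p):
    """Length of a longest common subsequence of a and b that contains p
    as a substring, or None if no such string exists."""
    n, m = len(a), len(b)
    if not p:
        return lcs_table(a, b)[n][m]
    fwd = lcs_table(a, b)
    # rev[x][y] is the LCS length of the last x characters of a
    # and the last y characters of b.
    rev = lcs_table(a[::-1], b[::-1])
    end_a = [greedy_end(a, i, p) for i in range(n)]
    end_b = [greedy_end(b, j, p) for j in range(m)]
    best = None
    for i in range(n):
        if end_a[i] is None:
            continue
        for j in range(m):
            if end_b[j] is None:
                continue
            # prefix part + the pattern itself + suffix part
            total = fwd[i][j] + len(p) + rev[n - end_a[i]][m - end_b[j]]
            if best is None or total > best:
                best = total
    return best
```

For example, `str_ic_lcs_length("axbyc", "abc", "ab")` evaluates to 3, since "abc" is a common subsequence that contains "ab" as a substring. The sketch relies on the observation that an embedding of P starting at a given position can always be replaced by the greedy (shortest) one without shrinking the room left for the suffix part.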


Linear-time computation of DAWGs, symmetric indexing structures, and MAWs for integer alphabets

THEORETICAL COMPUTER SCIENCE, 2023, Vol. 973, No. 1

Authors: Fujishige, Yuta; Tsujimaru, Yuki; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki

Affiliations: Kyushu Univ, Dept Informat, Fukuoka, Japan; Fujitsu Ltd, Tokyo, Japan; Kyushu Univ, Dept Elect Engn & Comp Sci, Fukuoka, Japan; Tokyo Med & Dent Univ, M&D Data Sci Ctr, Tokyo, Japan

The directed acyclic word graph (DAWG) of a string y of length n is the smallest (partial) DFA which recognizes all suffixes of y with only O(n) nodes and edges. In this paper, we show how to construct the DAWG for the input string y from the suffix tree for y, in O(n) time for integer alphabets of polynomial size in n. In so doing, we first describe a folklore algorithm which, given the suffix tree for y, constructs the DAWG for the reversed string y in O(n) time. Then, we present our algorithm that builds the DAWG for y in O(n) time for integer alphabets, from the suffix tree for y. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. We then discuss how our constructions can lead to linear-time algorithms for building other text indexing structures, such as linear-size suffix tries and symmetric CDAWGs, in the case of integer alphabets. As a further application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words (MAWs) of y can be computed in optimal, input- and output-sensitive O(n + |MAW(y)|) time and O(n) working space for integer alphabets.

Keywords: string algorithms; DAWGs; Suffix trees; Affix trees; CDAWGs; Minimal absent words
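The abstract above does not define minimal absent words, so the sketch below assumes the standard definition: a word w is a MAW of y if w does not occur in y but every proper substring of w does. It enumerates MAW(y) by brute force over all substrings and is nowhere near the O(n + |MAW(y)|) bound of the paper; the function name and the `alphabet` parameter are illustrative assumptions.

```python
def minimal_absent_words(y, alphabet=None):
    """Brute-force MAW(y): words absent from y whose proper substrings
    all occur in y. The alphabet defaults to the letters of y."""
    sigma = sorted(set(alphabet) if alphabet is not None else set(y))
    # All substrings of y, including the empty string.
    factors = {y[i:j] for i in range(len(y)) for j in range(i, len(y) + 1)}
    factors.add("")
    maws = set()
    # Length 1: alphabet letters that never occur in y.
    for c in sigma:
        if c not in factors:
            maws.add(c)
    # Length >= 2: w = left + b, where left and left[1:] + b both occur
    # in y but w itself does not.
    for left in factors:
        if not left:
            continue
        inner = left[1:]
        for b in sigma:
            if inner + b in factors and left + b not in factors:
                maws.add(left + b)
    return maws
```

With y = "abaab" over the alphabet {a, b}, the function returns the set {"bb", "aaa", "bab", "aaba"}.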


Finding top-k longest palindromes in substrings

THEORETICAL COMPUTER SCIENCE, 2023, Vol. 979

Authors: Mitani, Kazuki; Mieno, Takuya; Seto, Kazuhisa; Horiyama, Takashi

Affiliations: Hokkaido Univ, Grad Sch Informat Sci & Technol, Kita 14 Nishi 9, Kita Ku, Sapporo 0600814, Japan; Univ Electrocommun, Dept Comp & Network Engn, 1-5-1 Chofugaoka, Chofu 1828585, Japan; Hokkaido Univ, Fac Informat Sci & Technol, Kita 14 Nishi 9, Kita Ku, Sapporo 0600814, Japan

Palindromes are strings that read the same forward and backward. Problems of computing palindromic structures in strings have been studied for many years with the motivation of their application to biology. The longest palindrome problem is one of the most important and classical problems regarding palindromic structures, that is, to compute the longest palindrome appearing in a string T of length n. The problem can be solved in O(n) time by the famous algorithm of Manacher (1975) [27]. This paper generalizes the longest palindrome problem to the problem of finding the top-k longest palindromes in an arbitrary substring, including the input string T itself. The internal top-k longest palindrome query is, given a substring T[i..j] of T and a positive integer k as a query, to compute the top-k longest palindromes appearing in T[i..j]. This paper proposes a linear-size data structure that can answer internal top-k longest palindromes queries in optimal O(k) time. Also, given the input string T, our data structure can be constructed in O(n log n) time. For k = 1, the construction time is reduced to O(n). © 2023 Elsevier B.V. All rights reserved.

Keywords: string algorithms; Palindromes; Internal queries; Top k queries
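The classical problem that the abstract above starts from (the longest palindrome in the whole string T) can be sketched with Manacher's linear-time algorithm. This covers only that baseline, not the internal top-k query structure proposed in the paper; the function name `longest_palindrome` and the choice to return the leftmost longest palindrome are assumptions of this sketch.

```python
def longest_palindrome(t):
    """Return a longest palindromic substring of t (leftmost on ties)."""
    if not t:
        return ""
    # Interleave separators so that even- and odd-length palindromes
    # are both centred on a single position of s.
    s = "#" + "#".join(t) + "#"
    n = len(s)
    rad = [0] * n           # rad[i]: palindrome radius around s[i]
    center = right = 0      # rightmost palindrome found so far
    for i in range(n):
        if i < right:
            # Reuse the radius of the mirrored position, clipped to the
            # part already known to be palindromic.
            rad[i] = min(right - i, rad[2 * center - i])
        lo, hi = i - rad[i] - 1, i + rad[i] + 1
        while lo >= 0 and hi < n and s[lo] == s[hi]:
            rad[i] += 1
            lo -= 1
            hi += 1
        if i + rad[i] > right:
            center, right = i, i + rad[i]
    best = max(range(n), key=lambda i: rad[i])
    # A radius in s equals the palindrome length in t.
    start = (best - rad[best]) // 2
    return t[start:start + rad[best]]
```

For example, `longest_palindrome("forgeeksskeegfor")` returns `"geeksskeeg"`. Because compared positions always have the same parity, the separator character may also occur in the input without affecting correctness.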


All-pairs suffix/prefix in optimal time using Aho-Corasick space

INFORMATION PROCESSING LETTERS, 2022, Vol. 178

Authors: Loukides, Grigorios; Pissis, Solon P.

Affiliations: Kings Coll London, Dept Informat, London, England; CWI, Amsterdam, Netherlands; Vrije Univ, Amsterdam, Netherlands

The all-pairs suffix/prefix (APSP) problem is a classic problem in computer science with many applications in bioinformatics. Given a set {S_1, ..., S_k} of k strings of total length n, we are asked to find, for each string S_i, i ∈ [1, k], its longest suffix that is a prefix of string S_j, for all j ≠ i, j ∈ [1, k]. Several algorithms running in the optimal O(n + k^2) time for solving APSP are known. All of these algorithms are based on suffix sorting and thus require space Ω(n) in any case. We consider the parameterized version of the APSP problem, denoted by t-APSP, in which we are asked to output only the pairs whose suffix/prefix overlap is of length at least t. We give an algorithm for solving t-APSP that runs in the optimal O(n + |OUTPUT_t|) time using O(n) space, where OUTPUT_t is the set of output pairs. Our algorithm is thus optimal for the APSP problem as well by setting t = 0. Notably, our algorithm is fundamentally different from all optimal algorithms solving the APSP problem: it does not rely on sorting the suffixes of all input strings but on a novel traversal of the Aho-Corasick machine, and it thus requires space linear in the size of the machine. © 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).

Keywords: algorithms; Data structures; string algorithms; Aho-Corasick machine; Failure transition tree
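The APSP problem defined in the abstract above can be illustrated with a brute-force reference implementation. This is only a sketch of the problem statement, far slower than the optimal algorithms discussed, and it does not use an Aho-Corasick machine; the function name and the dictionary output format are assumptions.

```python
def all_pairs_suffix_prefix(strings):
    """For every ordered pair (i, j) with i != j, return the length of
    the longest suffix of strings[i] that is a prefix of strings[j]."""
    result = {}
    for i, si in enumerate(strings):
        for j, sj in enumerate(strings):
            if i == j:
                continue
            best = 0
            # Try the longest possible overlap first.
            for length in range(min(len(si), len(sj)), 0, -1):
                if si.endswith(sj[:length]):
                    best = length
                    break
            result[(i, j)] = best
    return result
```

For the input `["abc", "bcd", "cda"]` the overlap of pair (0, 1) is 2 ("bc") and that of pair (1, 0) is 0. The parameterized variant with threshold t then amounts to keeping only the pairs whose value is at least t, for instance `{p: v for p, v in result.items() if v >= t}`.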


Fully Compressed Suffix Trees

ACM TRANSACTIONS ON ALGORITHMS, 2011, Vol. 7, No. 4, pp. 53-53

Authors: Russo, Luis M. S.; Navarro, Gonzalo; Oliveira, Arlindo L.

Affiliations: INESC ID, P-1000029 Lisbon, Portugal; Univ Tecn Lisboa, Inst Super Tecn, P-1049001 Lisbon, Portugal; Univ Chile, Dept Comp Sci, Santiago, Chile

Suffix trees are by far the most important data structure in stringology, with a myriad of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require Θ(n log n) bits of space, for a string of size n. This is considerably more than the n log_2 σ bits needed for the string itself, where σ is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus Θ(n) extra bits. This is already spectacular, but the linear extra bits are still unsatisfactory when σ is small as in DNA sequences. In this article, we introduce the first compressed suffix tree representation that breaks this Θ(n)-bit space barrier. The Fully Compressed Suffix Tree (FCST) representation requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time. This includes extracting arbitrary text substrings, so the FCST replaces the text using almost the same space as the compressed text. An essential ingredient of FCSTs is the lowest common ancestor (LCA) operation. We reveal important connections between LCAs and suffix tree navigation. We also describe how to make FCSTs dynamic, that is, support updates to the text. The dynamic FCST also supports several operations. In particular, it can build the static FCST within optimal space and polylogarithmic time per symbol. Our theoretical results are also validated experimentally, showing that FCSTs are very effective in practice as well.

Keywords: Text processing; pattern matching; string algorithms; suffix tree; data compression; compressed index


On-line construction of position heaps

JOURNAL OF DISCRETE ALGORITHMS, 2013, Vol. 20, pp. 3-11

Authors: Kucherov, Gregory

Affiliations: Univ Paris Est, CNRS, Lab Informat Gaspard Monge, 5 Bd Descartes, F-77454 Marne La Vallee, France; Ben Gurion Univ Negev, Dept Comp Sci, Beer Sheva, Israel

We propose a simple linear-time on-line algorithm for constructing a position heap for a string (Ehrenfeucht et al., 2011 [8]). Our definition of position heap differs slightly from the one proposed in Ehrenfeucht et al. (2011) [8] in that it considers the suffixes ordered in the descending order of length. Our construction is based on classic suffix pointers and resembles Ukkonen's algorithm for suffix trees (Ukkonen, 1995 [17]). Using suffix pointers, the position heap can be extended into the augmented position heap that allows for a linear-time string matching algorithm (Ehrenfeucht et al., 2011 [8]). © 2012 Elsevier B.V. All rights reserved.

Keywords: string algorithms; Data structures; Text index; Position heap


ALGORITHMS FOR APPROXIMATE K-COVERING OF STRINGS

International Journal of Foundations of Computer Science, 2005, Vol. 16, No. 6, pp. 1231-1251

Authors: LILI ZHANG; F. BLANCHET-SADRI

Affiliations: Department of Mathematical Sciences, University of North Carolina, P.O. Box 26170, Greensboro, North Carolina 27402–6170, United States (both authors)

Computing approximate patterns in strings or sequences has important applications in DNA sequence analysis, data compression, musical text analysis, and so on. In this paper, we introduce approximate k-covers and study them under various commonly used distance measures. We propose the following problem: "Given a string x of length n, a set U of m strings of length k, and a distance measure, compute the minimum number t such that U is a set of approximate k-covers for x with distance t". To solve this problem, we present three algorithms with time complexity O(km(n - k)), O(mn^2) and O(mn^2) under Hamming, Levenshtein and edit distance, respectively. A World Wide Web server interface has been established at for automated use of the programs.

Keywords: strings; k-Covers; Approximate k-covers; Distance measures; string algorithms; Dynamic programming


Computing the Burrows-Wheeler transform in place and in small space

JOURNAL OF DISCRETE ALGORITHMS, 2015, Vol. 32, pp. 44-52

Authors: Crochemore, Maxime; Grossi, Roberto; Karkkainen, Juha; Landau, Gad M.

Affiliations: Kings Coll London, London WC2R 2LS, England; Univ Pisa, Dipartimento Informat, I-56100 Pisa, Italy; Univ Helsinki, Dept Comp Sci, FIN-00014 Helsinki, Finland; Univ Haifa, Dept Comp Sci, IL-31999 Haifa, Israel; NYU Poly, Dept Comp Sci & Engn, Brooklyn, NY, USA

We introduce the problem of computing the Burrows-Wheeler Transform (BWT) using small additional space. Our in-place algorithm does not need the explicit storage for the suffix sort array and the output array, as typically required in previous work. It relies on the combinatorial properties of the BWT, and runs in O(n^2) time in the comparison model using O(1) extra memory cells, apart from the array of n cells storing the n characters of the input text. We then discuss the time-space trade-off when O(***(k)) extra memory cells are allowed with σ_k distinct characters, providing an O((n^2/k + n) log k)-time algorithm to obtain (and invert) the BWT. For example, in real systems where the alphabet size is a constant, for any arbitrarily small ε > 0, the BWT of a text of n bytes can be computed in O(n ε^(-1) log n) time using just εn extra bytes. © 2015 Elsevier B.V. All rights reserved.

Keywords: Burrows-Wheeler transform; In-place algorithms; string algorithms; Suffix sorting
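For readers unfamiliar with the transform that the abstract above computes in place, here is a minimal textbook sketch of the BWT and its inverse via sorted rotations. It is deliberately the opposite of the paper's setting: it materializes all rotations and so uses quadratic space. The `sentinel` parameter and both function names are assumptions of this sketch.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform of text + sentinel via sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, sentinel="$"):
    """Invert bwt() by rebuilding the sorted rotation matrix column by column."""
    n = len(last)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to every row, then re-sort.
        table = sorted(last[i] + table[i] for i in range(n))
    for row in table:
        if row.endswith(sentinel):
            return row[:-1]
    raise ValueError("sentinel not found in input")
```

With the default sentinel, `bwt("banana")` yields `"annb$aa"`, and `inverse_bwt("annb$aa")` recovers `"banana"`.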


Near real-time suffix tree construction via the fringe marked ancestor problem

JOURNAL OF DISCRETE ALGORITHMS, 2013, Vol. 18, pp. 32-48

Authors: Breslauer, Dany; Italiano, Giuseppe F.

Affiliations: Univ Haifa, Caesarea Rothschild Inst Interdisciplinary Applic, Haifa, Israel; Univ Roma Tor Vergata, Dipartimento Informat Sistemi & Prod, Rome, Italy

We contribute a further step towards the plausible real-time construction of suffix trees by presenting an on-line algorithm that spends only O(log log n) time processing each input symbol and takes O(n log log n) time in total, where n is the length of the input text. Our results improve on a previously published algorithm that takes O(log n) time per symbol and O(n log n) time in total. The improvements are obtained by adapting Weiner's suffix tree construction algorithm to use a new data structure for the fringe marked ancestor problem, a special case of the nearest marked ancestor problem, which may be of independent interest. © 2012 Elsevier B.V. All rights reserved.

Keywords: string algorithms; Suffix trees


Heuristic Algorithm for Generalized Function Matching

Procedia Computer Science, 2019, Vol. 159, pp. 1397-1405

Authors: Radu Stefan Mincu

Affiliations: Department of Computer Science, University of Bucharest, Bucharest, Romania

The problem of generalized function matching can be defined as follows: given a pattern p = p_1 ⋯ p_m and a text t = t_1 ⋯ t_n, find a mapping f : Σ_p → Σ_t^* and all text locations i such that f(p_1)f(p_2) ⋯ f(p_m) = t_i ⋯ t_j, a substring of t. By modifying the restrictions of the matching function f, one can obtain different matching problems, many of which have important applications. When f : Σ_p → Σ_t we are faced with problems found in the well-established field of combinatorial pattern matching. If the single character constraint is lifted and f : Σ_p → Σ_t^* we obtain generalized function matching as introduced by Amir and Nor (JDA 2007). If we further constrain f to be injective, then we arrive at generalized parametrized matching as defined by Clifford et al. (SPIRE 2009). There are a number of important applications for pattern matching in computational biology, text editors and data compression, to name a few. Therefore, many efficient algorithms have been developed for a wide variety of specific problems including finding tandem repeats in DNA sequences, optimizing embedded systems by reusing code, etc. In this work we present a heuristic algorithm illustrating a practical approach to tackling a variant of generalized function matching where f : Σ_p → Σ_t^+ and demonstrate its performance on human-produced text as well as random strings.

Keywords: string algorithms; pattern matching; heuristics
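The variant described in the abstract above, where every pattern symbol maps to a non-empty string, can be illustrated with a small exhaustive backtracking search. This is not the heuristic algorithm of the paper: it takes exponential time in the worst case and, as a simplification assumed here, it only tests whether the pattern matches the whole text rather than reporting all text locations. The function name `gfm_match` is hypothetical.

```python
def gfm_match(pattern, text):
    """Return a mapping f (pattern symbol -> non-empty string) such that
    f(p_1) f(p_2) ... f(p_m) == text, or None if no such mapping exists."""

    def solve(pi, ti, f):
        if pi == len(pattern):
            return dict(f) if ti == len(text) else None
        sym = pattern[pi]
        if sym in f:
            # Symbol already bound: its image must appear at position ti.
            image = f[sym]
            if text.startswith(image, ti):
                return solve(pi + 1, ti + len(image), f)
            return None
        # Leave at least one character for each remaining pattern symbol.
        max_len = len(text) - ti - (len(pattern) - pi - 1)
        for length in range(1, max_len + 1):
            f[sym] = text[ti:ti + length]
            found = solve(pi + 1, ti + length, f)
            if found is not None:
                return found
            del f[sym]
        return None

    return solve(0, 0, {})
```

For instance, `gfm_match("aba", "xyzxy")` returns `{"a": "xy", "b": "z"}`, while `gfm_match("aa", "abc")` returns `None`. The mapping is not required to be injective; adding that check would give the generalized parametrized matching variant mentioned in the abstract.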
