In the present paper, we introduce and study the problem of computing, for any given finite set of words, a shuffle word with a minimum so-called scope coincidence degree. The scope coincidence degree is the maximum number of different symbols that parenthesize any position in the shuffle word. This problem is motivated by an application of a new automaton model and can be regarded as the problem of scheduling shared memory accesses of some parallel processes in a way that minimizes the number of memory cells required. We investigate the complexity of this problem and show that it can be solved in polynomial time.
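One plausible reading of the definition above can be illustrated with a short sketch (our own, not the paper's algorithm): a symbol "parenthesizes" a position if that position lies strictly between the symbol's first and last occurrence, and the scope coincidence degree is the maximum such count over all positions.

```python
def scope_coincidence_degree(word):
    """Max number of symbols whose scope strictly surrounds some position.

    The scope of a symbol is the interval between its first and last
    occurrence; this interpretation of "parenthesize" is an assumption.
    """
    if not word:
        return 0
    first, last = {}, {}
    for i, a in enumerate(word):
        first.setdefault(a, i)
        last[a] = i
    return max(sum(1 for a in first if first[a] < i < last[a])
               for i in range(len(word)))
```

For the two shuffle words of "ab" with "ab", the non-interleaved one is better: `scope_coincidence_degree("aabb")` is 0 while `scope_coincidence_degree("abab")` is 1, matching the intuition that interleaving processes forces more memory cells to stay live simultaneously.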
Let T_1 and T_2 be two rooted trees with an equal number of leaves. The leaves are labeled, and the labeling of the leaves in T_2 is a permutation of those in T_1. Each node is associated with a weight, such that the weight of a node u, denoted by W(u), is more than the weight of its parent. A node x in T_1 and a node y in T_2 are induced if their subtrees have at least one common leaf label. A heaviest induced ancestor query HIA(u_1, u_2), with input nodes u_1 in T_1 and u_2 in T_2, asks to output the pair (u_1*, u_2*) of induced nodes with the highest combined weight W(u_1*) + W(u_2*) such that u_1* is an ancestor of u_1 and u_2* is an ancestor of u_2. This is a useful primitive in several text processing applications. Gagie et al. (Proceedings of the 25th Canadian Conference on Computational Geometry, CCCG 2013, Waterloo, Ontario, Canada, 2013) introduced this problem and proposed three data structures with the following space-time trade-offs: (i) O(n log^2 n) space and O(log n log log n) query time, (ii) O(n log n) space and O(log^2 n) query time, and (iii) O(n) space and O(log^{3+epsilon} n) query time. Here n is the number of nodes in both trees combined and epsilon > 0 is an arbitrarily small constant. We present two new data structures with better space-time trade-offs: (i) O(n log n) space and O(log n log log n) query time, and (ii) O(n) space and O(log^2 n / log log n) query time. Additionally, we present new applications of these results.
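The query semantics can be pinned down with a brute-force reference implementation (ours, not one of the data structures above; the names `Tree` and `hia` are illustrative). Induced pairs are detected via subtree leaf-label sets, and the query simply tries every ancestor pair:

```python
class Tree:
    def __init__(self, parent, weight, leaf_label):
        # parent[v]: parent of node v (None for the root);
        # leaf_label[v]: label for leaves, None for internal nodes.
        self.parent, self.weight = parent, weight
        self.labels = [set() for _ in parent]
        for v, lab in enumerate(leaf_label):
            if lab is not None:
                u = v
                while u is not None:          # propagate label to ancestors
                    self.labels[u].add(lab)
                    u = parent[u]

    def ancestors(self, v):                   # v together with all ancestors
        while v is not None:
            yield v
            v = self.parent[v]

def hia(t1, u1, t2, u2):
    """Heaviest induced ancestor pair of (u1, u2), by brute force."""
    best = None
    for a in t1.ancestors(u1):
        for b in t2.ancestors(u2):
            if t1.labels[a] & t2.labels[b]:   # induced: share a leaf label
                w = t1.weight[a] + t2.weight[b]
                if best is None or w > best[0]:
                    best = (w, a, b)
    return best   # never None: the two roots are always induced
```

The root pair contains every leaf label, so an answer always exists; the sophisticated structures in the abstract replace this quadratic ancestor scan with logarithmic-time machinery.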
We consider conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) is the sum of terms over intervals [i, j], where each term is non-zero only if the substring equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) the MAP labeling. Their complexities depend on n, on the combined length L of the input patterns, on the maximum length of a pattern, and on the input alphabet D. This improves on the previous algorithms of Ye et al. (NIPS, 2009), whose complexities additionally depend on the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.
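The model itself is easy to make concrete by exhaustive enumeration over a tiny chain (a reference sketch, not the paper's efficient algorithms; the energy-minimization and exp(-energy) Gibbs-weight conventions are our assumptions):

```python
import itertools
import math

def energy(labeling, patterns):
    # patterns: dict mapping pattern string -> weight; a pattern
    # contributes its weight once per interval [i, j] it matches.
    s = "".join(labeling)
    return sum(w * sum(1 for i in range(len(s) - len(p) + 1)
                       if s[i:i + len(p)] == p)
               for p, w in patterns.items())

def brute_force_inference(n, alphabet, patterns):
    """Partition function and MAP labeling by exhaustive enumeration."""
    Z, best = 0.0, None
    for lab in itertools.product(alphabet, repeat=n):
        e = energy(lab, patterns)
        Z += math.exp(-e)                 # Gibbs weight of this labeling
        if best is None or e < best[0]:
            best = (e, "".join(lab))
    return Z, best
```

This costs |D|^n and exists only to define the quantities; the point of the paper is computing the same three quantities in time polynomial in n and L.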
We study a simple algorithm generating square-free words from a random source. The source produces uniformly distributed random letters from a k-ary alphabet, and the algorithm outputs a (k+1)-ary square-free word. We are interested in the "conversion ratio" between the lengths of the input random word and the output square-free word. For any k >= 3 we prove that the expected value of this ratio is a constant and calculate it up to an O(1/k^5) term. For the extremal case of ternary square-free words, we conjecture, supported by computer experiments, that this ratio has a constant expectation as well, and estimate its actual value. (C) 2015 Elsevier B.V. All rights reserved.
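The conversion-ratio notion can be demonstrated with a simple erase-on-square generator. This is not necessarily the paper's algorithm (which converts a k-ary source into a (k+1)-ary word); it is the classic backtracking scheme over a single alphabet: append a random letter, and whenever the word ends with a square xx, erase the trailing copy of x. Since the word was square-free before the append, any square must be a suffix, and erasing it leaves a prefix of the old word, which is again square-free.

```python
import random

def shortest_square_suffix(w):
    # Length l such that w ends with a square xx with |x| = l (0 if none).
    n = len(w)
    for l in range(1, n // 2 + 1):
        if w[n - l:] == w[n - 2 * l:n - l]:
            return l
    return 0

def generate_square_free(k, target_len, seed=0):
    """Square-free k-ary word of the given length, plus the conversion
    ratio: random letters consumed per output letter."""
    rng = random.Random(seed)
    alphabet = [chr(ord('a') + i) for i in range(k)]
    w, consumed = "", 0
    while len(w) < target_len:
        w += rng.choice(alphabet)
        consumed += 1
        l = shortest_square_suffix(w)
        if l:
            w = w[:-l]   # erase the repeated half; the rest stays square-free
    return w, consumed / target_len
```

The returned ratio is always at least 1; the abstract's question is whether (and to what constant) its expectation converges as the output length grows.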
Inspired by scaffold filling, a recent approach for genome reconstruction from incomplete data, we consider a variant of the well-known longest common subsequence problem for the comparison of two sequences. The new problem, called Longest Filled Common Subsequence (LFCS), aims to compare a complete sequence with an incomplete one, i.e., one with some missing elements. Given a complete sequence A, an incomplete sequence B, and a multiset M of symbols missing in B, LFCS asks for a sequence B* obtained by inserting the symbols of M into B so that B* induces a common subsequence with A of maximum length. We investigate the computational and approximation complexity of the problem: we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol, and we give a polynomial-time algorithm when the input sequences are over a constant-size alphabet. We also give a 3/5-approximation algorithm for the problem. Finally, we present a fixed-parameter algorithm, parameterized by the number of symbols inserted in B that "match" symbols of A. (C) 2019 Elsevier B.V. All rights reserved.
We study the Submass Finding Problem: given a string s over a weighted alphabet, i.e., an alphabet Sigma with a weight function mu : Sigma -> N, we refer to a mass M in N as a submass of s if s has a substring whose weights sum up to M. Now, for a set of input masses {M_1, ..., M_k}, we want to find those M_i which are submasses of s, and return one or all occurrences of substrings with mass M_i. We present efficient algorithms for both the decision and the search problem. Furthermore, our approach allows us to compute efficiently the number of different submasses of s. The main idea of our algorithms is to define appropriate polynomials such that we can determine the solution for the Submass Finding Problem from the coefficients of the product of these polynomials. We obtain very efficient running times by using the Fast Fourier Transform to compute this product. Our main algorithm for the decision problem runs in time O(mu(s) log mu(s)), where mu(s) is the total mass of string s. Employing methods for compressing sparse polynomials, this runtime can be viewed as O(sigma(s) log^2 sigma(s)), where sigma(s) denotes the number of different submasses of s. In this case, the runtime is independent of the size of the individual masses of characters. (c) 2006 Elsevier B.V. All rights reserved.
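The polynomial idea is simple to sketch: every submass is a difference of two prefix masses, and with P(x) = sum_i x^{p_i} and Q(x) = sum_j x^{mu(s) - p_j}, the coefficient of x^{mu(s)+M} in P*Q is non-zero exactly when some p_i - p_j = M. The sketch below uses a naive quadratic convolution for clarity; the paper gets O(mu(s) log mu(s)) by computing the same product with the FFT.

```python
def find_submasses(s, mu, queries):
    """Decide which query masses are submasses of s via the coefficients
    of a product of two polynomials (naive convolution, FFT in the paper)."""
    prefix = [0]                          # prefix masses p_0 < ... < p_n
    for c in s:
        prefix.append(prefix[-1] + mu[c])
    total = prefix[-1]                    # mu(s), the total mass
    P = [0] * (total + 1)                 # P(x) = sum_i x^{p_i}
    Q = [0] * (total + 1)                 # Q(x) = sum_j x^{total - p_j}
    for p in prefix:
        P[p] += 1
        Q[total - p] += 1
    R = [0] * (2 * total + 1)             # R = P * Q
    for i, a in enumerate(P):
        if a:
            for j, b in enumerate(Q):
                if b:
                    R[i + j] += a * b
    # x^{p_i} * x^{total - p_j} = x^{total + (p_i - p_j)}, so M > 0 is a
    # submass iff the coefficient of x^{total + M} is non-zero.
    return {M: 0 < M <= total and R[total + M] > 0 for M in queries}
```

Counting the positive exponents M with R[total + M] > 0 likewise yields the number of different submasses mentioned in the abstract.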
Authors: Lefebvre, A.; Lecroq, T.
Univ Rouen, Fac Sci & Tech, CNRS UMR 6037, ABISS, F-76821 Mont St Aignan, France; Univ Rouen, Fac Sci & Tech, ABISS, LIFAR, F-76821 Mont St Aignan, France
We present in this article a linear-time and linear-space data compression method. The method, based on a factor oracle and the computation of the lengths of repeated suffixes, is easy to implement, fast, and gives good compression ratios. (C) 2001 Elsevier Science B.V. All rights reserved.
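The factor oracle underlying the method has a well-known on-line construction (Allauzen, Crochemore, Raffinot); the sketch below builds only the automaton, not the compression scheme built on top of it:

```python
def factor_oracle(word):
    """On-line factor-oracle construction: an automaton with len(word)+1
    states that accepts at least every factor (substring) of word."""
    n = len(word)
    delta = [dict() for _ in range(n + 1)]   # transition function
    S = [-1] * (n + 1)                       # supply (suffix-link) function
    for i, a in enumerate(word):
        delta[i][a] = i + 1                  # internal transition
        k = S[i]
        while k > -1 and a not in delta[k]:
            delta[k][a] = i + 1              # external transition
            k = S[k]
        S[i + 1] = 0 if k == -1 else delta[k][a]
    return delta, S

def oracle_accepts(delta, p):
    q = 0
    for c in p:
        if c not in delta[q]:
            return False
        q = delta[q][c]
    return True
```

Note the automaton is an "oracle": it accepts every factor but may also accept a few non-factors (e.g. "abab" for the word "abbab"), which is the price of its linear size and construction time.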
Suffix trees are inherently asymmetric: prefix extensions only cause a few updates, while suffix extensions affect all suffixes, causing a wave of updates. In his elegant linear-time on-line suffix tree algorithm, Ukkonen relaxed the prevailing suffix tree representation and introduced two changes to avoid repeated structural updates and circumvent the inherent complexity of suffix extensions: (1) open-ended edges that enjoy gratuitous leaf updates, and (2) the omission of implicit nodes. In this paper we study the implicit nodes as the suffix tree evolves. We partition the suffix tree's edges into collections of similar edges called bands, where implicit nodes exhibit identical behavior, and generalize the notion of open-ended edges to allow implicit nodes to "float" within bands, only requiring updates when moving from one band to the next, adding up to only O(n) updates. We also show that internal implicit nodes are separated from each other by explicit suffix tree nodes and that all external implicit nodes are related to the same periodicity. These new properties may be used to keep track of the waves of implicit node updates and to build the suffix tree on-line in amortized linear time, providing access to all the implicit nodes in worst-case constant time. (C) 2012 Elsevier B.V. All rights reserved.
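The two representational ideas, open-ended edges and implicit (unexpanded) nodes, can be seen in a deliberately naive O(n^2) builder (ours, not Ukkonen's amortized-linear algorithm and not this paper's band structure): leaf edges store an OPEN end that always means "the current end of the text", and suffixes that end in the middle of an edge are left as implicit nodes rather than materialized.

```python
OPEN = object()   # sentinel: edge end == "current end of the text"

class Node:
    def __init__(self):
        self.children = {}   # first character -> (start, end, child node)

class SuffixTreeSketch:
    """Naive O(n^2) suffix tree illustrating open-ended leaf edges and
    implicit nodes; not an on-line linear-time construction."""

    def __init__(self, text):
        self.text = text
        self.root = Node()
        for i in range(len(text)):
            self._insert(i)

    def _end(self, end):
        # Open edges grow for free as the text grows: Ukkonen's
        # "gratuitous leaf updates" of change (1).
        return len(self.text) if end is OPEN else end

    def _insert(self, i):
        node, pos, t = self.root, i, self.text
        while pos < len(t):
            c = t[pos]
            if c not in node.children:
                node.children[c] = (pos, OPEN, Node())   # open leaf edge
                return
            start, end, child = node.children[c]
            e, j = self._end(end), start
            while j < e and pos < len(t) and t[j] == t[pos]:
                j += 1
                pos += 1
            if j == e:
                node = child    # consumed the whole edge, descend
            elif pos == len(t):
                return          # suffix ends inside an edge: implicit node
            else:
                mid = Node()    # mismatch inside the edge: split it
                mid.children[t[j]] = (j, end, child)
                node.children[c] = (start, j, mid)
                node = mid

    def contains(self, p):
        node, pos, t = self.root, 0, self.text
        while pos < len(p):
            c = p[pos]
            if c not in node.children:
                return False
            start, end, child = node.children[c]
            j, e = start, self._end(end)
            while j < e and pos < len(p):
                if t[j] != p[pos]:
                    return False
                j += 1
                pos += 1
            node = child
        return True
```

For "abab", the suffixes "ab" and "b" end as implicit nodes inside the two open leaf edges; tracking how exactly such nodes move as the text grows is the subject of the paper.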
Range minimum query is an important building block of many compressed data structures and string matching algorithms. Although this problem is essentially solved in theory, with sophisticated data structures allowing for constant-time queries, practical performance and construction time also matter. Additionally, there are offline scenarios in which the number of queries, q, is rather small and given beforehand, which encourages the use of a simpler approach. In this work, we present a simple data structure, with very fast construction, which handles queries in constant time on average. This algorithm, however, requires access to the input data during queries (which is not the case for sophisticated range minimum query solutions). We subsequently refine our technique, combining it with one of the existing succinct solutions with O(1) worst-case query time and no access to the input array. The resulting hybrid is still a memory-frugal data structure, usually spending up to about 3n bits, and provides competitive query times, especially for wide ranges. We also show how to make our baseline data structure more compact. Experimental results demonstrate that the proposed block-based sparse table (BbST) variants are competitive with existing solutions, also in the offline scenario.
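A block-based sparse table of the kind named above can be sketched in a few lines (our simplified reading, not the authors' tuned implementation): block minima go into a standard sparse table, and a query combines the sparse-table answer over fully covered blocks with linear scans of the at most two boundary blocks, which is why queries need access to the input array.

```python
class BbST:
    """Block-based sparse table for range-minimum queries (RMQ)."""

    def __init__(self, a, block=32):
        self.a, self.block = a, block
        mins = [min(a[i:i + block]) for i in range(0, len(a), block)]
        m = len(mins)
        self.table = [mins]          # table[k][i] = min over 2^k blocks
        k = 1
        while (1 << k) <= m:
            prev = self.table[-1]
            self.table.append([min(prev[i], prev[i + (1 << (k - 1))])
                               for i in range(m - (1 << k) + 1)])
            k += 1

    def _block_min(self, lo, hi):    # min over whole blocks lo..hi
        k = (hi - lo + 1).bit_length() - 1
        return min(self.table[k][lo], self.table[k][hi - (1 << k) + 1])

    def query(self, l, r):           # min of a[l..r], both ends inclusive
        bl, br = l // self.block, r // self.block
        if bl == br:                 # range inside one block: plain scan
            return min(self.a[l:r + 1])
        res = min(min(self.a[l:(bl + 1) * self.block]),   # left boundary
                  min(self.a[br * self.block:r + 1]))     # right boundary
        if bl + 1 <= br - 1:
            res = min(res, self._block_min(bl + 1, br - 1))
        return res
```

Construction touches each element once plus an O((n/b) log(n/b)) sparse table, and wide ranges are answered almost entirely from the table, matching the abstract's observation that the structure shines on wide queries.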
The suffix array is arguably one of the most important data structures in sequence analysis, and consequently there is a multitude of suffix sorting algorithms. However, to date the GSACA algorithm introduced in 2015 is the only known non-recursive linear-time suffix array construction algorithm (SACA). Despite its interesting theoretical properties, there has been little effort to improve GSACA's non-competitive real-world performance. There is a super-linear algorithm, DSH, which relies on the same sorting principle and is faster than DivSufSort, the fastest SACA for over a decade. The purpose of this article is twofold: we analyse the sorting principle used in GSACA and DSH and exploit its properties to give an optimised linear-time algorithm, and we show that it can be used very elegantly to compute both the original extended Burrows-Wheeler transform (eBWT) and a bijective version of the Burrows-Wheeler transform (BBWT) in linear time. We call the algorithm "generic", since it can be used to compute the regular suffix array and the variants used for the BBWT and eBWT. Our suffix array construction algorithm is not only significantly faster than GSACA but also outperforms DivSufSort and DSH. Our BBWT algorithm is faster than or competitive with all other tested BBWT construction implementations on large or repetitive data, and our eBWT algorithm is faster than all other programs on data that is not extremely repetitive.
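The objects being computed can be stated as naive reference implementations (nowhere near GSACA's linear time, but they fix the definitions): the suffix array sorts suffix start positions, and the BBWT takes the last characters of all rotations of all Lyndon factors, sorted in omega-order, where u^inf < v^inf can be tested by comparing u+v with v+u.

```python
from functools import cmp_to_key

def suffix_array(s):
    # Naive reference construction: sort suffix start positions directly.
    return sorted(range(len(s)), key=lambda i: s[i:])

def lyndon_factors(s):
    # Duval's algorithm: the unique non-increasing Lyndon factorization.
    i, n, out = 0, len(s), []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            out.append(s[i:i + j - k])
            i += j - k
    return out

def bbwt(s):
    """Bijective BWT: last characters of all rotations of all Lyndon
    factors, sorted in omega-order (u^inf < v^inf iff u+v < v+u)."""
    rots = [f[i:] + f[:i] for f in lyndon_factors(s) for i in range(len(f))]
    rots.sort(key=cmp_to_key(lambda u, v: (u + v > v + u) - (u + v < v + u)))
    return "".join(r[-1] for r in rots)
```

Unlike the classic BWT, the BBWT needs no end-of-string sentinel and is invertible as-is, which is the "bijective" in the name; the article's contribution is computing these objects in linear time with one generic algorithm.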