Search results - Inner Mongolia University Library

On approximating string selection problems with outliers

THEORETICAL COMPUTER SCIENCE, 2013, vol. 498, pp. 107-114

Authors: Boucher, Christina; Landau, Gad M.; Levy, Avivit; Pritchard, David; Weimann, Oren. Affiliations: Univ Calif San Diego Dept Comp Sci San Diego CA 92103 USA; Univ Haifa Dept Comp Sci IL-31905 Haifa Israel; NYU Polytech Inst Brooklyn NY 11201 USA; Shenkar Coll Engn & Design IL-52526 Ramat Gan Israel; Univ Haifa CRI IL-31905 Haifa Israel; Univ Waterloo CEMC Waterloo ON N2L 3G1 Canada

Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of strings of the same length and a parameter d, to find a string x that maximizes the number of "non-outliers" within Hamming distance d of x. We prove that this problem has no polynomial-time approximation scheme (PTAS) unless NP has randomized polynomial-time algorithms, correcting a decade-old erroneous proof made previously in the literature. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of input strings so that the number of non-identical positions is at most k; we show it has no PTAS unless P = NP. We also observe that Closest to k Strings has no efficient PTAS (EPTAS) unless a parameterized complexity hierarchy collapses. In sum, outliers help model problems associated with using biological data, but we show the problem of finding an approximate solution is computationally difficult. (C) 2013 Elsevier B.V. All rights reserved.

Keywords: string selection; string algorithms
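
To make the Close to Most Strings objective from the abstract above concrete, here is a minimal brute-force sketch. It is only an illustration of the problem definition, not the paper's method; the function names and the exhaustive search over all candidate strings are assumptions introduced here, and the search is exponential in the string length.

```python
from itertools import product


def hamming(u, v):
    """Number of positions at which two equal-length strings differ."""
    if len(u) != len(v):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(u, v))


def non_outliers(strings, x, d):
    """Input strings within Hamming distance d of the candidate x."""
    return [s for s in strings if hamming(s, x) <= d]


def close_to_most_bruteforce(strings, d, alphabet):
    """Try every candidate x over the alphabet (hypothetical helper).

    Returns a candidate maximizing the number of non-outliers,
    together with that number. Only feasible for tiny inputs.
    """
    length = len(strings[0])
    candidates = ("".join(c) for c in product(alphabet, repeat=length))
    best = max(candidates, key=lambda x: len(non_outliers(strings, x, d)))
    return best, len(non_outliers(strings, best, d))
```

For instance, with the inputs `["aaaa", "aaab", "abaa", "bbbb"]` and d = 1, no single string can be close to both `"aaaa"` and `"bbbb"`, so at least one input has to be treated as an outlier.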


Constructing an indeterminate string from its associated graph

THEORETICAL COMPUTER SCIENCE, 2018, vol. 710, pp. 88-96

Authors: Helling, Joel; Ryan, P. J.; Smyth, W. F.; Soltys, Michael. Affiliations: Calif State Univ Channel Isl Dept Comp Sci Camarillo CA 93012 USA; McMaster Univ Dept Comp & Software Algorithms Res Grp Hamilton ON Canada; Murdoch Univ Sch Engn & Informat Technol Murdoch WA Australia

As discussed at length in Christodoulakis et al. (2015) [3], there is a natural one-many correspondence between simple undirected graphs $G$ with vertex set $V = \{1, 2, \ldots, n\}$ and indeterminate strings $x = x[1..n]$, that is, sequences of subsets of some alphabet $\Sigma$. In this paper, given $G$, we consider the "reverse engineering" problem of computing a corresponding $x$ on an alphabet $\Sigma_{\min}$ of minimum cardinality. This turns out to be equivalent to the NP-hard problem of computing the intersection number of $G$, thus in turn equivalent to the clique cover problem. We describe a heuristic algorithm that computes an approximation to $\Sigma_{\min}$ and a corresponding $x$. We give various properties of our algorithm, including some experimental evidence that on average it requires $O(n^2 \log n)$ time. We compare it with other heuristics, and state some conjectures and open problems. (C) 2017 Elsevier B.V. All rights reserved.

Keywords: string algorithms; indeterminate strings; cliques; graph labeling
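
The correspondence in the abstract above can be sketched as follows, under the assumption (suggested by the stated equivalence with the intersection number) that vertices i and j are adjacent exactly when the letter sets x[i] and x[j] intersect. The naive construction below spends one letter per edge, so it generally uses far more letters than the minimum alphabet; it is not the paper's heuristic, and all names are hypothetical.

```python
from itertools import combinations


def associated_graph(x):
    """Edges (i, j), 1-based with i < j, whose letter sets intersect."""
    positions = range(1, len(x) + 1)
    return {(i, j) for i, j in combinations(positions, 2)
            if x[i - 1] & x[j - 1]}


def string_from_graph(n, edges):
    """Naive indeterminate string for a graph on vertices 1..n.

    Assumes each edge is given as (i, j) with i < j. Every edge gets
    its own letter; isolated vertices get a private letter so that no
    position is the empty set.
    """
    x = [set() for _ in range(n)]
    for letter, (i, j) in enumerate(sorted(edges)):
        x[i - 1].add(letter)
        x[j - 1].add(letter)
    next_letter = len(edges)
    for letters in x:
        if not letters:
            letters.add(next_letter)
            next_letter += 1
    return x
```

On a triangle this construction uses three letters although a single shared letter would do, which illustrates why minimizing the alphabet is the hard part.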


Two-dimensional prefix string matching and covering on square matrices

ALGORITHMICA, 1998, vol. 20, no. 4, pp. 353-373

Authors: Crochemore, M; Iliopoulos, CS; Korda, M. Affiliations: Univ Paris 12 Inst Gaspard Monge F-93160 Noisy Le Grand France; Univ London Kings Coll Dept Comp Sci London WC2R 2LS England; Curtin Univ Technol Sch Comp Perth WA 6001 Australia

Two linear time algorithms are presented. One for determining, for every position in a given square matrix, the longest prefix of a given pattern (also a square matrix) that occurs at that position and one for computi...

Keywords: string algorithms; pattern matching; prefix matching; periodicity


The Greedy Algorithm for the Minimum Common String Partition Problem

ACM TRANSACTIONS ON ALGORITHMS, 2005, vol. 1, no. 2, pp. 350-366

Authors: Chrobak, Marek; Kolman, Petr; Sgall, Jiri. Affiliations: Univ Calif Riverside Dept Comp Sci Riverside CA 92521 USA; Charles Univ Prague Fac Math & Phys Dept Appl Math CZ-11800 Prague 1 Czech Republic; Acad Sci Czech Republ Math Inst CZ-11567 Prague 1 Czech Republic

In the Minimum Common String Partition problem (MCSP), we are given two strings on input, and we wish to partition them into the same collection of substrings, minimizing the number of the substrings in the partition. This problem is NP-hard, even for a special case, denoted 2-MCSP, where each letter occurs at most twice in each input string. We study a greedy algorithm for MCSP that at each step extracts a longest common substring from the given strings. We show that the approximation ratio of this algorithm is between $\Omega(n^{0.43})$ and $O(n^{0.69})$. In the case of 2-MCSP, we show that the approximation ratio is equal to 3. For 4-MCSP, we give a lower bound of $\Omega(\log n)$.

Keywords: string algorithms; approximation algorithms
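
The greedy rule described in the abstract above (repeatedly extract a longest common substring of the still-unused parts of both strings) can be sketched as follows. This is a straightforward cubic-time illustration with arbitrary tie-breaking, not the authors' implementation; the function name and the dynamic-programming search are assumptions made here.

```python
def greedy_mcsp(a, b):
    """Greedy common partition of a and b into blocks (a sketch).

    Assumes a and b contain the same multiset of letters; otherwise
    no common partition exists and a ValueError is raised.
    """
    if sorted(a) != sorted(b):
        raise ValueError("inputs must be permutations of each other")
    used_a = [False] * len(a)
    used_b = [False] * len(b)
    blocks = []
    remaining = len(a)
    while remaining:
        best_len, best_i, best_j = 0, 0, 0
        prev = [0] * (len(b) + 1)
        # Longest-common-substring DP restricted to unused positions,
        # so a block never overlaps a previously extracted one.
        for i in range(1, len(a) + 1):
            cur = [0] * (len(b) + 1)
            if not used_a[i - 1]:
                for j in range(1, len(b) + 1):
                    if not used_b[j - 1] and a[i - 1] == b[j - 1]:
                        cur[j] = prev[j - 1] + 1
                        if cur[j] > best_len:
                            best_len = cur[j]
                            best_i, best_j = i - cur[j], j - cur[j]
            prev = cur
        for k in range(best_len):
            used_a[best_i + k] = True
            used_b[best_j + k] = True
        blocks.append(a[best_i:best_i + best_len])
        remaining -= best_len
    return blocks
```

The number of blocks returned is the size of the greedy partition; the abstract's bounds concern how far this count can be from the optimum.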


String cadences

THEORETICAL COMPUTER SCIENCE, 2017, vol. 698, pp. 4-8

Authors: Amir, Amihood; Apostolico, Alberto; Gagie, Travis; Landau, Gad M. Affiliations: Bar Ilan Univ Dept Comp Sci IL-52900 Ramat Gan Israel; Johns Hopkins Univ Dept Comp Sci Baltimore MD 21218 USA; Georgia Inst Technol Sch Computat Sci & Engn Coll Comp Klaus Adv Comp Bldg, 266 Ferst Dr Atlanta GA 30332 USA; Diego Portales Univ Sch Comp Sci & Telecommun Santiago Chile; Univ Haifa Dept Comp Sci IL-31905 Haifa Israel; NYU Tandon Sch Engn Dept Comp Sci & Engn MetroTech Ctr 2 Brooklyn NY 11201 USA

Cadences are syntactic regularities in strings, of the family of periods, squares, and repetitions. We say a string has a cadence if a certain character is repeated at regular intervals, possibly with intervening occurrences of that character. We call the cadence anchored if the first interval must be the same length as the others. Although cadences' combinatorial properties have been explored, little work was done regarding the efficiency of their discovery. Recently, implementations involving cadences appeared in works on phylogenetic reconstruction, periodic subgraph mining, and monitoring events in computer networks. In this paper we begin a systematic study of the efficiency of finding cadences. We first give some basic definitions; we then give a sub-quadratic algorithm for determining whether a string has any cadence consisting of at least three occurrences of a character, and a nearly linear algorithm for finding all anchored cadences; finally, we propose a data structure that captures many features of cadences and allows for the efficient detection of many types of cadences. In particular, all sub-cadences can be detected and reported in time proportional to the sum of their lengths. (C) 2017 Elsevier B.V. All rights reserved.

Keywords: string algorithms; string regularities; pattern mining; cadences


Binary jumbled string matching for highly run-length compressible texts

INFORMATION PROCESSING LETTERS, 2013, vol. 113, no. 17, pp. 604-608

Authors: Badkobeh, Golnaz; Fici, Gabriele; Kroon, Steve; Liptak, Zsuzsanna. Affiliations: Kings Coll London Dept Informat London England; Univ Palermo Dipartimento Matemat & Informat I-90133 Palermo Italy; Univ Stellenbosch Div Comp Sci ZA-7600 Stellenbosch South Africa; Univ Verona Dipartimento Informat I-37100 Verona Italy

The Binary Jumbled String Matching Problem is defined as follows: Given a string $s$ over $\{a, b\}$ of length $n$ and a query $(x, y)$, with $x, y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ a's and $y$ b's. Previous solutions created an index of size $O(n)$ in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ (Burcsi et al., 2010 [1]; Moosa and Rahman, 2010 [7]), or $O(n^2/\log^2 n)$ in the word-RAM model (Moosa and Rahman, 2012 [8]). We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index is $O(n + \rho^2 \log \rho)$, where $O(n)$ is the time for computing the run-length encoding of $s$ and $\rho$ is the length of this encoding; this is no worse than previous solutions if $\rho = O(n/\log n)$ and better if $\rho = o(n/\log n)$. Our index $L$ can be queried in $O(\log \rho)$ time. While $|L| = O(\min(n, \rho^2))$ in the worst case, preliminary investigations have indicated that $|L|$ may often be close to $\rho$. Furthermore, the algorithm for constructing the index is conceptually simple and easy to implement. In an attempt to shed light on the structure and size of our index, we characterize it in terms of the prefix normal forms of $s$ introduced in Fici and Liptak (2011) [6]. (C) 2013 Elsevier B.V. All rights reserved.

Keywords: string algorithms; data structures; jumbled pattern matching; Parikh vectors; prefix normal form; run-length encoding
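
As a baseline for the query defined in the abstract above, a single query (x, y) can be answered without any index by sliding a window of length x + y over s and tracking its number of a's. This linear-time-per-query sketch is not the paper's index; the function name is hypothetical, and treating the empty substring as a match for (0, 0) is a convention assumed here.

```python
def has_jumbled_match(s, x, y):
    """Does the binary string s (over 'a', 'b') contain a substring
    with exactly x a's and y b's? Sliding-window baseline."""
    m = x + y
    if m == 0:
        return True  # assumed convention: the empty substring matches
    if m > len(s):
        return False
    count_a = s[:m].count("a")
    if count_a == x:
        return True
    for i in range(m, len(s)):
        # Slide the window one step: s[i] enters, s[i - m] leaves.
        count_a += (s[i] == "a") - (s[i - m] == "a")
        if count_a == x:
            return True
    return False
```

An index such as the one described in the abstract trades preprocessing time for much faster queries than this scan.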


Flipping letters to minimize the support of a string

INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE, 2008, vol. 19, no. 1, pp. 5-17

Authors: Lancia, Giuseppe; Rinaldi, Franca; Rizzi, Romeo. Affiliations: Univ Udine Dipartimento Matemat & Informat I-33100 Udine Italy

Given a string $s$ on an alphabet $\Sigma$, a word-length $k$ and a budget $D$, we want to determine the smallest number of distinct k-mers that can be left in $s$, if we are allowed to replace up to $D$ letters of $s$. This problem has several parameters, and we discuss its complexity under all sorts of restrictions on the parameter values. We prove that some versions of the problem are polynomial, while others are NP-hard. We also introduce some Integer Programming formulations to model the NP-hard cases.

Keywords: De Bruijn graphs; string algorithms; parameterized complexity
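
The quantity being minimized in the abstract above, the number of distinct k-mers of s, is easy to compute directly; the difficulty lies in choosing which letters to replace. The sketch below pairs that count with an exhaustive search that is only usable on tiny inputs. Both function names are assumptions introduced here, and the search is not one of the paper's formulations.

```python
from itertools import combinations, product


def support(s, k):
    """Number of distinct k-mers (length-k substrings) of s."""
    return len({s[i:i + k] for i in range(len(s) - k + 1)})


def min_support_bruteforce(s, k, budget, alphabet):
    """Smallest support reachable by replacing at most `budget`
    letters of s with letters of `alphabet`. Exponential time."""
    best = support(s, k)
    for d in range(1, budget + 1):
        for positions in combinations(range(len(s)), d):
            for letters in product(alphabet, repeat=d):
                t = list(s)
                for p, c in zip(positions, letters):
                    t[p] = c
                best = min(best, support("".join(t), k))
    return best
```

For example, `"aab"` has the two 2-mers `"aa"` and `"ab"`, and one replacement (turning it into `"aaa"`) leaves a single 2-mer.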


Repetition Detection in a Dynamic String

27th Annual European Symposium on Algorithms (ESA)

Authors: Amir, Amihood; Boneh, Itai; Charalampopoulos, Panagiotis; Kondratovsky, Eitan. Affiliations: Bar Ilan Univ Dept Comp Sci Ramat Gan Israel; Kings Coll London Dept Informat London England; Interdisciplinary Ctr Herzliya Efi Arazi Sch Comp Sci Herzliyya Israel

ISBN: (print) 9783959771245

A string $UU$ for a non-empty string $U$ is called a square. Squares have been well-studied both from a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string $S$ of length at most $n$. We present an algorithm that updates this representation in $n^{o(1)}$ time. This representation allows us to report a longest square-substring of $S$ in $O(1)$ time and all square-substrings of $S$ in $O(\text{output})$ time. We achieve this by introducing a novel tool: maintaining prefix-suffix matches of two dynamic strings. We extend the above result to address the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems and our techniques have the potential of offering solutions to these problems in a dynamic text setting.

Keywords: string algorithms; dynamic algorithms; squares; repetitions; runs
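
To illustrate the definition of a square used in the abstract above, the following naive static scan returns a longest square-substring of a string. It recomputes everything from scratch in roughly cubic time, so it says nothing about the dynamic data structure the paper describes; the function name is hypothetical.

```python
def longest_square(s):
    """Return a longest substring of the form UU (U non-empty),
    or the empty string if s contains no square. Naive scan."""
    n = len(s)
    # Try the largest possible half-length first.
    for half in range(n // 2, 0, -1):
        for i in range(n - 2 * half + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                return s[i:i + 2 * half]
    return ""
```

For `"banana"` the scan finds `"anan"`, a square with U = `"an"`.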


Elastic-Degenerate String Matching with 1 Error

15th Latin American Symposium on Theoretical Informatics

Authors: Bernardini, Giulia; Gabory, Esteban; Pissis, Solon P.; Stougie, Leen; Sweering, Michelle; Zuba, Wiktor. Affiliations: Univ Trieste Trieste Italy; CWI Amsterdam Netherlands; Vrije Univ Amsterdam Netherlands; INRIA Erable Villeurbanne France

ISBN: (print) 9783031206238; 9783031206245

An elastic-degenerate (ED) string is a sequence of $n$ finite sets of strings of total length $N$, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length $m$ in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an $\tilde{O}(nm^{\omega-1}) + O(N)$-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where $\omega$ denotes the matrix multiplication exponent and the $\tilde{O}(\cdot)$ notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most $k$ errors. k-EDSM can be solved in $O(k^2 mG + kN)$ time under edit distance, where $G$ denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, $G$ is only bounded by $N$, and so even for $k = 1$, the existing algorithm runs in $\Omega(mN)$ time in the worst case. Here we make progress in this direction. We show that 1-EDSM can be solved in $O((nm^2 + N) \log m)$ or $O(nm^3 + N)$ time under edit distance. For the decision version of the problem, we present a faster $O(nm^2 \sqrt{\log m} + N \log\log m)$-time algorithm. Our algorithms rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or range emptiness), which we show how to solve efficiently.

Keywords: string algorithms; approximate string matching; edit distance; degenerate strings; elastic-degenerate strings


The Maximum Equality-Free String Factorization Problem: Gaps vs. No Gaps

46th International Conference on Current Trends in Theory and Practice of Informatics (SOFSEM)

Authors: Mincu, Radu Stefan; Popa, Alexandru. Affiliations: Univ Bucharest Dept Comp Sci Bucharest Romania; Natl Inst Res & Dev Informat Bucharest Romania

ISBN: (print) 9783030389192; 9783030389185

A factorization of a string $w$ is a partition of $w$ into substrings $u_1, \ldots, u_k$ such that $w = u_1 u_2 \cdots u_k$. Such a partition is called equality-free if no two factors are equal: $u_i \neq u_j$ for all $i, j$ with $i \neq j$. The maximum equality-free factorization problem is to decide, for a given string $w$ and integer $k$, whether $w$ admits an equality-free factorization with $k$ factors. Equality-free factorizations have lately received attention because of their application in DNA self-assembly. Condon et al. (CPM 2012) study a version of the problem and show that it is NP-complete to decide if there exists an equality-free factorization with an upper bound on the length of the factors. At STACS 2015, Fernau et al. show that the maximum equality-free factorization problem with a lower bound on the number of factors is NP-complete. Shortly after, Schmid (CiE 2015) presents results concerning the Fixed Parameter Tractability of the problems. In this paper we approach equality-free factorizations from a practical point of view, i.e., we wish to obtain good solutions on given instances. To this end, we provide approximation algorithms, heuristics, Integer Programming models, an improved FPT algorithm and we also conduct experiments to analyze the performance of our proposed algorithms. Additionally, we study a relaxed version of the problem where gaps are allowed between factors and we design a constant factor approximation algorithm for this case. Surprisingly, after extensive experiments we conjecture that the relaxed problem has the same optimum as the original.

Keywords: string factorization; equality-free; string algorithms; heuristics
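
The definition of an equality-free factorization in the abstract above can be checked mechanically, and for very short strings a maximum one can be found by backtracking. The sketch below is an exponential-time illustration with hypothetical function names; it is not one of the paper's approximation, heuristic, or FPT algorithms.

```python
def is_equality_free(w, factors):
    """True if the factors concatenate to w and are pairwise distinct."""
    return "".join(factors) == w and len(set(factors)) == len(factors)


def max_equality_free(w):
    """Return an equality-free factorization of w with the most
    factors, found by exhaustive backtracking (tiny inputs only)."""
    best = []

    def extend(start, factors, seen):
        nonlocal best
        if start == len(w):
            if len(factors) > len(best):
                best = list(factors)
            return
        for end in range(start + 1, len(w) + 1):
            factor = w[start:end]
            if factor not in seen:
                seen.add(factor)
                factors.append(factor)
                extend(end, factors, seen)
                factors.pop()
                seen.remove(factor)

    extend(0, [], set())
    return best
```

For `"aaa"` the best achievable is two factors, `"a"` and `"aa"`, since a third factor would have to repeat one of them.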

