A factorization of a string w is a partition of w into substrings u(1),..., u(k) such that w = u(1)u(2) ... u(k). Such a partition is called equality-free if no two factors are equal: u(i) not equal u(j), for all i, j...
详细信息
ISBN:
(纸本)9783030389192;9783030389185
A factorization of a string w is a partition of w into substrings u(1),..., u(k) such that w = u(1)u(2) ... u(k). Such a partition is called equality-free if no two factors are equal: u(i) not equal u(j), for all i, j with i not equal j. The maximum equality-free factorization problem is to decide, for a given string w and integer k, whether w admits an equality-free factorization with k factors. Equality-free factorizations have lately received attention because of their application in DNA self-assembly. Condon et al. (CPM 2012) study a version of the problem and show that it is NP-complete to decide if there exists an equality-free factorization with an upper bound on the length of the factors. At STACS 2015, Fernau et al. show that the maximum equality-free factorization problem with a lower bound on the number of factors is NP-complete. Shortly after, Schmid (CiE 2015) presents results concerning the Fixed Parameter Tractability of the problems. In this paper we approach equality free factorizations from a practical point of view i.e. we wish to obtain good solutions on given instances. To this end, we provide approximation algorithms, heuristics, Integer Programming models, an improved FPT algorithm and we also conduct experiments to analyze the performance of our proposed algorithms. Additionally, we study a relaxed version of the problem where gaps are allowed between factors and we design a constant factor approximation algorithm for this case. Surprisingly, after extensive experiments we conjecture that the relaxed problem has the same optimum as the original.
A factorization f(1),..., f(m) of a string w of length n is called a repetition factorization of w if f(i) is a repetition, i.e., f(i) is a form of x(k) x' where x is a non-empty string, x' is a (possibly-empt...
详细信息
ISBN:
(纸本)9783031721991;9783031722004
A factorization f(1),..., f(m) of a string w of length n is called a repetition factorization of w if f(i) is a repetition, i.e., f(i) is a form of x(k) x' where x is a non-empty string, x' is a (possibly-empty) proper prefix of x, and k >= 2. Dumitran et al. [SPIRE 2015] presented an O(n)-time and space algorithm for computing an arbitrary repetition factorization of a given string of length n. Their algorithm heavily relies on the Union-Find data structure on trees proposed by Gabow and Tarjan [JCSS 1985] that works in linear time on the word RAM model, and an interval stabbing data structure of Schmidt [ISAAC 2009]. In this paper, we explore more combinatorial insights into the problem, and present a simple algorithm to compute an arbitrary repetition factorization of a given string of length n in O(n) time, without relying on data structures for Union-Find and interval stabbing. Our algorithm follows the approach by Inoue et al. [ToCS 2022] that computes the smallest/largest repetition factorization in O(n log n) time.
string factorization is an important tool for partitioning data for parallel processing and other algorithmic techniques often found in the context of big data applications such as bioinformatics or compression. Duval...
详细信息
ISBN:
(纸本)9783030581114;9783030581121
string factorization is an important tool for partitioning data for parallel processing and other algorithmic techniques often found in the context of big data applications such as bioinformatics or compression. Duval's well-known algorithm uniquely factors a string over an ordered alphabet into Lyndon words, i.e., patterned strings which are strictly smaller than all of their cyclic rotations. While Duval's algorithm produces a pre-determined factorization, modern applications motivate the demand for factorizations with specific properties, e.g., those that minimize the number of factors or consist of factors with similar lengths. In this paper, we consider the problem of finding an alphabet ordering that yields a Lyndon factorization with such properties. We introduce a flexible evolutionary framework and evaluate it on biological sequence data. For the minimization case, we also propose a new problem-specific heuristic, Flexi-Duval, and a problem-specific mutation operator for Lyndon factorization. Our results show that our framework is competitive with Flexi-Duval for minimization and yields high quality and robust solutions for balancing where no problem-specific algorithm is available.
暂无评论