Search results - Inner Mongolia University Library

INFORMATION PROCESSING LETTERS, 2014, vol. 114, no. 4, pp. 174-178

Authors: Dhaliwal, Jasbir. Affiliations: RMIT Univ Sch Comp Sci & Informat Technol Melbourne Vic Australia

Suffix array (SA) construction is a time-and-memory bottleneck in many string processing applications. In this paper we improve the runtime of a small-space (semi-external) SA construction algorithm by Karkkainen (TCS, 2007) [5]. We achieve a speedup in practice of 2-4 times, without increasing memory usage. Our main contribution is a way to implement the "pointer copying" heuristic, used in less space-efficient SA construction algorithms, in a memory-efficient way. (C) 2013 Elsevier B.V. All rights reserved.

Keywords: Data structures; Suffix array; Burrows-Wheeler transform; string algorithms

A heuristic for computing repeats with a factor oracle: Application to biological sequences

INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2002, vol. 79, no. 12, pp. 1303-1315

Authors: Lefebvre, A; Lecroq, T. Affiliations: Univ Rouen Fac Sci CNRS ESA 6037 ABISS F-76821 Mont St Aignan France; Univ Rouen Fac Sci LIFAR ABISS F-76821 Mont St Aignan France

We present in this article a linear time and space method for computing the length of a repeated suffix for each prefix of a given word p. Our method is based on the factor oracle of p, a new and very compact structure introduced in [1] for representing all the factors of p. We exhibit applications where our method really speeds up the computation of repetitions in words.

Keywords: combinatorics on words; string algorithms; repetitions; factor oracle; suffix link

Pattern matching with don't cares and few errors

JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2010, vol. 76, no. 2, pp. 115-124

Authors: Clifford, Raphael; Efremenko, Klim; Porat, Ely; Rothschild, Amir. Affiliations: Univ Bristol Dept Comp Sci Bristol BS8 1UB Avon England; Bar Ilan Univ Dept Comp Sci IL-52900 Ramat Gan Israel; Weizmann Inst Sci IL-76100 Rehovot Israel

We present solutions for the k-mismatch pattern matching problem with don't cares. Given a text t of length n and a pattern p of length m with don't care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give a $\Theta(n(k + \log m \log k) \log n)$ time randomised algorithm which finds the correct answer with high probability. We then present a new deterministic $\Theta(nk^2 \log^2 m)$ time solution that uses tools originally developed for group testing. Taking our derandomisation approach further we develop an approach based on k-selectors that runs in $\Theta(nk \,\mathrm{polylog}\, m)$ time. Further, in each case the location of the mismatches at each alignment is also given at no extra cost. (C) 2009 Elsevier Inc. All rights reserved.

Keywords: Pattern matching; string algorithms; Randomised algorithms; Group testing
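
To make the problem statement in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not the randomised or deterministic algorithm from the paper; it simply checks every alignment in O(nm) time. The function name, the `?` wildcard symbol, and the choice to allow wildcards in both text and pattern are assumptions made for illustration.

```python
from typing import List, Tuple

WILDCARD = "?"  # assumed don't-care symbol; the abstract does not fix one


def k_mismatch_with_dont_cares(
    text: str, pattern: str, k: int
) -> List[Tuple[int, List[int]]]:
    """Naive reference: for each alignment with at most k mismatches,
    return (alignment start, list of mismatching pattern positions)."""
    n, m = len(text), len(pattern)
    results = []
    for i in range(n - m + 1):
        mismatches = []
        for j in range(m):
            t, p = text[i + j], pattern[j]
            if t == WILDCARD or p == WILDCARD:
                continue  # a don't-care matches any symbol
            if t != p:
                mismatches.append(j)
                if len(mismatches) > k:
                    break  # this alignment already exceeds the bound
        if len(mismatches) <= k:
            results.append((i, mismatches))
    return results
```

For example, `k_mismatch_with_dont_cares("abcabd", "a?d", 1)` reports alignment 0 with one mismatch at pattern position 2 and alignment 3 with no mismatches.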

PATTERN MATCHING UNDER POLYNOMIAL TRANSFORMATION

SIAM JOURNAL ON COMPUTING, 2013, vol. 42, no. 2, pp. 611-633

Authors: Butman, Ayelet; Clifford, Peter; Clifford, Raphael; Jalsenius, Markus; Lewenstein, Noa; Porat, Benny; Porat, Ely; Sach, Benjamin. Affiliations: Holon Inst Technol Dept Comp Sci Holon Israel; Univ Oxford Dept Stat Oxford OX1 3TG England; Univ Bristol Dept Comp Sci Bristol BS8 1UB Avon England; Netanya Acad Coll Dept Comp Sci IL-42365 Netanya Israel; Bar Ilan Univ Dept Comp Sci IL-52900 Ramat Gan Israel; Univ Warwick Dept Comp Sci Coventry CV4 7AL W Midlands England

We consider a class of pattern matching problems where a normalizing polynomial transformation can be applied at every alignment of the pattern and text. Normalized pattern matching plays a key role in fields as diverse as image processing and musical information processing, where application specific transformations are often applied to the input. By considering a wide range of such transformations, we provide fast algorithms and the first lower bounds for both new and old problems. Given a pattern of length m and a longer text of length n, where both are assumed to contain integer values only, we first show $O(n \log m)$ time algorithms for pattern matching under linear transformations even when wildcard symbols can occur in the input. We then show how to extend the technique to polynomial transformations of arbitrary degree. Next we consider the problem of finding the minimum Hamming distance under polynomial transformation. We show that, for any $\epsilon > 0$, there cannot exist an $O(nm^{1-\epsilon})$ time algorithm for additive and linear transformations conditional on the hardness of the classic 3SUM problem. Finally, we consider a version of the Hamming distance problem under additive transformations with a bound k on the maximum distance that needs to be reported. We give a deterministic $O(nk \log k)$ time solution, which we then improve by careful use of randomization to $O(n\sqrt{k} \log k \log n)$ time for sufficiently small k. Our randomized solution outputs the correct answer at every position with high probability.

Keywords: string algorithms; pattern matching; normalization; 3SUM-hardness

Noise-tolerant efficient inductive synthesis of regular expressions from good examples

NEW GENERATION COMPUTING, 1997, vol. 15, no. 1, pp. 105-140

Authors: Brazma, A; Cerans, K. Affiliations: Institute of Mathematics and Computer Science University of Latvia Riga Latvia

We present an almost linear time method of inductive synthesis restoring simple regular expressions from one representative (good) example. In particular, we consider synthesis of expressions of star-height one, where we allow one union operation under each iteration, and synthesis of expressions without union operations from examples that may contain mistakes. In both cases we provide sufficient conditions defining precisely the class of target expressions and the notion of good examples under which the synthesis algorithm works correctly, and present the proof of correctness. In the case of expressions with unions the proof is based on novel results in the combinatorics of words. A generalized algorithm that can synthesize simple expressions containing unions from noisy examples is implemented as a computer program. Computer experiments show that the algorithm is quite practical and may have applications in genome informatics.

Keywords: algorithmic learning; string algorithms; regular expressions; program synthesis; computational biology

Lempel-Ziv index for q-grams

ALGORITHMICA, 1998, vol. 21, no. 1, pp. 137-154

Authors: Karkkainen, J; Sutinen, E. Affiliations: Univ Helsinki Dept Comp Sci FIN-00014 Helsinki Finland

We present a new sublinear-size index structure for finding all occurrences of a given q-gram in a text. Such a q-gram index is needed in many approximate pattern matching algorithms. All earlier q-gram indexes require at least O(n) space, where n is the length of the text. The new Lempel-Ziv index needs only O(n/log n) space while being as fast as previous methods. The new method takes advantage of repetitions in the text found by Lempel-Ziv parsing.

Keywords: q-gram index; approximate pattern matching; text indexing; Lempel-Ziv parsing; string algorithms; data compression
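
For orientation, a q-gram index maps each length-q substring to the positions where it occurs. The sketch below is the straightforward dictionary-based index whose size grows linearly with the text, i.e. the kind of baseline the abstract contrasts against. It is not the Lempel-Ziv index of the paper, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_qgram_index(text: str, q: int) -> Dict[str, List[int]]:
    """Map every q-gram of text to the sorted list of its start positions."""
    if q <= 0:
        raise ValueError("q must be positive")
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)  # positions are appended in increasing order
    return dict(index)
```

A lookup such as `build_qgram_index("abracadabra", 2)["ab"]` then returns all occurrences of the q-gram `ab`.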

Pattern Masking for Dictionary Matching: Theory and Practice

ALGORITHMICA, 2024, vol. 86, no. 6, pp. 1948-1978

Authors: Charalampopoulos, Panagiotis; Chen, Huiping; Christen, Peter; Loukides, Grigorios; Pisanti, Nadia; Pissis, Solon P.; Radoszewski, Jakub. Affiliations: Birkbeck Univ London Sch Comp & Math Sci London England; Univ Birmingham Sch Comp Sci Birmingham England; Australian Natl Univ Canberra Australia; Kings Coll London Dept Informat London England; Univ Pisa Pisa Italy; CWI Amsterdam Netherlands; Vrije Univ Amsterdam Netherlands; Univ Warsaw Inst Informat Warsaw Poland

Data masking is a common technique for sanitizing sensitive data maintained in database systems which is becoming increasingly important in various application areas, such as in record linkage of personal data. This work formalizes the Pattern Masking for Dictionary Matching (PMDM) problem: given a dictionary $\mathscr{D}$ of d strings, each of length $\ell$, a query string q of length $\ell$, and a positive integer z, we are asked to compute a smallest set $K\subseteq \{1,\ldots ,\ell \}$, so that if q[i] is replaced by a wildcard for all $i\in K$, then q matches at least z strings from $\mathscr{D}$ ...

Keywords: string algorithms; Dictionary matching; Wildcards; Record linkage; Query term dropping
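
The PMDM problem defined in the abstract above can be illustrated with an exhaustive search over candidate position sets, tried in order of increasing size. This is only a sketch of the problem definition, exponential in the string length, and not one of the algorithms from the paper. The function name is hypothetical, and positions are 1-based to follow the abstract's notation.

```python
from itertools import combinations
from typing import List, Optional, Set


def pmdm_bruteforce(dictionary: List[str], q: str, z: int) -> Optional[Set[int]]:
    """Return a smallest set K of 1-based positions such that masking q at K
    makes q match at least z dictionary strings, or None if no such K exists."""
    ell = len(q)
    for size in range(ell + 1):  # smallest sets first
        for combo in combinations(range(1, ell + 1), size):
            masked = set(combo)
            # A string matches if it agrees with q on every unmasked position.
            count = sum(
                all(pos in masked or s[pos - 1] == q[pos - 1]
                    for pos in range(1, ell + 1))
                for s in dictionary
            )
            if count >= z:
                return masked
    return None  # even masking every position matches fewer than z strings
```

With the dictionary `["abd", "xbc", "aec", "abc"]` and query `"abc"`, asking for z = 2 matches returns `{1}`: masking the first position lets the query match both `xbc` and `abc`.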

A fast algorithm for the generalized k-keyword proximity problem given keyword offsets

INFORMATION PROCESSING LETTERS, 2004, vol. 91, no. 3, pp. 115-120

Authors: Kim, SR; Lee, I; Park, K. Affiliations: Konkuk Univ Div Internet & Media Seoul South Korea; Konkuk Univ Ctr Aerosp Syst Integrat Technol Seoul South Korea; Seoul Natl Univ Sch Engn & Comp Sci Seoul South Korea

When searching for information on the Web, it is often necessary to use one of the available search engines. Because the number of results is quite large for most queries, we need some measure of relevance with respect to the query. One of the most important relevance factors is the proximity score, i.e., how close the keywords appear together in a given document. A basic proximity score is given by the size of the smallest range containing all the keywords in the query. We generalize the proximity score to include many practically important cases and present an O(n log k)-time algorithm for the generalized problem, where k is the number of keywords and n is the number of occurrences of the keywords in a document. (C) 2004 Elsevier B.V. All rights reserved.

Keywords: combinatorial problems; design of algorithms; string algorithms
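
The basic proximity score mentioned in the abstract above (the smallest range containing all keywords) can be sketched with a standard heap-based sweep over the k sorted offset lists, which also runs in O(n log k) time. This covers only the basic score, not the generalized problem treated in the paper, and it is not claimed to be the paper's algorithm. The function name and input layout (one sorted offset list per keyword) are assumptions.

```python
import heapq
from typing import List, Optional, Tuple


def smallest_range(offsets: List[List[int]]) -> Optional[Tuple[int, int]]:
    """Smallest [lo, hi] containing at least one offset of every keyword.

    offsets[i] is the sorted list of positions of keyword i in the document.
    Returns None if there are no keywords or some keyword never occurs.
    """
    if not offsets or any(not lst for lst in offsets):
        return None
    # Heap entries: (position, keyword index, index within that keyword's list).
    heap = [(lst[0], i, 0) for i, lst in enumerate(offsets)]
    heapq.heapify(heap)
    hi = max(lst[0] for lst in offsets)  # largest position currently in the heap
    best = (heap[0][0], hi)
    while True:
        lo, i, j = heapq.heappop(heap)  # smallest position currently selected
        if hi - lo < best[1] - best[0]:
            best = (lo, hi)
        if j + 1 == len(offsets[i]):
            return best  # keyword i has no later occurrence; cannot improve
        nxt = offsets[i][j + 1]
        hi = max(hi, nxt)
        heapq.heappush(heap, (nxt, i, j + 1))
```

For three keywords with offsets `[1, 10, 20]`, `[5, 12]` and `[11, 30]`, the smallest range is `(10, 12)`, giving a basic proximity score of 2.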

Internal shortest absent word queries in constant time and linear space

THEORETICAL COMPUTER SCIENCE, 2022, vol. 922, pp. 271-282

Authors: Badkobeh, Golnaz; Charalampopoulos, Panagiotis; Kosolobov, Dmitry; Pissis, Solon P. Affiliations: Goldsmiths Univ London Dept Comp London England; Reichman Univ Efi Arazi Sch Comp Sci Herzliyya Israel; Ural Fed Univ Ekaterinburg Russia; CWI Amsterdam Netherlands; Vrije Univ Amsterdam Netherlands

Given a string T of length n over an alphabet $\Sigma \subset \{1, 2, \ldots, n^{O(1)}\}$ of size $\sigma$, we are to preprocess T so that given a range [i, j], we can return a representation of a shortest string over $\Sigma$ that is absent in the fragment T[i] ... T[j] of T. We present an O(n)-space data structure that answers such queries in constant time and can be constructed in $O(n \log_{\sigma} n)$ time. (C) 2022 Elsevier B.V. All rights reserved.

Keywords: string algorithms; Internal queries; Shortest absent word; Bit parallelism
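
To illustrate what a single query in the abstract above asks for, the following brute-force sketch enumerates candidate words by increasing length and returns the first one that does not occur in the fragment. It performs no preprocessing and is far from the constant-time data structure of the paper. The function name, the 1-based inclusive range convention, and single-character alphabet symbols are assumptions.

```python
from itertools import product
from typing import Sequence


def shortest_absent_word(t: str, i: int, j: int, alphabet: Sequence[str]) -> str:
    """Return a shortest word over alphabet absent from T[i..j] (1-based, inclusive)."""
    if not alphabet:
        raise ValueError("alphabet must be non-empty")
    fragment = t[i - 1:j]
    length = 1
    while True:
        # All distinct substrings of the fragment with the current length.
        present = {fragment[s:s + length]
                   for s in range(len(fragment) - length + 1)}
        for cand in product(alphabet, repeat=length):
            word = "".join(cand)
            if word not in present:
                return word  # first absent word in the order induced by alphabet
        length += 1
```

For T = `abaab` over the alphabet `ab`, the query [1, 5] yields `bb`, while the query [1, 1] yields `b`.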

Occurrence and substring heuristics for δ-matching

FUNDAMENTA INFORMATICAE, 2003, vol. 56, no. 1-2, pp. 1-21

Authors: Crochemore, M; Iliopoulos, CS; Lecroq, T; Pinzon, YJ; Plandowski, W; Rytter, W. Affiliations: Univ Rouen Fac Sci & Tech LIFAR ABISS F-76821 Mont St Aignan France; Univ Marne la Vallee Inst Gaspard Monge F-77454 Marne La Vallee 2 France; Kings Coll London Dept Comp Sci London WC2R 2LS England; Curtin Univ Technol Sch Comp Bentley WA 6102 Australia; Warsaw Univ Inst Informat PL-02097 Warsaw Poland; New Jersey Inst Technol Dept Comp Sci Newark NJ 07102 USA

We consider a version of pattern matching useful in processing large musical data: delta-matching, which consists in finding matches which are delta-approximate in the sense of the distance measured as maximum difference between symbols. The alphabet is an interval of integers, and the distance between two symbols a, b is measured as |a - b|. We also consider (delta, gamma)-matching, where gamma is a bound on the total sum of the differences. We first consider "occurrence heuristics" by adapting exact string matching algorithms to the two notions of approximate string matching. The resulting algorithms are efficient in practice. Then we consider "substring heuristics". We present delta-matching algorithms fast on the average provided that the pattern is "non-flat" and the alphabet interval is large. The pattern is "flat" if its structure does not vary substantially. The algorithms, named delta-BM1, delta-BM2 and delta-BM3, can be thought of as members of the generalized Boyer-Moore family of algorithms. The algorithms are fast on average. This is the first paper on the subject; previously only "occurrence heuristics" have been considered. Our substring heuristics are much stronger and refer to larger parts of texts (not only to single positions). We use delta-versions of suffix tries and subword graphs. Surprisingly, in the context of delta-matching subword graphs appear to be superior compared with compact suffix trees.

Keywords: string algorithms; approximate string matching; dynamic programming; computer-assisted music analysis
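
The two matching notions defined in the abstract above can be stated directly as a naive check at every alignment. The sketch below is such a definition-level reference, not the delta-BM1, delta-BM2 or delta-BM3 algorithms. The function name is hypothetical, and passing `gamma=None` is an assumed convention for plain delta-matching.

```python
from typing import List, Optional, Sequence


def delta_gamma_matches(
    text: Sequence[int],
    pattern: Sequence[int],
    delta: int,
    gamma: Optional[int] = None,
) -> List[int]:
    """Return all start positions where pattern (delta, gamma)-matches text.

    delta bounds each |a - b|; gamma, if given, bounds the sum of the differences.
    """
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        diffs = [abs(text[i + j] - pattern[j]) for j in range(m)]
        if all(d <= delta for d in diffs) and (gamma is None or sum(diffs) <= gamma):
            hits.append(i)
    return hits
```

For the text `[1, 5, 3, 6, 2, 4]` and pattern `[2, 4, 3]`, delta = 1 gives a match at position 0 only; adding gamma = 1 rejects it because the differences sum to 2.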
