Search results - Inner Mongolia University Library

Data Compression Conference (DCC)

Authors: Adler, Enno; Boettcher, Stefan; Hartel, Rita
Affiliations: Paderborn Univ, Paderborn, Germany

ISBN: (print) 9798350385885; 9798350385878

Neighborhood queries are the most common queries on graphs; thus, it is desirable to answer them efficiently on compressed data structures. Our full paper [1] presents a grammar-based compressor called Incidence-Type-RePair (ITR) for graphs with labeled nodes and labeled edges, based on RePair. We applied ITR to network, version, and RDF graphs and compared the performance of triple SPO queries on the compressed datasets generated by ITR and by 4 state-of-the-art graph compression approaches. As shown in the figure below and in our full paper [1], ITR outperforms the other graph compressors for all triple SPO queries except for the query type ? P ?, while providing a compression size comparable to the other compressors. ITR is 2 to 6 times faster than the fastest compared query-evaluation technique and up to 100 times faster than the slowest approach in our tests. © 2024 IEEE.

Keywords: grammar-based compression; graph compression; hyperedge-replacement grammar; neighborhood queries; queries on compressed data; self-indexes
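
To make the query types in this abstract concrete: a triple pattern fixes some of subject (S), predicate (P), and object (O) and leaves the rest as wildcards, so `? P ?` asks for all edges carrying a given label. The sketch below evaluates such patterns on an uncompressed list of triples. It is a plain baseline for illustration only, not ITR; the function name `match` and the toy graph are assumptions made up for this example.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]

def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern; None acts as the '?' wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Hypothetical toy graph with labeled nodes and labeled edges.
GRAPH = [
    ("alice", "knows", "bob"),
    ("alice", "cites", "carol"),
    ("bob", "knows", "carol"),
]

out_neighbors = match(GRAPH, s="alice")  # pattern S ? ? (outgoing neighborhood)
by_label = match(GRAPH, p="knows")       # pattern ? P ?
```

A linear scan like this touches every triple for every query; a compressed self-index aims to answer the same patterns without decompressing the graph.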

Measuring the Similarity of Proteomes using Grammar-based Compression via Domain Combinations

13th International Joint Conference on Biomedical Engineering Systems and Technologies

Authors: Hayashida, Morihiro; Koyano, Hitoshi; Nacher, Jose C.
Affiliations: Matsue Coll, Natl Inst Technol, Dept Elect Engn & Comp Sci, Matsue, Shimane, Japan; Tokyo Inst Technol, Sch Life Sci & Technol, Meguro Ku, Tokyo, Japan; Toho Univ, Fac Sci, Dept Informat Sci, Funabashi, Chiba, Japan

ISBN: (print) 9789897583988

Revealing the evolution of organisms is one of the important biological research topics and is also useful for understanding the origin of organisms. Hence, genomic sequences have been compared and aligned to find conserved and functional regions. A protein can contain several domains, which are known as structural and functional units. In previous work, a proteome, the whole set of proteins in an organism, was regarded as a set of sequences of protein domains, and a grammar-based compression algorithm was developed for a proteome, where production rules in the grammar represented the evolutionary processes of mutation and duplication. In this paper, we propose a similarity measure based on grammar-based compression and apply it to hierarchical clustering of seven organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana, and Escherichia coli. The results suggest that our similarity measure could classify the organisms very well.

Keywords: grammar-based compression; Kolmogorov Complexity; Protein Domain Combination
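
The general idea of a compression-based similarity measure can be illustrated with the normalized compression distance (NCD). The sketch below uses zlib as a stand-in compressor instead of a grammar-based one, and the domain sequences are invented placeholders, so it is not the measure proposed in the paper, only an example of the same family of techniques.

```python
import zlib

def csize(data: bytes) -> int:
    """Compressed size in bytes, a rough proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small for similar inputs,
    close to 1 for unrelated ones."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical "proteomes" written as sequences of domain identifiers.
a = b"PF00001 PF00002 PF00003 " * 50
b = b"PF00001 PF00002 PF00003 " * 49 + b"PF00009 PF00002 PF00003 "
c = bytes(range(256)) * 5  # unrelated data

close = ncd(a, b)  # a and b share almost all content
far = ncd(a, c)    # a and c share nothing
```

A matrix of such pairwise distances can then be passed to any standard hierarchical clustering routine.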

Universal Tree Source Coding Using Grammar-Based Compression

IEEE TRANSACTIONS ON INFORMATION THEORY, 2019, vol. 65, no. 10, pp. 6399-6413

Authors: Ganardi, Moses; Hucke, Danny; Lohrey, Markus; Benkner, Louisa Seelbach
Affiliations: Univ Siegen, Dept Elect Engn & Comp Sci, D-57076 Siegen, Germany

The problem of universal source coding for binary trees is considered. Zhang, Yang, and Kieffer derived upper bounds on the average-case redundancy of codes based on directed acyclic graph (DAG) compression for binary tree sources with certain properties. In this paper, a natural class of binary tree sources is presented such that the demanded properties are fulfilled. Moreover, for both subclasses considered in the paper of Zhang, Yang, and Kieffer, their result is improved by deriving bounds on the maximal pointwise redundancy (or worst-case redundancy) instead of the average-case redundancy. Finally, using context-free tree grammars instead of DAGs, upper bounds on the maximal pointwise redundancy for certain binary tree sources are derived. This yields universal codes for new classes of binary tree sources.

Keywords: grammar-based compression; minimal DAG representation; binary trees; universal source coding; lossless compression
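
DAG compression of a binary tree, as mentioned in this abstract, stores each distinct subtree only once and lets repeated subtrees share a node. A minimal sketch, assuming trees are encoded as nested tuples with `None` for a leaf (this encoding and the names `minimal_dag` and `full_tree` are assumptions for illustration, not the paper's notation):

```python
from typing import Optional, Tuple

# A binary tree is either None (a leaf) or a pair (left, right).
Tree = Optional[Tuple["Tree", "Tree"]]

def minimal_dag(tree: Tree) -> tuple[int, list[Optional[tuple[int, int]]]]:
    """Return (root_id, nodes), where nodes[i] is None for the leaf or a pair
    of child ids. Identical subtrees receive the same id (hash-consing)."""
    ids: dict[Optional[tuple[int, int]], int] = {}
    nodes: list[Optional[tuple[int, int]]] = []

    def visit(t: Tree) -> int:
        key = None if t is None else (visit(t[0]), visit(t[1]))
        if key not in ids:
            ids[key] = len(nodes)
            nodes.append(key)
        return ids[key]

    return visit(tree), nodes

def full_tree(height: int) -> Tree:
    """Complete binary tree of the given height (2**height leaves)."""
    return None if height == 0 else (full_tree(height - 1), full_tree(height - 1))

# The 31 nodes of the complete tree of height 4 collapse to 5 DAG nodes.
root, nodes = minimal_dag(full_tree(4))
```

Highly regular trees shrink dramatically under this representation, while trees with few repeated subtrees barely shrink at all, which is why redundancy bounds depend on the tree source.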

Universal Tree Source Coding Using Grammar-Based Compression

IEEE International Symposium on Information Theory (ISIT)

Authors: Ganardi, Moses; Hucke, Danny; Lohrey, Markus; Benkner, Louisa Seelbach
Affiliations: Univ Siegen, Dept Elect Engn & Comp Sci, D-57076 Siegen, Germany

Keywords: grammar-based compression; minimal DAG representation; binary trees; universal source coding; lossless compression

Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs

BMC BIOINFORMATICS, 2015, vol. 16, no. 1, pp. 1-13

Authors: Zhao, Yang; Hayashida, Morihiro; Cao, Yue; Hwang, Jaewook; Akutsu, Tatsuya
Affiliations: Kyoto Univ, Inst Chem Res, Bioinformat Ctr, Uji, Kyoto, Japan

Background: Many tree structures are found in nature and organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. Here, these methods find construction rules for one single tree. On the other hand, specified construction rules can be utilized to generate multiple similar trees. Results: Therefore, in this paper, we develop novel methods to discover common rules for the construction of multiple distinct trees, by improving and extending the previous methods using integer programming. We apply our proposed methods to several sets of glycans and RNA secondary structures, which play important roles in cellular systems, and can be regarded as tree structures. The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs. Conclusions: We propose integer programming-based methods MinSEOTGMul and MinSEUTGMul for the determination of the minimum grammars constructing multiple ordered and unordered trees, respectively. The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.

Keywords: grammar-based compression; Bisection-type tree grammar; Glycan; RNA secondary structure

Approximation of grammar-based compression via recompression

THEORETICAL COMPUTER SCIENCE, 2015, vol. 592, pp. 115-134

Authors: Jez, Artur
Affiliations: Univ Wroclaw, Inst Comp Sci, PL-50383 Wroclaw, Poland

In this paper we present a simple linear-time algorithm constructing a context-free grammar of size O(g log(N/g)) for the input string, where N is the size of the input string and g is the size of the optimal grammar generating this string. The algorithm works for alphabets of arbitrary size, but the running time is linear assuming that the alphabet Σ of the input string can be identified with numbers from {1, ..., N^c} for some constant c. Otherwise, an additional cost of O(N log |Σ|) is needed. Algorithms with such an approximation guarantee and running time are known; the novelty of this paper is the particular simplicity of the algorithm as well as of its analysis, which uses a general technique of recompression recently introduced by the author. Furthermore, contrary to the previous results, this work does not use the LZ representation of the input string in the construction, nor in the analysis. © 2015 Elsevier B.V. All rights reserved.

Keywords: grammar-based compression; Construction of the smallest grammar; SLP compression
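
For readers unfamiliar with the setting: a grammar-based compressor outputs a context-free grammar that derives exactly the input string, and its quality is judged by the grammar size relative to the optimum g. The sketch below uses a naive RePair-style heuristic (repeatedly replace the most frequent adjacent pair). This is a different and much simpler technique than the recompression-based algorithm of the paper; it runs in quadratic time and carries no approximation guarantee in this form. All names are illustrative.

```python
from collections import Counter

def build_grammar(text: str) -> tuple[list, dict]:
    """Naive RePair-style construction, for illustration only.
    Returns (start_sequence, rules) with rules[X] = (left, right)."""
    seq: list = list(text)
    rules: dict = {}
    while len(seq) >= 2:
        pair, count = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if count < 2:
            break  # no adjacent pair repeats, so stop
        name = f"X{len(rules)}"  # fresh nonterminal (terminals are single chars)
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):  # greedy left-to-right replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbol: str, rules: dict) -> str:
    """Derive the string generated by a terminal or nonterminal."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)

start, rules = build_grammar("abababab")
assert "".join(expand(s, rules) for s in start) == "abababab"
```

For `"abababab"` the result is the start sequence `X1 X1` with rules `X0 -> a b` and `X1 -> X0 X0`.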

How to Find Long Maximal Exact Matches and Ignore Short Ones

28th International Conference on Developments in Language Theory (DLT)

Authors: Gagie, Travis
Affiliations: Dalhousie Univ, Fac Comp Sci, Halifax, NS, Canada

ISBN: (digital) 9783031661594

ISBN: (print) 9783031661587; 9783031661594

Finding maximal exact matches (MEMs) between strings is an important task in bioinformatics, but it is becoming increasingly challenging as geneticists switch to pangenomic references. Fortunately, we are usually interested only in the relatively few MEMs that are longer than we would expect by chance. In this paper we show that under reasonable assumptions we can find all MEMs of length at least L between a pattern of length m and a text of length n in O(m) time plus extra O(log n) time only for each MEM of length at least nearly L using a compact index for the text, suitable for pangenomics.

Keywords: Maximal exact matches; pangenomics; Burrows-Wheeler Transform; grammar-based compression
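
As a point of reference for the problem statement, the following brute-force sketch lists the MEMs of length at least L. It assumes the common definition in which a MEM is a substring of the pattern that occurs in the text and cannot be extended in either direction while still occurring in the text. It does not reproduce the paper's index-based algorithm or its time bounds, and the function name `long_mems` is hypothetical.

```python
def long_mems(pattern: str, text: str, min_len: int) -> list[tuple[int, int]]:
    """Return (start, length) for each MEM of `pattern` with respect to
    `text` whose length is at least min_len. Brute-force baseline only."""
    m = len(pattern)
    # ms[i] = length of the longest prefix of pattern[i:] that occurs in text.
    ms = []
    for i in range(m):
        length = 0
        while i + length < m and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    mems = []
    for i, length in enumerate(ms):
        # The match starting at i extends to the left exactly when
        # ms[i - 1] == length + 1, so it is left-maximal otherwise.
        left_maximal = i == 0 or ms[i - 1] <= length
        if length > 0 and length >= min_len and left_maximal:
            mems.append((i, length))
    return mems

hits = long_mems("GATTACAT", "TTACAGATT", 3)
```

With the threshold set to 3, the short match `AT` at the end of the pattern is ignored while `GATT` and `TTACA` are reported.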

Revisiting the Folklore Algorithm for Random Access to Grammar-Compressed Strings

31st International Symposium on String Processing and Information Retrieval (SPIRE)

Authors: Cleary, Alan M.; Winjum, Joseph; Dood, Jordan; Inenaga, Shunsuke
Affiliations: Natl Ctr Genome Resources, Santa Fe, NM, USA; Montana State Univ, Bozeman, MT 59717, USA; Hyalite Technol LLC, Bozeman, MT, USA; Kyushu Univ, Dept Informat, Fukuoka, Japan

ISBN: (print) 9783031721991; 9783031722004

Grammar-based compression is a widely accepted model of string compression that allows for efficient and direct manipulation of the compressed data. Most, if not all, such manipulations rely on the primitive of random access queries: the task of quickly returning the character at a specified position of the original uncompressed string without explicit decompression. While there are advanced data structures for random access to grammar-compressed strings that guarantee theoretical query time and space bounds, little has been done on the practical side of this important problem. In this paper, we revisit a well-known folklore random access algorithm for grammars in Chomsky normal form, modify it to work directly on general grammars, and show that this modified version is fast and memory efficient in practice.

Keywords: grammar-based compression; random access; straight-line programs
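
The folklore approach for grammars in Chomsky normal form can be sketched as follows: store the expansion length of every nonterminal, then walk down from the start symbol, going left or right depending on where the requested position falls. The grammar encoding (a dict mapping a nonterminal to a single character or to a pair of nonterminals) and the name `make_accessor` are assumptions for this example; the paper's modification for general grammars is not shown.

```python
from functools import lru_cache
from typing import Callable

def make_accessor(rules: dict, start: str) -> tuple[Callable[[int], str], int]:
    """Return (access, n): access(i) is the i-th character (0-based) of the
    string derived from `start`, and n is the length of that string."""

    @lru_cache(maxsize=None)
    def length(x: str) -> int:
        rhs = rules[x]
        return 1 if isinstance(rhs, str) else length(rhs[0]) + length(rhs[1])

    def access(i: int) -> str:
        if not 0 <= i < length(start):
            raise IndexError(i)
        x = start
        while not isinstance(rules[x], str):
            left, right = rules[x]
            if i < length(left):
                x = left            # position lies in the left expansion
            else:
                i -= length(left)   # skip the left expansion
                x = right
        return rules[x]

    return access, length(start)

# Hypothetical CNF grammar deriving the Fibonacci-like string "abaababa".
RULES = {
    "A": "a", "B": "b",
    "X1": ("A", "B"), "X2": ("X1", "A"),
    "X3": ("X2", "X1"), "X4": ("X3", "X2"),
}
access, n = make_accessor(RULES, "X4")
decoded = "".join(access(i) for i in range(n))
```

Each query costs time proportional to the depth of the derivation tree along the path taken, which is why grammar depth matters for this algorithm.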

Scalable Detection of Frequent Substrings by Grammar-Based Compression

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, vol. E96D, no. 3, pp. 457-464

Authors: Nakahara, Masaya; Maruyama, Shirou; Kuboyama, Tetsuji; Sakamoto, Hiroshi
Affiliations: Kyushu Inst Technol, Iizuka, Fukuoka 8208502, Japan; Kyushu Univ, Fukuoka 8190395, Japan; Gakushuin Univ, Tokyo 1718588, Japan; JST PRESTO, Kawaguchi, Saitama 3320012, Japan

A scalable method for pattern discovery by compression is proposed. A string is representable by a context-free grammar deriving the string deterministically. In this framework of grammar-based compression, the aim of the algorithm is to output as small a grammar as possible. Beyond that, the optimization problem is approximately solvable. Among such approximation algorithms, the compressor based on edit-sensitive parsing (ESP) is especially suitable for detecting maximal common substrings as well as long frequent substrings. Based on ESP, we design a linear-time algorithm to find all frequent patterns in a string approximately and prove several lower bounds to guarantee the length of extracted patterns. We also examine the performance of our algorithm by experiments on biological sequences and other compressible real-world texts. Compared to other practical algorithms, our algorithm is faster and more scalable with large and repetitive strings.

Keywords: pattern discovery; grammar-based compression; edit-sensitive parsing

Balancing Straight-line Programs

JOURNAL OF THE ACM, 2021, vol. 68, no. 4, pp. 1-40

Authors: Ganardi, Moses; Jez, Artur; Lohrey, Markus
Affiliations: Max Planck Inst Software Syst, MPI SWS, Paul Ehrlich Str G 26, D-67663 Saarbrucken, Germany; Univ Wroclaw, Ul Joliot Curie 15, PL-50383 Wroclaw, Poland; Univ Siegen, Holderlinstr 3, D-57076 Siegen, Germany

We show that a context-free grammar of size m that produces a single string w of length n (such a grammar is also called a string straight-line program) can be transformed in linear time into a context-free grammar for w of size O(m), whose unique derivation tree has depth O(log n). This solves an open problem in the area of grammar-based compression, improves many results in this area, and greatly simplifies many existing constructions. Similar results are shown for two formalisms for grammar-based tree compression: top dags and forest straight-line programs. These balancing results can all be deduced from a single meta-theorem stating that the depth of an algebraic circuit over an algebra with a certain finite base property can be reduced to O(log n) at the cost of a constant multiplicative size increase. Here, n refers to the size of the unfolding (or unravelling) of the circuit. In particular, this result applies to standard arithmetic circuits over (noncommutative) semirings.

Keywords: grammar-based compression; balancing straight-line programs; random access problem