Search results - Inner Mongolia University Library

Interpolative coding of integer sequences supporting log-time random access

INFORMATION PROCESSING & MANAGEMENT, 2011, vol. 47, no. 5, pp. 742-761

Author: Teuhola, J. (Univ Turku, Dept Informat Technol, SF-20500 Turku, Finland)

Sequences of integers are common data types, occurring either as primary data or ancillary structures. The sizes of sequences can be large, making compression an interesting option. Effective compression presupposes variable-length coding, which destroys the regular alignment of values. Yet it would often be desirable to access only a small subset of the entries, either by position (ordinal number) or by content (element value), without having to decode most of the sequence from the start. Here such a random access technique for compressed integers is described, with the special feature that no auxiliary index is needed. The solution applies a method called interpolative coding, which is one of the most efficient non-statistical codes for integers. Indexing is avoided by address calculation guaranteeing sufficient space for codes even in the worst case. The additional redundancy, compared to regular interpolative coding, is only about 1 bit per source integer for uniform distribution. The time complexity of random access is logarithmic with respect to the source size for both position-based and content-based retrieval. According to experiments, random access is faster than full decoding when the number of accessed integers is not more than approximately $0.75 \cdot n/\log_2 n$ for sequence length $n$. The tests also confirm that the method is quite competitive with other approaches to random access coding suggested in the literature. (C) 2010 Elsevier Ltd. All rights reserved.

Keywords: Source coding Data compression Random access interpolative coding Inverted index
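
For orientation, the plain (non-random-access) binary interpolative coding that the abstract above builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's address-calculation scheme: the function names are hypothetical, the output is a Python string of '0'/'1' characters, and each value gets a plain fixed-width binary code rather than the slightly shorter minimal binary codes that practical implementations tend to use.

```python
def interpolative_encode(values, lo, hi):
    """Encode sorted, distinct integers drawn from [lo, hi] as a bit string."""
    out = []

    def rec(vals, lo, hi):
        if not vals:
            return
        mid = len(vals) // 2
        n_left, n_right = mid, len(vals) - mid - 1
        # The middle value must leave room for n_left smaller
        # and n_right larger values inside [lo, hi].
        low, high = lo + n_left, hi - n_right
        width = (high - low).bit_length()  # bits for high - low + 1 candidates
        if width:
            out.append(format(vals[mid] - low, f"0{width}b"))
        rec(vals[:mid], lo, vals[mid] - 1)
        rec(vals[mid + 1:], vals[mid] + 1, hi)

    rec(list(values), lo, hi)
    return "".join(out)


def interpolative_decode(bits, n, lo, hi):
    """Recover the n sorted integers written by interpolative_encode."""
    result = [0] * n
    pos = 0

    def rec(start, count, lo, hi):
        nonlocal pos
        if count == 0:
            return
        mid = count // 2
        low, high = lo + mid, hi - (count - mid - 1)
        width = (high - low).bit_length()
        value = low + (int(bits[pos:pos + width], 2) if width else 0)
        pos += width
        result[start + mid] = value
        rec(start, mid, lo, value - 1)
        rec(start + mid + 1, count - mid - 1, value + 1, hi)

    rec(0, n, lo, hi)
    return result


postings = [3, 8, 9, 11, 12, 13, 17]
code = interpolative_encode(postings, 1, 20)
restored = interpolative_decode(code, len(postings), 1, 20)
```

In this sketch a dense run such as 5, 6, 7, 8 inside the range [5, 8] costs zero bits, because every value is fully determined by its bounds. Decoding any one element still requires walking the recursion from the root, which is the limitation the random-access variant in the abstract addresses.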

Burrows-Wheeler post-transformation with effective clustering and interpolative coding

SOFTWARE-PRACTICE & EXPERIENCE, 2020, vol. 50, no. 9, pp. 1858-1874

Authors: Niemi, Arto; Teuhola, Jukka (Univ Turku, Dept Future Technol, Vesilinnantie 5, Turku 20500, Finland)

Lossless compression methods based on the Burrows-Wheeler transform (BWT) are regarded as an excellent compromise between speed and compression efficiency: they provide compression rates close to the PPM algorithms, with the speed of dictionary-based methods. Instead of the laborious statistics-gathering process used in PPM, the BWT reversibly sorts the input symbols, using as the sort key as many following characters as necessary to make the sort unique. Characters occurring in similar contexts are sorted close together, resulting in a clustered symbol sequence. Run-length encoding and Move-to-Front (MTF) recoding, combined with a statistical Huffman or arithmetic coder, are then typically used to exploit the clustering. A drawback of the MTF recoding is that knowledge of the character that produced the MTF number is lost. In this paper, we present a new, competitive Burrows-Wheeler post-transform stage that takes advantage of interpolative coding, a fast binary encoding method for integer sequences that can exploit clusters without requiring explicit statistics. We introduce a fast and simple way to retain knowledge of the run characters during the MTF recoding and use this to improve the clustering of MTF numbers and run-lengths by applying reversible, stable sorting, with the run characters as sort keys, achieving significant improvement in the compression rate, as shown here by experiments on common text corpora.

Keywords: lossless compression Burrows-Wheeler transform move-to-front interpolative coding
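
The drawback noted in the abstract above, that an MTF number no longer says which character produced it, shows up in a small Move-to-Front sketch. This is a textbook MTF over an assumed three-letter alphabet, with hypothetical function names; it is not the post-transform stage proposed in the paper.

```python
def mtf_encode(text, alphabet):
    """Replace each character by its current position in a self-organising list."""
    table = list(alphabet)
    ranks = []
    for ch in text:
        rank = table.index(ch)
        ranks.append(rank)
        # Move the character just seen to the front of the list.
        table.insert(0, table.pop(rank))
    return ranks


def mtf_decode(ranks, alphabet):
    """Invert mtf_encode by replaying the same list updates."""
    table = list(alphabet)
    chars = []
    for rank in ranks:
        ch = table.pop(rank)
        chars.append(ch)
        table.insert(0, ch)
    return "".join(chars)


clustered = "aaabbbccc"  # stands in for a clustered BWT output
ranks = mtf_encode(clustered, "abc")
# ranks == [0, 0, 0, 1, 0, 0, 2, 0, 0]
```

Every run collapses to zeros whichever character it consists of, so the rank sequence alone cannot tell a run of `a` from a run of `c`.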

Unique-order interpolative coding for fast querying and space-efficient indexing in information retrieval systems

INFORMATION PROCESSING & MANAGEMENT, 2006, vol. 42, no. 2, pp. 407-428

Authors: Cheng, CS; Shann, JJJ; Chung, CP (Natl Chiao Tung Univ, Dept Comp Sci & Informat Engn, Hsinchu 30050, Taiwan)

This paper presents a size reduction method for the inverted file, the most suitable indexing structure for an information retrieval system (IRS). We notice that in an inverted file the document identifiers for a given word are usually clustered. While this clustering property can be used in reducing the size of the inverted file, good compression as well as fast decompression must both be available. In this paper, we present a method that can facilitate coding and decoding processes for interpolative coding using recursion elimination and loop unwinding. We call this method the unique-order interpolative coding. It can calculate the lower and upper bounds of every document identifier for a binary code without using a recursive process, hence the decompression time can be greatly reduced. Moreover, it can also exploit document identifier clustering to compress the inverted file efficiently. Compared with the other well-known compression methods, our method provides fast decoding speed and excellent compression. This method can also be used to support a self-indexing strategy. Therefore our research work in this paper provides a feasible way to build a fast and space-economical IRS. (c) 2005 Elsevier Ltd. All rights reserved.

Keywords: inverted index compression inverted file prefix-free coding interpolative coding fast decoding
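
The idea of computing each identifier's bounds without recursion, as the abstract above describes, can be illustrated with an explicit stack. This sketch only shows generic recursion elimination for standard interpolative coding; it is not the unique-order method itself, and the function name and tuple layout are assumptions made for the example.

```python
def interpolative_bounds(doc_ids, lo, hi):
    """Return (index, low, high) for each identifier, in coding order.

    doc_ids must be sorted, distinct integers from [lo, hi]. A pending
    sub-range is kept on an explicit stack instead of the call stack.
    """
    order = []
    stack = [(0, len(doc_ids), lo, hi)]  # (start, count, range_lo, range_hi)
    while stack:
        start, count, range_lo, range_hi = stack.pop()
        if count == 0:
            continue
        mid = count // 2
        idx = start + mid
        # Bounds of the middle identifier, given how many neighbours
        # must still fit below and above it.
        low = range_lo + mid
        high = range_hi - (count - mid - 1)
        order.append((idx, low, high))
        value = doc_ids[idx]
        # Push the right half first so the left half is handled next.
        stack.append((idx + 1, count - mid - 1, value + 1, range_hi))
        stack.append((start, mid, range_lo, value - 1))
    return order


bounds = interpolative_bounds([3, 8, 9, 11, 12, 13, 17], 1, 20)
```

For this list the first entry is `(3, 4, 17)`: the middle identifier 11 must lie in [4, 17], because three identifiers precede it and three follow it within [1, 20].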

Comparison between text compression algorithms in biological sequences

INFORMATION AND COMPUTATION, 2020, vol. 270, article 104466

Authors: Kounelis, Fotios; Makris, Christos (Univ Patras, Comp Engn & Informat Dept, Patras, Greece)

Inverted indexes are mainstream in Information Retrieval systems and many compression techniques have been proposed. The purpose of this paper is to explore the compression efficiency on a two-level inverted index tailored for n-gram indices. We use two compression techniques: Optimal PForDelta (OptPFD) and IPC (interpolative coding). Both techniques are applied to previous work of ours, which focused on developing a threshold to efficiently store subsequences inside a one- or two-level inverted index, based on their number of occurrences inside a biological sequence. We study the performance of these two compression algorithms over different fluctuations of the threshold. The compression ratio of OptPFD is affected by changes in the threshold, and OptPFD is as efficient here as on text documents. IPC, by contrast, performs differently for each threshold and is more stable, although it is much less efficient than on text documents. (C) 2019 Elsevier Inc. All rights reserved.

Keywords: Inverted index Index compression interpolative coding Bioinformatics Information retrieval OptPFD compression

Generalized PCM coding of Images

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2012, vol. 21, no. 8, pp. 3801-3806

Authors: Prades Nebot, Jose; Morbee, Marleen; Delp, Edward J. (Univ Politecn Valencia, Inst Telecommun & Multimedia Applicat, Valencia 46022, Spain; Univ Ghent, Dept Telecommun & Informat Proc, B-9000 Ghent, Belgium; Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907, USA)

Pulse-code modulation (PCM) with embedded quantization allows the rate of the PCM bitstream to be reduced by simply removing a fixed number of least significant bits from each codeword. Although this source coding technique is extremely simple, it has poor coding efficiency. In this paper, we present a generalized PCM (GPCM) algorithm for images that simply removes bits from each codeword. In contrast to PCM, however, the number of bits and the specific bits that a GPCM encoder removes from each codeword depend on its position in the bitstream and the statistics of the image. Since GPCM allows the encoding to be performed with different degrees of computational complexity, it can adapt to the computational resources that are available in each application. Experimental results show that GPCM outperforms PCM with a gain that depends on the rate, the computational complexity of the encoding, and the degree of inter-pixel correlation of the image.

Keywords: Binning interpolative coding pulse-code modulation quantization
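
The baseline the abstract above starts from, plain PCM rate reduction by dropping a fixed number of least significant bits, fits in a few lines. This sketch assumes 8-bit pixel values and midpoint reconstruction; the function names are hypothetical, and the position- and statistics-dependent bit selection of GPCM is not modelled.

```python
def drop_lsbs(pixels, k):
    """Remove the k least significant bits from every codeword."""
    return [p >> k for p in pixels]


def reconstruct(codes, k):
    """Restore each value to the midpoint of its quantization interval."""
    offset = (1 << k) >> 1  # 2**(k-1), or 0 when k == 0
    return [(c << k) + offset for c in codes]


pixels = [0, 37, 128, 255]          # assumed 8-bit samples
codes = drop_lsbs(pixels, 3)        # 5 bits per pixel instead of 8
approx = reconstruct(codes, 3)
max_error = max(abs(a - p) for a, p in zip(approx, pixels))
```

With `k = 3` the reconstruction error stays within 4 grey levels for every pixel, regardless of image content, which is the fixed trade-off a content-adaptive scheme tries to improve on.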

Video compression with binary tree recursive motion estimation and binary tree residue coding

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2000, vol. 9, no. 7, pp. 1288-1292

Authors: Robinson, JA; Druet, A; Gosset, N (Mem Univ Newfoundland, Fac Engn & Appl Sci, St Johns, NF A1B 3X5, Canada)

Binary tree predictive coding (BTPC) is an efficient general-purpose still-image compression scheme, competitive with JPEG for natural image coding and with GIF for graphics. We report in this paper the extension of BTPC to video compression using motion estimation and compensation techniques which are simple, efficient, nonlinear and predictive. The new methods, binary tree recursive motion estimation coding (BTRMEC) and binary tree residue coding (BTRC), exploit the hierarchical structure of BTPC, in the first case giving progressively refined motion estimates for increasing numbers of pels and in the second case providing efficient residue coding. Compression results for BTRMEC and BTRC are compared against conventional block-based motion compensated coding as provided by MPEG. They show that both BTRMEC and BTRC are efficient methods to code video sequences.

Keywords: interpolative coding motion estimation video coding

Tournament coding of Integer Sequences

COMPUTER JOURNAL, 2009, vol. 52, no. 3, pp. 368-377

Author: Teuhola, Jukka (Univ Turku, Dept Informat Technol, FI-20014 Turku, Finland)

A new, simple non-statistical source coding technique for sequences of integers is suggested. The method is based on a tournament scheme, with the sequence arranged into pairs, where maxima ('winners') are encoded recursively, and minima are encoded by semi-fixed-length codes using the related maxima to bound the code lengths. In the experiments, tournament coding has outperformed the other non-statistical methods (gamma, delta, Fibonacci and interpolative coding) for uniform distribution of numbers. Also for non-uniform distributions the method is quite competitive.

Keywords: source coding interpolative coding data compression
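
The pairing structure described in the abstract above can be sketched at the level of lists, without producing an actual bitstream. Several details here are assumptions not stated in the abstract: a one-bit flag records which member of a pair won, an unpaired last element advances with a bye, values are non-negative, and a plain fixed-length code of `winner.bit_length()` bits stands in for the paper's semi-fixed-length codes. All function names are hypothetical.

```python
def tournament_round(seq):
    """Pair up neighbours; return (winners, losers, flags) for one round."""
    winners, losers, flags = [], [], []
    for i in range(0, len(seq) - 1, 2):
        a, b = seq[i], seq[i + 1]
        winners.append(max(a, b))
        losers.append(min(a, b))
        flags.append(a >= b)  # True: the winner was first in the pair
    if len(seq) % 2:
        winners.append(seq[-1])  # odd element out gets a bye
    return winners, losers, flags


def tournament_encode(seq):
    """Run rounds until at most one value (the champion) is left."""
    rounds = []
    while len(seq) > 1:
        seq, losers, flags = tournament_round(seq)
        rounds.append((losers, flags))
    return seq, rounds


def tournament_decode(champion, rounds):
    """Rebuild the sequence by undoing the rounds, last round first."""
    seq = champion
    for losers, flags in reversed(rounds):
        rebuilt = []
        for w, l, first in zip(seq, losers, flags):
            rebuilt.extend((w, l) if first else (l, w))
        if len(seq) > len(losers):
            rebuilt.append(seq[-1])  # restore the bye element
        seq = rebuilt
    return seq


def loser_code_length(winner):
    """Bits for a fixed-length code covering 0..winner (assumed stand-in)."""
    return winner.bit_length()


data = [5, 2, 7, 7, 1, 9, 4]
champion, rounds = tournament_encode(data)
restored = tournament_decode(champion, rounds)
```

Because each loser is known to be no larger than its winner, a small winner directly shortens the code needed for its partner, which is the bounding effect the abstract refers to.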
