Minimum redundancy coding (also known as Huffman coding) is one of the enduring techniques of data compression. Many efforts have been made to improve the efficiency of minimum redundancy coding, the majority based on...
详细信息
Minimum redundancy coding (also known as Huffman coding) is one of the enduring techniques of data compression. Many efforts have been made to improve the efficiency of minimum redundancy coding, the majority based on the use of improved representations for explicit Huffman trees. In this paper, we examine how minimum redundancy coding can be implemented efficiently by divorcing coding from a code tree, with emphasis on the situation when n is large, perhaps on the order of 10(6). We review techniques for devising minimum redundancy codes, and consider in detail how encoding and decoding should be accomplished. In particular, we describe a modified decoding method that allows improved decoding speed, requiring just a few machine operations per output symbol (rather than for each decoded bit), and uses just a few hundred bytes of memory above and beyond the space required to store an enumeration of the source alphabet.
Minimum-redundancy prefix codes have been a mainstay of research and commercial compression systems since their discovery by David Huffman more than 50 years ago. In this experimental evaluation we compare techniques ...
详细信息
Minimum-redundancy prefix codes have been a mainstay of research and commercial compression systems since their discovery by David Huffman more than 50 years ago. In this experimental evaluation we compare techniques for decoding minimum-redundancy codes, and quantify the relative benefits of recently developed restricted codes that are designed to accelerate the decoding process. We find that table-based decoding techniques offer fast operation, provided that the size of the table is kept relatively small, and that approximate coding techniques can offer higher decoding rates than Huffman codes with varying degrees of loss of compression effectiveness. Copyright (c) 2006 John Wiley & Sons, Ltd.
We consider the problem of constructing and transmitting the prelude for Huffman coding. With careful organization of the;required operations and an appropriate representation for the prelude, it is possible to make s...
详细信息
We consider the problem of constructing and transmitting the prelude for Huffman coding. With careful organization of the;required operations and an appropriate representation for the prelude, it is possible to make semistatic coding efficient even when S, the size of the source alphabet, is of the same magnitude as m, the length of the message being coded. The proposed structures are of direct relevance in applications that mimic one pass operation through the use of semistatic compression on a block-by block basis.
Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de...
详细信息
Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license from http://***.
Finding associations between entities is a common information need in many areas. It has been facilitated by the increasing amount of graph-structured data on the Web describing relations between entities. In this pap...
详细信息
ISBN:
(纸本)9783319465234;9783319465227
Finding associations between entities is a common information need in many areas. It has been facilitated by the increasing amount of graph-structured data on the Web describing relations between entities. In this paper, we define an association connecting multiple entities in a graph as a minimal connected subgraph containing all of them. We propose an efficient graph search algorithm for finding associations, which prunes the search space by exploiting distances between entities computed based on a distance oracle. Having found a possibly large group of associations, we propose to mine frequent association patterns as a conceptual abstract summarizing notable subgroups to be explored, and present an efficient mining algorithm based on canonical codes and partitions. Extensive experiments on large, real RDF datasets demonstrate the efficiency of the proposed algorithms.
A canonical Huffman code is an optimal prefix-free compression code whose codewords enumerated in the lexicographical order form a list of binary words in non-decreasing lengths. Gagie et al. (2015) gave a representat...
详细信息
A canonical Huffman code is an optimal prefix-free compression code whose codewords enumerated in the lexicographical order form a list of binary words in non-decreasing lengths. Gagie et al. (2015) gave a representation of this coding capable of encoding and decoding a symbol in constant worst-case time. It uses slg sigma max+ o(s) + O(iota 2max) bits of space, where sand iota maxare the alphabet size and maximum codeword length, respectively. We refine their representation to reduce the space complexity to slg sigma max(1 + o(1)) bits while preserving the constant encode and decode times. Our algorithmic idea can be applied to any canonical code. (c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the
作者:
Zhang, ZhangLi, JunYu, JunChinese Acad Sci
Inst Comp Technol Beijing 100080 Peoples R China Chinese Acad Sci
Beijing Genom Inst Beijing 101300 Peoples R China Chinese Acad Sci
Grad Sch Beijing 100039 Peoples R China Zhejiang Univ
James D Watson Inst Genome Sci Hangzhou Genom Inst Key Lab Genom Bioinformat Zhejiang Prov Hangzhou 310027 Peoples R China
Background: Approximate methods for estimating nonsynonymous and synonymous substitution rates (Ka and Ks) among protein-coding sequences have adopted different mutation (substitution) models. In the past two decades,...
详细信息
Background: Approximate methods for estimating nonsynonymous and synonymous substitution rates (Ka and Ks) among protein-coding sequences have adopted different mutation (substitution) models. In the past two decades, several methods have been proposed but they have not considered unequal transitional substitutions (between the two purines, A and G, or the two pyrimidines, T and C) that become apparent when sequences data to be compared are vast and significantly diverged. Results: We propose a new method (MYN), a modified version of the Yang-Nielsen algorithm (YN), for evolutionary analysis of protein-coding sequences in general. MYN adopts the Tamura-Nei Model that considers the difference among rates of transitional and transversional substitutions as well as factors in codon frequency bias. We evaluate the performance of MYN by comparing to other methods, especially to YN, and to show that MYN has minimal deviations when parameters vary within normal ranges defined by empirical data. Conclusion: Our comparative results deriving from consistency analysis, computer simulations and authentic datasets, indicate that ignoring unequal transitional rates may lead to serious biases and that MYN performs well in most of the tested cases. These results also suggest that acquisitions of reliable synonymous and nonsynonymous substitution rates primarily depend on less biased estimates of transition/transversion rate ratio.
暂无评论