The Burrows-Wheeler Transform (BWT) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: 1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, 2) a variety of very simple new techniques for BWT-based lossless source coding, and 3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
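To make the reversibility claim concrete, here is a minimal sketch of the transform and its inverse. It is not taken from the paper: it uses the textbook sorted-rotations construction, and the function names `bwt` and `inverse_bwt` as well as the end-of-string sentinel `$` are assumptions made for illustration. Practical coders use suffix-array constructions rather than this quadratic version.

```python
def bwt(s: str, eos: str = "$") -> str:
    """Forward BWT: last column of the sorted rotation table (illustrative, O(n^2) space)."""
    if eos in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + eos  # assumed sentinel marks the original rotation
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, eos: str = "$") -> str:
    """Inverse BWT: rebuild the rotation table by prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The row ending in the sentinel is the original string plus sentinel.
    original = next(row for row in table if row.endswith(eos))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

In the output, symbols that share a following context end up adjacent (the run "nn" and the trailing "aa"), which is the ordering that the subsequent lossless code is meant to exploit.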
Two weighting procedures are presented for compaction of output sequences generated by binary independent sources whose unknown parameter may occasionally change. The resulting codes need no knowledge of the sequence length T, i.e., they are strongly sequential, and the number of parameter changes is unrestricted. The additional-transition redundancy of the first method was shown to achieve the Merhav lower bound, i.e., log T bits per transition. For the second method we could prove that the additional-transition redundancy is not more than 3/2 log T bits per transition, which is more than the Merhav bound; however, the storage and computational complexity of this method are also more interesting than those of the first method. Simulations show that the difference in redundancy performance between the two methods is negligible.
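As a rough numerical illustration of the two per-transition figures quoted above, the sketch below evaluates log T and 3/2 log T for a given sequence length. It assumes the logarithm is base 2 (since the redundancy is stated in bits); the function names are hypothetical, and the code only evaluates the stated expressions, not either weighting procedure.

```python
import math


def merhav_bound_bits(T: int) -> float:
    """Per-transition redundancy log T, assuming a base-2 logarithm."""
    return math.log2(T)


def second_method_bound_bits(T: int) -> float:
    """Upper bound 3/2 log T per transition stated for the second method (base 2 assumed)."""
    return 1.5 * math.log2(T)


if __name__ == "__main__":
    T = 1024
    print(merhav_bound_bits(T))         # 10.0
    print(second_method_bound_bits(T))  # 15.0
```

For T = 1024 the guaranteed gap is at most 5 bits per parameter change.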