A practical, suboptimal universal block source coding scheme, subject to a fidelity criterion, is proposed. The algorithm is an extension of the Lempel-Ziv algorithm and is based on string matching with distortion. It is shown that, given average distortion D > 0, the algorithm achieves a rate not exceeding R(D/2) for a large class of sources and distortion measures. Tighter bounds on the rate are derived for discrete memoryless sources and for memoryless Gaussian sources.
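The matching primitive behind such a scheme can be illustrated with a minimal sketch, assuming a per-symbol Hamming distortion and a greedy search over a window; the function name and search strategy are hypothetical simplifications, not the paper's actual algorithm:

```python
def longest_match_with_distortion(window, lookahead, D):
    """Return (offset, length) of the longest prefix of `lookahead` that
    matches a substring of `window` with average Hamming distortion <= D."""
    best = (0, 0)
    for off in range(len(window)):
        mismatches = 0
        length = 0
        for j, sym in enumerate(lookahead):
            pos = off + j
            if pos >= len(window):
                break
            if window[pos] != sym:
                mismatches += 1
            # record the extension as long as the distortion budget holds
            if mismatches / (j + 1) <= D:
                length = j + 1
        if length > best[1]:
            best = (off, length)
    return best
```

With D = 0 this degenerates to exact Lempel-Ziv string matching; a positive budget lets the match run longer, which is what buys rate at the cost of distortion.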
Dube and Beaudoin proposed a lossless data compression technique called compression via substring enumeration (CSE) in 2010. We evaluate an upper bound on the number of bits used by the CSE technique to encode any binary string from an unknown member of a known class of k-th order Markov processes. We compare the worst-case maximum redundancy obtained by the CSE technique for any binary string with the least possible value of the worst-case maximum redundancy obtained by the best fixed-to-variable-length code that satisfies the Kraft inequality.
Context weighting procedures are presented for sources with models (structures) in four different classes. Although the procedures are designed for universal data compression purposes, their generality allows application in the area of classification.
We consider a universal predictor based on pattern matching. Given a sequence X_1, ..., X_n drawn from a stationary mixing source, it predicts the next symbol X_{n+1} based on selecting a context of X_{n+1}. The predictor, called Sampled Pattern Matching (SPM), is a modification of the Ehrenfeucht-Mycielski pseudorandom generator algorithm. It predicts the value of the most frequent symbol appearing at the so-called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X_1 X_2 ... X_n; that is, in SPM, the context selection consists of taking a certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by Wyner and Ziv in their 1989 seminal paper. Here, we estimate the redundancy of the SPM universal predictor; that is, we prove that the probability that the SPM predictor makes worse decisions than the optimal predictor is O(n^(-ν)) for some 0 < ν < 1/2 as n → ∞. In fact, we show that we can predict K = O(1) symbols with the same probability of error.
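A toy rendition of the SPM rule, under the simplifying assumptions that the sequence is a Python string and that a fraction alpha of the longest repeated suffix is kept as the context (the function name and the fallback for sequences with no repeated suffix are hypothetical):

```python
from collections import Counter

def spm_predict(seq, alpha=0.5):
    """Predict the next symbol by majority vote at the sampled positions:
    the symbols following earlier occurrences of the retained context."""
    n = len(seq)
    # length of the longest suffix that has another copy starting earlier
    L = 0
    for l in range(1, n):
        if seq.find(seq[n - l:]) < n - l:
            L = l
        else:
            break
    if L == 0:
        return seq[-1]  # no repeated suffix: fall back to the last symbol
    k = max(1, int(alpha * L))  # keep a fraction of the longest match
    ctx = seq[n - k:]
    votes = Counter()
    i = seq.find(ctx)
    while i != -1:
        if i + k < n:  # sampled position: symbol following this occurrence
            votes[seq[i + k]] += 1
        i = seq.find(ctx, i + 1)
    return votes.most_common(1)[0][0]
```

On "abababab" the longest repeated suffix is "ababab"; half of it, "bab", serves as context, and the symbol after each earlier "bab" is "a", which is the prediction.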
Many image compression techniques require the quantization of multiple vector sources with significantly different distributions. With vector quantization (VQ), these sources are optimally quantized using separate codebooks, which may collectively require an enormous memory space. Since storage is limited in most applications, a convenient way to gracefully trade between performance and storage is needed. Earlier work addressed this problem by clustering the multiple sources into a small number of source groups, where each group shares a codebook. We propose a new solution based on a size-limited universal codebook that can be viewed as the union of overlapping source codebooks. This framework allows each source codebook to consist of any desired subset of the universal codevectors and provides greater design flexibility, which improves the storage-constrained performance. A key feature of this approach is that no two sources need be encoded at the same rate. An additional advantage of the proposed method is its close relation to universal, adaptive, finite-state, and classified quantization. Necessary conditions for optimality of the universal codebook and the extracted source codebooks are derived. An iterative design algorithm is introduced to obtain a solution satisfying these conditions. Possible applications of the proposed technique are enumerated, and its effectiveness is illustrated for coding of images using finite-state vector quantization, multistage vector quantization, and tree-structured vector quantization.
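The extracted-codebook idea can be sketched as follows; the function and its arguments are hypothetical, and this shows only the encoding step (nearest-neighbour search restricted to one source's subset of the shared codebook), not the iterative design algorithm:

```python
def encode_with_extracted_codebook(vectors, universal_cb, subset_idx):
    """Quantize `vectors` with the source codebook extracted from the
    shared universal codebook: only the codevectors whose indices appear
    in `subset_idx` are available to this source. Returns indices into
    the universal codebook, so all sources share one index space."""
    def d2(u, v):  # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(subset_idx, key=lambda i: d2(universal_cb[i], v))
            for v in vectors]
```

Because each source may be assigned a subset of a different size, the sources are naturally encoded at different rates, as the abstract notes.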
The problem of variable-length and fixed-distortion universal source coding (or D-semifaithful source coding) for stationary and memoryless sources on countably infinite alphabets (∞-alphabets) is addressed in this paper. The main results of this work offer a set of sufficient conditions (from weaker to stronger) to obtain weak minimax universality, strong minimax universality, and corresponding achievable rates of convergence for the worst-case redundancy for the family of stationary memoryless sources whose densities are dominated by an envelope function (the envelope family) on ∞-alphabets. An important implication of these results is that universal D-semifaithful source coding is not feasible for the complete family of stationary and memoryless sources on ∞-alphabets. To demonstrate this infeasibility, a sufficient condition for the impossibility is presented for the envelope family. Interestingly, it matches the well-known impossibility condition in the context of lossless (variable-length) universal source coding. More generally, this work offers a simple description of what is needed to achieve universal D-semifaithful coding for a family of distributions Λ. This reduces to finding a collection of quantizations of the product space at different blocklengths (reflecting the fixed-distortion restriction) that satisfy two asymptotic requirements: the first is a universal quantization condition with respect to Λ, and the second is a vanishing information radius (I-radius) condition for Λ reminiscent of the condition known for lossless universal source coding.
Consider a binary modulo-additive noise channel with noiseless feedback. When the noise is a stationary and ergodic process Z, the capacity is 1 - H(Z) (H(·) denoting the entropy rate). It is shown analogously that when the noise is a deterministic sequence z^∞, the capacity under finite-state encoding and decoding is 1 - ρ̄(z^∞), where ρ̄(·) is Lempel and Ziv's finite-state compressibility. This quantity, termed the porosity σ̲(·) of the channel, holds as the fundamental limit to communication, even when the encoder is designed with knowledge of the noise sequence. A sequence of schemes is presented that universally achieves porosity for any noise sequence. These results, both converse and achievability, may be interpreted as a channel-coding counterpart to Ziv and Lempel's work in universal source coding, and also as an extension of existing work on communicating across modulo-additive channels with an individual noise sequence. In addition, a potentially more practical architecture is suggested that draws a connection with finite-state predictability, as introduced by Feder, Gutman, and Merhav.
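The finite-state compressibility that defines porosity is the quantity approached by Lempel-Ziv parsing. A minimal sketch of an incremental (LZ78-style) phrase count, whose normalized value c·log c / n estimates the compressibility of a given noise sequence (an illustration of the quantity, not the paper's coding scheme):

```python
def lz78_phrase_count(seq):
    """Number of phrases in an incremental (LZ78-style) parsing of `seq`:
    each new phrase is the shortest prefix of the remainder not yet in
    the dictionary. c(n) * log(c(n)) / n estimates the finite-state
    compressibility of the sequence."""
    dictionary = set()
    phrase = ""
    count = 0
    for sym in seq:
        phrase += sym
        if phrase not in dictionary:
            dictionary.add(phrase)
            count += 1
            phrase = ""
    if phrase:  # a final, possibly incomplete phrase
        count += 1
    return count
```

For example, "ababab" parses into a | b | ab | ab, giving four phrases; highly compressible noise sequences yield few phrases and hence high porosity.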
The Lawrence algorithm is a universal binary variable-to-fixed-length source coding algorithm. Here, a modified version of this algorithm is introduced and its asymptotic performance is investigated. For M (the segment-set cardinality) large enough, it is shown that the rate R(θ) as a function of the source parameter θ satisfies R(θ) ≈ h(θ)·(1 + (log log M)/(2 log M)) for 0 < θ < 1, where h(·) is the binary entropy function. In addition, it is proven that no codes exist with better asymptotic performance, thereby establishing the asymptotic optimality of the modified Lawrence code. The asymptotic bounds show that universal variable-to-fixed-length codes can have a significantly lower redundancy than universal fixed-to-variable-length codes with the same number of codewords.
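The approximation can be evaluated directly. A small sketch, assuming base-2 logarithms throughout (the base matters for the inner logarithm, so this is an assumption of the example, and the function names are hypothetical):

```python
import math

def binary_entropy(theta):
    """h(theta) in bits."""
    return -theta * math.log2(theta) - (1 - theta) * math.log2(1 - theta)

def asymptotic_rate(theta, M):
    """Approximate rate of the modified Lawrence code for a segment set
    of cardinality M: R(theta) ≈ h(theta) * (1 + log log M / (2 log M))."""
    logM = math.log2(M)
    return binary_entropy(theta) * (1 + math.log2(logM) / (2 * logM))
```

For theta = 0.5 and M = 2^16 this gives 1.0 · (1 + 4/32) = 1.125 bits per source bit; the redundancy factor log log M / (2 log M) vanishes as the segment set grows.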
We propose a variation of the Context Tree Weighting algorithm for tree sources, modified so that the growth of the context resembles Lempel-Ziv parsing. We analyze this algorithm, give a concise upper bound on the individual redundancy for any tree source, and prove the asymptotic optimality of the data compression rate for any stationary and ergodic source.
We investigate the task of compressing an image by using different probability models for compressing different regions of the image. In this task, using a larger number of regions would result in better compression, but would also require more bits for describing the regions and the probability models used in the regions. We discuss using quadtree methods for performing the compression. We introduce a class of probability models for images, the k-rectangular tilings of an image, that is formed by partitioning the image into k rectangular regions and generating the coefficients within each region by using a probability model selected from a finite class of N probability models. For an image of size n × n, we give a sequential probability assignment algorithm that codes the image with a code length that is within O(k log(Nn/k)) of the code length produced by the best probability model in the class. The algorithm has a computational complexity of O(Nn^3). An interesting subclass of the class of k-rectangular tilings is the class of tilings using rectangles whose widths are powers of two. This class is far more flexible than quadtrees and yet has a sequential probability assignment algorithm that produces a code length that is within O(k log(Nn/k)) of the best model in the class, with a computational complexity of O(Nn^2 log n) (similar to the computational complexity of sequential probability assignment using quadtrees). We also consider progressive transmission of the coefficients of the image.
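For contrast with the rectangular tilings, the quadtree baseline admits a simple recursive code-length computation. A hedged sketch: the names are hypothetical, and the per-region model-selection cost (about log2 N bits) is assumed to be folded into the supplied cost functions rather than charged explicitly:

```python
def best_quadtree_cost(model_costs, x, y, size, flag_bit=1.0):
    """Minimal code length (in bits) for the size-by-size block at (x, y):
    either code the block whole with the cheapest of the N models, or
    spend one flag bit and split into four quadrants recursively.
    `model_costs` is a list of N functions (x, y, size) -> bits."""
    leaf = min(c(x, y, size) for c in model_costs)
    if size == 1:
        return leaf  # single pixel: no split flag needed
    h = size // 2
    split = flag_bit + sum(
        best_quadtree_cost(model_costs, x + dx, y + dy, h, flag_bit)
        for dx in (0, h) for dy in (0, h))
    return min(leaf + flag_bit, split)
```

With a single additive model the splits buy nothing and the recursion settles on coding the block whole; savings appear only when different regions genuinely prefer different models, which is the effect the tilings in the abstract exploit with more flexible region shapes.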