Average case universal compression of independent and identically distributed (i.i.d.) sources is investigated, where the source alphabet is large, and may be sublinear in size or even larger than the compressed data sequence length n. In particular, the well-known results, including Rissanen's strongest sense lower bound, for fixed-size alphabets are extended to the case where the alphabet size k is allowed to grow with n. It is shown that as long as k = o(n), instead of the coding cost in the fixed-size alphabet case of 0.5 log n extra code bits for each one of the k - 1 unknown probability parameters, the cost is now 0.5 log(n/k) code bits for each unknown parameter. This result is shown to be the lower bound in the minimax and maximin senses, as well as for almost every source in the class. Achievability of this bound is demonstrated with two-part codes based on quantization of the maximum-likelihood (ML) probability parameters, as well as by using the well-known Krichevsky-Trofimov (KT) low-complexity sequential probability estimates. For very large alphabets, k >> n, it is shown that an average minimax and maximin bound on the redundancy is essentially (to first order) log(k/n) bits per symbol. This bound is shown to be achievable both with two-part codes and with a sequential modification of the KT estimates. For k = Theta(n), the redundancy is Theta(1) bits per symbol. Finally, sequential codes are designed for coding sequences in which only m < min{k, n} alphabet symbols occur.
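The KT sequential estimate mentioned in the abstract assigns the next symbol a probability proportional to its count plus 1/2. A minimal sketch (function and variable names are my own, for illustration only):

```python
# Krichevsky-Trofimov (KT) sequential probability estimate: after n symbols
# with per-symbol counts n_a over an alphabet of size k, the probability that
# the next symbol is a is (n_a + 1/2) / (n + k/2).
from collections import Counter
from math import log2

def kt_codelength(seq, alphabet):
    """Ideal code length, -sum_i log2 P_KT(x_i | x^{i-1}), in bits."""
    k = len(alphabet)
    counts = Counter()
    n = 0
    bits = 0.0
    for sym in seq:
        p = (counts[sym] + 0.5) / (n + k / 2)
        bits += -log2(p)
        counts[sym] += 1
        n += 1
    return bits

# A biased binary source: the KT code length is close to n*H plus roughly
# 0.5*log2(n) redundancy per free parameter.
print(kt_codelength("a" * 90 + "b" * 10, "ab"))
```

For a binary alphabet this reduces to the familiar (n_a + 1/2)/(n + 1) add-half rule.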
Consider approximate (lossy) matching of a source string ~ P, with a random codebook generated from reproduction distribution Q, at a specified distortion d. Recent work determined the minimum coding rate R1 = R(P, Q, d) for this setting. We observe that for large word length, with high probability, the matching codeword is typical with a distribution Q1 which is different from Q. If a new random codebook is generated ~ Q1, then the source string will favor codewords which are typical with a new distribution Q2, resulting in minimum coding rate R2 = R(P, Q1, d), and so on. We show that the sequences of distributions Q1, Q2, ..., and rates R1, R2, ..., generated by this procedure converge to an optimum reproduction distribution Q* and the rate-distortion function R(P, d), respectively. We also derive a fixed rate-distortion slope version of this natural type selection process. In the latter case, an iteration of the process stochastically simulates an iteration of the Blahut-Arimoto (BA) algorithm for rate-distortion function computation (without recourse to prior knowledge of the underlying source distribution). To strengthen these limit statements, we also characterize the steady-state error of these procedures when iterating at a finite string length. Implications of the main results provide fresh insights into the workings of lossy variants of the Lempel-Ziv algorithm for adaptive compression.
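The fixed-slope BA iteration that the natural type selection process stochastically simulates can be sketched as follows (a toy deterministic version with hypothetical names; P is the source distribution, Q the current reproduction distribution, d the distortion matrix, s the slope):

```python
# One fixed-slope Blahut-Arimoto step for rate-distortion computation.
import numpy as np

def ba_step(P, Q, d, s):
    """Map Q -> Q' via the fixed-slope BA update."""
    # Test channel q(y|x) proportional to Q(y) * exp(-s * d(x, y)).
    W = Q[None, :] * np.exp(-s * d)          # shape (|X|, |Y|)
    W /= W.sum(axis=1, keepdims=True)
    # New reproduction marginal Q'(y) = sum_x P(x) q(y|x).
    return P @ W

# Binary source with Hamming distortion: iterating converges to the
# optimal reproduction distribution Q* for this slope.
P = np.array([0.8, 0.2])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
Q = np.array([0.5, 0.5])
for _ in range(200):
    Q = ba_step(P, Q, d, s=2.0)
print(Q)
```

The abstract's process replaces the exact expectation over P with the empirical outcome of codebook matching at finite string length, which is what introduces the steady-state error it characterizes.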
A universal variable-to-fixed length algorithm for binary memoryless sources which converges to the entropy of the source at the optimal rate is known. We study the problem of universal variable-to-fixed length coding for the class of Markov sources with finite alphabets. We give an upper bound on the performance of the code for large dictionary sizes and show that the code is optimal in the sense that no codes exist that have better asymptotic performance. The optimal redundancy is shown to be H log log M / log M, where H is the entropy rate of the source and M is the code size. This result is analogous to Rissanen's result for fixed-to-variable length codes. We investigate the performance of a variable-to-fixed coding method which does not need to store the dictionaries, either at the coder or the decoder. We also consider the performance of both these source codes on individual sequences. For individual sequences, we bound the performance in terms of the best code length achievable by a class of coders. All the codes that we consider are prefix-free and complete.
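Variable-to-fixed codes parse the source into a dictionary of variable-length words, each encoded with a fixed log M bits. The classical Tunstall construction for a known memoryless source (a standard textbook building block, not the paper's universal Markov-source code) illustrates the dictionary: repeatedly split the most probable leaf until M words exist.

```python
# Tunstall dictionary construction for a binary memoryless source with
# P(1) = p. The resulting word set is prefix-free and complete.
import heapq

def tunstall(p, M):
    """Return a complete prefix-free dictionary of M binary strings."""
    heap = [(-(1 - p), "0"), (-p, "1")]   # max-heap on word probability
    heapq.heapify(heap)
    while len(heap) < M:                  # each split adds one leaf
        q, w = heapq.heappop(heap)
        q = -q
        heapq.heappush(heap, (-(q * (1 - p)), w + "0"))
        heapq.heappush(heap, (-(q * p), w + "1"))
    return sorted(w for _, w in heap)

print(tunstall(0.2, 4))
```

Splitting the most probable word first keeps the word probabilities as balanced as possible, which is what drives the per-bit redundancy down as M grows.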
We generalize B.S. Clarke and A.R. Barron's analysis of the Bayes method to FSMX sources. The FSMX source considered here is specified by the set of all states and its parameter value. First, we show the asymptotic codelengths of individual sequences under the Bayes codes for FSMX sources. Second, we show the asymptotic expected codelengths. The Bayesian posterior density and the maximum-likelihood estimator satisfy asymptotic normality for the finite ergodic Markov source, and this is the key to our analysis.
Exponential error bounds achievable by universal coding and decoding are derived for frame-asynchronous discrete memoryless multiple-access channels with two senders, via the method of subtypes, a refinement of the method of types. An empirical entropy decoder is employed. A key tool is an improved packing lemma that overcomes the technical difficulty caused by codeword repetitions via a new induction-based argument. The asymptotic form of the bounds admits numerical evaluation. This demonstrates that error exponents achievable by synchronous transmission can be exceeded via controlled asynchronism, i.e., a deliberate shift of the codewords.
We derive the asymptotics of the redundancy of Bayes rules for Markov chains of fixed order over a finite alphabet, extending the work of Barron and Clarke on independent and identically distributed (i.i.d.) sources. The asymptotics are derived when the actual source lies in the class of phi-mixing sources, which strictly includes Markov chains. These results can be used to derive minimax asymptotic rates of convergence for universal codes when a Markov chain of fixed order is used as a model.
The method of types is one of the key technical tools in Shannon Theory, and this tool is valuable also in other fields. In this paper, some key applications will be presented in sufficient detail enabling an interested nonspecialist to gain a working knowledge of the method, and a wide selection of further applications will be surveyed. These range from hypothesis testing and large deviations theory through error exponents for discrete memoryless channels and capacity of arbitrarily varying channels to multiuser problems. While the method of types is suitable primarily for discrete memoryless models, its extensions to certain models with memory will also be discussed.
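The method's core counting facts are easy to state concretely: length-n sequences over a k-ary alphabet fall into only C(n + k - 1, k - 1) types (polynomial in n), while each type class has multinomial size n! / (n_1! ... n_k!), roughly 2^{nH} for empirical entropy H. A short illustration:

```python
# Basic method-of-types quantities.
from math import comb, factorial

def num_types(n, k):
    """Number of empirical distributions of length-n k-ary sequences."""
    return comb(n + k - 1, k - 1)

def type_class_size(counts):
    """Number of sequences sharing the type with the given symbol counts."""
    n = sum(counts)
    size = factorial(n)
    for c in counts:
        size //= factorial(c)
    return size

print(num_types(10, 2))          # 11 binary types of length 10
print(type_class_size([7, 3]))   # 10! / (7! 3!) = 120 sequences
```

Because the number of types grows only polynomially while type classes grow exponentially, probabilities and exponents can be analyzed type by type, which is the engine behind the applications the survey covers.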
We investigate a type of lossless source code called a grammar-based code, which, in response to any input data string x over a fixed finite alphabet, selects a context-free grammar G(x) representing x in the sense that x is the unique string belonging to the language generated by G(x). Lossless compression of x takes place indirectly via compression of the production rules of the grammar G(x). It is shown that, subject to some mild restrictions, a grammar-based code is a universal code with respect to the family of finite-state information sources over the finite alphabet. Redundancy bounds for grammar-based codes are established. Reduction rules for designing grammar-based codes are presented.
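A toy greedy pair-replacement scheme (in the spirit of grammar-based codes such as Re-Pair; this is my own sketch, not the paper's reduction rules) shows how a grammar G(x) generating exactly x can be built: repeatedly replace the most frequent adjacent pair with a fresh nonterminal.

```python
# Build a context-free grammar whose only derivation is x by greedy
# pair replacement.
from collections import Counter

def build_grammar(x):
    rules = {}
    seq = list(x)
    next_nt = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:                      # no pair repeats: done
            break
        nt = f"R{next_nt}"                # fresh nonterminal
        next_nt += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):               # replace non-overlapping occurrences
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules                     # start production and rule set

print(build_grammar("abab"))
```

The compressed representation is then an encoding of the start production and the rules; repeated substrings are paid for once, which is the mechanism behind the universality result.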
In this paper, we characterize functions that simulate independent unbiased coin flips from independent coin flips of unknown bias. We call such functions randomizing. Our characterization of randomizing functions enables us to identify the functions that generate the largest average number of fair coin flips from a fixed number of biased coin flips. We show that these optimal functions are efficiently computable. Then we generalize the characterization, and we present a method to simulate an arbitrary rational probability distribution optimally (in terms of the average number of output digits) and efficiently (in terms of computational complexity) from outputs of many-faced dice of unknown distribution. We also study randomizing functions on exhaustive prefix-free sets.
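The simplest example of a randomizing function of this kind is the classical von Neumann procedure (shown here for illustration; the paper's optimal functions extract strictly more fair bits per biased flip): for a coin with unknown bias, the pairs 01 and 10 are equally likely, so mapping 01 to 0 and 10 to 1 while discarding 00 and 11 yields unbiased output bits.

```python
# Von Neumann's procedure: unbiased bits from a coin of unknown bias.
def von_neumann(bits):
    out = []
    for a, b in zip(bits[::2], bits[1::2]):
        if a != b:               # 01 -> 0, 10 -> 1; 00 and 11 are discarded
            out.append(a)
    return out

print(von_neumann([0, 1, 1, 1, 1, 0, 0, 0]))  # -> [0, 1]
```

Its output rate is only p(1-p) fair bits per input flip, which is why characterizing and computing the optimal randomizing functions is of interest.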
Uniform quantization with dither, or lattice quantization with dither in the vector case, followed by a universal lossless source encoder (entropy coder), is a simple procedure for universal coding with distortion of a source that may take values in a continuous alphabet. The rate of this universal coding scheme is examined, and a general expression is derived for it. An upper bound for the redundancy of this scheme, defined as the difference between its rate and the minimal possible rate given by the rate-distortion function of the source, is derived. This bound holds for all distortion levels. Furthermore, a composite upper bound on the redundancy as a function of the quantizer resolution, which leads to a tighter bound in the high-rate (low-distortion) case, is presented.
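The scalar building block can be sketched in a few lines (a one-dimensional illustration with names of my own choosing): encoder and decoder share a dither Z uniform on (-Delta/2, Delta/2]; the encoder quantizes X + Z to the lattice Delta*Z and the decoder subtracts Z, so the reconstruction error is uniform and independent of X.

```python
# Uniform scalar quantization with subtractive dither.
import random

def dithered_quantize(x, delta, z):
    """Encoder: round x + z to the nearest lattice point delta * k."""
    return delta * round((x + z) / delta)

def reconstruct(q, z):
    """Decoder: subtract the shared dither."""
    return q - z

random.seed(0)
delta = 1.0
z = random.uniform(-delta / 2, delta / 2)   # dither shared by both sides
x = 0.3
x_hat = reconstruct(dithered_quantize(x, delta, z), z)
print(x, x_hat)   # error is at most delta / 2 in magnitude
```

In the full scheme, the quantizer index stream (not shown) is then fed to a universal entropy coder, whose rate is what the redundancy bounds above control.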