ISBN: (Print) 9781467377041
We consider the universal source coding problem for first-order stationary, irreducible and aperiodic Markov sources at short blocklengths. Achievability is derived from the previously introduced algorithm for universal compression of memoryless sources at finite blocklengths, the Type Size Code, which encodes strings based on the size of their type class. We derive the third-order asymptotic coding rate of the Type Size Code for this model class. We also present a converse on the third-order coding rate for the general class of fixed-to-variable codes and show the optimality of Type Size Codes for such Markov sources.
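The ordering principle behind such codes can be illustrated for binary memoryless strings, where a type is the empirical composition and the type class size is a binomial coefficient. The sketch below is my own toy construction under that assumption, not the paper's actual code: it exhaustively sorts all strings of length n by type class size and gives earlier (rarer-type) strings shorter codeword lengths.

```python
from math import comb, log2, ceil

def type_class_size(x: str) -> int:
    """Number of binary strings of the same length with the same composition."""
    return comb(len(x), x.count("1"))

def type_size_codeword_lengths(n: int) -> dict:
    """Toy type-size-style assignment for all binary strings of length n:
    sort strings by type class size (smallest first) and give the string at
    rank r a codeword of length ceil(log2(r + 2)), as any prefix-free
    Elias-style integer code would. Exhaustive, so only feasible for small n."""
    strings = [format(i, f"0{n}b") for i in range(2 ** n)]
    strings.sort(key=type_class_size)
    return {s: ceil(log2(r + 2)) for r, s in enumerate(strings)}
```

For small n this exhaustive ordering shows the idea directly; a practical code would realize the same ordering without enumerating all 2^n strings.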
Dube and Beaudoin proposed a lossless data compression technique called compression via substring enumeration (CSE) in 2010. We evaluate an upper bound on the number of bits used by the CSE technique to encode any binary string from an unknown member of a known class of k-th order Markov processes. We compare the worst-case maximum redundancy obtained by the CSE technique for any binary string with the least possible value of the worst-case maximum redundancy obtained by the best fixed-to-variable length code that satisfies the Kraft inequality.
Consider a binary modulo-additive noise channel with noiseless feedback. When the noise is a stationary and ergodic process $Z$, the capacity is $1 - H(Z)$ ($H(\cdot)$ denoting the entropy rate). It is shown analogously that when the noise is a deterministic sequence $z^\infty$, the capacity under finite-state encoding and decoding is $1 - \bar{\rho}(z^\infty)$, where $\bar{\rho}(\cdot)$ is Lempel and Ziv's finite-state compressibility. This quantity, termed the porosity $\underline{\sigma}(\cdot)$ of the channel, holds as the fundamental limit to communication, even when the encoder is designed with knowledge of the noise sequence. A sequence of schemes is presented that universally achieves porosity for any noise sequence. These results, both converse and achievability, may be interpreted as a channel-coding counterpart to Ziv and Lempel's work in universal source coding, and also as an extension to existing work on communicating across modulo-additive channels with an individual noise sequence. In addition, a potentially more practical architecture is suggested that draws a connection with finite-state predictability, as introduced by Feder, Gutman, and Merhav.
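The finite-state compressibility $\bar{\rho}(\cdot)$ that determines the porosity can be estimated from the phrase count of an LZ78 incremental parsing. A minimal sketch of that estimate (illustrative only, not the paper's communication scheme):

```python
from math import log2

def lz78_phrases(seq: str) -> int:
    """Count the phrases in the LZ78 incremental parsing of seq:
    each phrase is the shortest prefix of the remainder not yet in the
    dictionary."""
    dictionary, phrase, count = set(), "", 0
    for symbol in seq:
        phrase += symbol
        if phrase not in dictionary:
            dictionary.add(phrase)
            count += 1
            phrase = ""
    if phrase:  # final, possibly previously seen, phrase
        count += 1
    return count

def lz_compressibility(seq: str) -> float:
    """Normalized LZ78 codelength c(n) * log2(c(n)) / n, which approaches
    the finite-state compressibility for long sequences."""
    c, n = lz78_phrases(seq), len(seq)
    return c * log2(c) / n if c > 1 else 0.0
```

On a highly regular noise sequence this estimate is small, so the achievable rate $1 - \bar{\rho}$ is close to 1.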
We discuss the properties of the Jeffreys mixture for a Markov model. First, we show that a modified Jeffreys mixture asymptotically achieves the minimax coding regret for universal data compression, where we do not put any restriction on the data sequences. Moreover, we give an approximation formula for the prediction probability of the Jeffreys mixture for a Markov model. By this formula, it is revealed that the prediction probability by the Jeffreys mixture for the Markov model with binary alphabet is not of the form $(n(x|s) + \alpha)/(n(s) + \beta)$, where $n(x|s)$ is the number of occurrences of the symbol $x$ following the context $s \in \{0,1\}$ and $n(s) = n(0|s) + n(1|s)$. Moreover, we propose a method to compute our minimax strategy, which is a combination of a Monte Carlo method and the approximation formula, where the former is used for earlier stages of the data, while the latter is used for later stages.
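For contrast, an additive predictor of exactly the form the abstract rules out can be sketched as follows; the default parameters (the KT choice $\alpha = 1/2$, $\beta = 1$ per context) and the naive counting are my illustrative assumptions:

```python
def additive_predictor(seq: str, alpha: float = 0.5, beta: float = 1.0) -> float:
    """Probability assigned to the next symbol being '1' given the last
    symbol as first-order context, using the additive form
    (n(1|s) + alpha) / (n(s) + beta). The paper's point is that the
    Jeffreys mixture for Markov models is NOT of this form; this sketch
    just exhibits the form being ruled out."""
    if not seq:
        return alpha / beta  # no context yet
    s = seq[-1]  # first-order context
    n1 = sum(1 for i in range(len(seq) - 1) if seq[i] == s and seq[i + 1] == "1")
    ns = sum(1 for i in range(len(seq) - 1) if seq[i] == s)
    return (n1 + alpha) / (ns + beta)
```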
A multidimensional incremental parsing algorithm (MDIP) for multidimensional discrete sources, as a generalization of the Lempel-Ziv coding algorithm, is investigated. It consists of three essential component schemes: maximum decimation matching, a hierarchical structure for multidimensional source coding, and dictionary augmentation. As a counterpart of the longest-match search in the Lempel-Ziv algorithm, two classes of maximum decimation matching are studied. The underlying behavior of the dictionary augmentation scheme for estimating the source statistics is also examined. For an m-dimensional source, m augmentative patches are appended to the dictionary at each coding epoch, which would require transmitting a substantial amount of information to the decoder. The hierarchical structure of the source coding algorithm resolves this issue by successively incorporating lower-dimensional coding procedures into the scheme. With regard to universal lossy source coders, we propose two distortion functions: the local average distortion and the local minimax distortion with a set of threshold levels for each source symbol. For performance evaluation, we implemented three image compression algorithms based on the MDIP; one is lossless and the others are lossy. The lossless image compression algorithm does not outperform Lempel-Ziv-Welch coding, but experimentally shows efficiency in capturing the source structure. The two lossy image compression algorithms are implemented using the two distortion functions, respectively. The algorithm based on the local average distortion is efficient at minimizing signal distortion, while the images produced with the local minimax distortion show good perceptual fidelity compared with other compression algorithms. Our insights inspire future research on feature extraction for multidimensional discrete sources.
ISBN: (Print) 9780769543529
In this paper, we investigate the redundancy in the universal compression of finite-length smooth parametric sources. Rissanen demonstrated that for a smooth parametric source with $d$ unknown parameters, the expected redundancy for regular codes is asymptotically given by $\frac{d}{2}\log n + o(\log n)$ for almost all sources. Clarke and Barron derived the "minimax expected redundancy" for memoryless sources, which is the maximum redundancy of the best code over the space of source parameters. However, the minimax redundancy is attained at a particular parameter value, which does not provide much insight into other source parameters. We previously derived a lower bound on the compression of finite-length memoryless sequences using a probabilistic treatment. In this paper, we extend our analysis to smooth parametric sequences. We focus on two-part codes with an asymptotic $O(1)$ extra redundancy. We also require that the length function be regular, which is not restrictive since all codes that we know of are regular. We derive a lower bound on the probability that the source is compressed with redundancy greater than any redundancy level $R_0$, i.e., we find a lower bound on $\mathbb{P}[R_n(l_{2p},\theta)>R_0]$, where $R_n(l_{2p},\theta)$ is the redundancy in the compression of a parametric sequence of length $n$ using a two-part length function $l_{2p}$ for the source parameter $\theta$.
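A minimal example of a two-part length function for a binary memoryless source: first describe a quantized parameter, then encode the data at the ideal codelength under that parameter. The uniform quantization grid is my simplifying assumption for illustration; practical two-part codes space the grid at $O(1/\sqrt{n})$ to reach the $\frac{d}{2}\log n$ redundancy.

```python
from math import log2

def two_part_codelength(x: str, grid: int = 16) -> float:
    """Toy two-part code for a binary memoryless source: log2(grid) bits
    describe a quantized theta on a uniform grid, then the string costs
    -log2 P_theta(x) bits; the encoder picks the theta minimizing the
    total. Endpoints theta = 0, 1 are excluded to keep lengths finite."""
    n, k = len(x), x.count("1")
    best = float("inf")
    for i in range(1, grid):
        theta = i / grid
        data_bits = -(k * log2(theta) + (n - k) * log2(1 - theta))
        best = min(best, log2(grid) + data_bits)
    return best
```

The redundancy $R_n(l_{2p}, \theta)$ studied in the abstract is the gap between this length and the true codelength $-\log_2 P_\theta(x)$.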
The universal lossless source coding problem is one of the most important problems in communication systems. The aim of source coding is to compress data to reduce costs in digital communication. Traditional universal source coding schemes are usually designed for stationary sources. Recently, some universal codes for nonstationary sources have been proposed. Independent piecewise identically distributed (i.p.i.d.) sources are simple nonstationary sources whose parameters change discontinuously. In this paper, we consider a new class of i.p.i.d. sources, and we prove that Bayes codes minimize the mean redundancy when the parameter transition pattern is known but the parameter values are unknown.
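A Bayes code of this kind can be illustrated, for a single i.i.d. binary segment under the Jeffreys (Krichevsky-Trofimov) prior, by the ideal mixture codelength below; for an i.p.i.d. source with a known transition pattern one would sum such codelengths over segments. This is a generic sketch, not the paper's construction:

```python
from math import lgamma, log

def kt_mixture_codelength(x: str) -> float:
    """Ideal codelength -log2 of the Krichevsky-Trofimov (Jeffreys-prior
    Bayes) mixture probability of a binary string with k ones:
    P(x) = B(k + 1/2, n - k + 1/2) / B(1/2, 1/2),
    computed in the log domain via lgamma for numerical stability."""
    n, k = len(x), x.count("1")
    logp = (lgamma(k + 0.5) + lgamma(n - k + 0.5)
            - lgamma(n + 1.0) - 2 * lgamma(0.5))
    return -logp / log(2)
```

For example, the KT mixture assigns $P(00) = \frac{1}{2}\cdot\frac{3}{4} = \frac{3}{8}$, matching the sequential rule $(n(0)+\tfrac12)/(n+1)$.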
ISBN: (Print) 9781424417650
A generalized latent semantic analysis framework using a universal source coding algorithm for content-based image retrieval is proposed. Using the multidimensional incremental parsing algorithm, which can be considered a multidimensional extension of the Lempel-Ziv data compression method, a given image is compressed at a moderate bitrate while constructing a dictionary that implicitly embeds the source statistics. Instead of concatenating the corresponding dictionaries of all images in a corpus, we sequentially compress images using previously constructed dictionaries and end up with a visual lexicon containing the least number of visual words covering all the images in the corpus. From latent semantic analysis of the co-occurrence pattern of visual words over the images, a similarity between a given query and an image from the corpus is measured. An application of the proposed technique to a database of 20,000 natural scene images demonstrates that the performance of the proposed system compares favorably with that of existing approaches.
This paper investigates some relations among four complexities of sequences over a countably infinite alphabet, and shows that two kinds of empirical entropies and the self-entropy rate with respect to a Markov source are asymptotically equal and lower bounded by the maximum number of phrases in a distinct parsing of the sequence. Some connections with source coding theorems are also investigated.
We consider a universal predictor based on pattern matching. Given a sequence $X_1,\ldots,X_n$ drawn from a stationary mixing source, it predicts the next symbol $X_{n+1}$ based on selecting a context of $X_{n+1}$. The predictor, called Sampled Pattern Matching (SPM), is a modification of the Ehrenfeucht-Mycielski pseudorandom generator algorithm. It predicts the value of the most frequent symbol appearing at the so-called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside $X_1 X_2 \cdots X_n$; that is, in SPM, the context selection consists of taking a certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by Wyner and Ziv in their 1989 seminal paper. Here, we estimate the redundancy of the SPM universal predictor; that is, we prove that the probability that the SPM predictor makes worse decisions than the optimal predictor is $O(n^{-\nu})$ for some $0 < \nu < 1/2$ as $n \to \infty$. As a matter of fact, we show that we can predict $K = O(1)$ symbols with the same probability of error.
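The context-selection idea can be sketched naively as follows; the fraction parameter, tie-breaking, and fallback rules here are my simplifications, not the paper's algorithm:

```python
def spm_predict(seq: str, frac: float = 0.5) -> str:
    """Sketch of the Sampled Pattern Matching idea: find the longest suffix
    of seq that also occurs starting earlier in seq, keep a fraction frac
    of it as the context, and predict the majority symbol observed right
    after all occurrences of that context (the 'sampled positions')."""
    if not seq:
        return "0"  # arbitrary default for an empty history
    n = len(seq)
    # longest suffix having another occurrence inside the sequence
    k = 0
    for length in range(1, n):
        if seq[-length:] in seq[:-1]:
            k = length
        else:
            break
    if k == 0:
        return max(set(seq), key=seq.count)  # fall back to overall majority
    context = seq[n - max(1, int(frac * k)):]
    nxt = [seq[i + len(context)]
           for i in range(n - len(context))
           if seq[i:i + len(context)] == context]
    return max(set(nxt), key=nxt.count) if nxt else seq[-1]
```

On an alternating history the sketch picks the repeated suffix context and correctly continues the alternation.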