A new analysis shows that, when we apply the lempel-ziv incremental parsing algorithm to an i.i.d., source with probabilities pi, i = 1, . . ., m, the expected length E\W(t)\ of the tth parsed segment W(t) is given by...
详细信息
A new analysis shows that, when we apply the lempel-ziv incremental parsing algorithm to an i.i.d., source with probabilities pi, i = 1, . . ., m, the expected length E\W(t)\ of the tth parsed segment W(t) is given by ( t/n ) SIGMA(n = 1)t (-1) n - 1 PI(l = 2)n (1 - SIGMA(i = 1)m p(i)l), with the null product being 1. It is also shown that lim(t --> infinity) E\W(t)\/logt = H(X)-1 for the same source with the entropy H(X). This gives a variable-to-fixed length (VF) scheme version of the ziv-lempel universal coding theorem.
An upper bound on the probability of a sequence drawn from a finite-state source is derived. The bound is given in terms of the number of phrases obtained by parsing the sequence according to the lempel-ziv (L-Z) incr...
详细信息
An upper bound on the probability of a sequence drawn from a finite-state source is derived. The bound is given in terms of the number of phrases obtained by parsing the sequence according to the lempel-ziv (L-Z) incremental parsing rule, and is universal in the sense that it does not depend on the statistical parameters that characterize the source. This bound is used to derive an upper bound on the redundancy of the L-Z universal data compression algorithm applied to finite-state sources, that depends on the length N of the sequence, on the number K of states of the source, and, eventually, on the source entropy. A variation of the L-Z algorithm is presented, and an upper bound on its redundancy is derived for finite-state sources. A method to derive tighter implicit upper bounds on the redundancy of both algorithms is also given, and it is shown that for the proposed variation this bound is smaller than for the original L-Z algorithm, or every value of N and K.
We developed a simple, practical, adaptive data compression algorithm of the LZ78 class. According to the lempel-ziv greedy parsing, a string boundary is not related to the statistical history modeled by finite-state ...
详细信息
We developed a simple, practical, adaptive data compression algorithm of the LZ78 class. According to the lempel-ziv greedy parsing, a string boundary is not related to the statistical history modeled by finite-state sources. We have already reported an algorithm classifying data into subdictionaries (CSD), which uses multiple subdictionaries and conditions the current string by using the previous one to obtain a higher compression ratio. In this paper, we present a practical implementation of this method suitable for any kinds of data, and show that CSD is more efficient than the LZC which is the method used by the program compress available on UNIX systems. The CSD compression performance was about 10% better than that of LZC with the practical dictionary size, an 8k-entry dictionary when the test data was from the Calgary Compression Corpus. With hashing, the CSD processing speed became as fast as that of LZC, although the CSD algorithm was more complicated than LZC.
ziv and lempel proposed two important universal coding algorithms in 1977 and 1978. While the second algorithm, called LZ78, has been sufficiently analyzed in the literature, the first, LZ77, has not yet. LZ77 parses ...
详细信息
ziv and lempel proposed two important universal coding algorithms in 1977 and 1978. While the second algorithm, called LZ78, has been sufficiently analyzed in the literature, the first, LZ77, has not yet. LZ77 parses input data into a sequence of phrases, each of which is the longest match in a fixed-sized sliding window which consists of the previously encoded M symbols. Each phrase is replaced by a pointer to denote the longest match in the window. Then, a window slides to just before the next symbol to be encoded, and so on. In this paper, we modify the algorithm of LZ77 to restrict pointers to starting only at the boundary of a previously parsed phrase in a window. Although the number of parsed phrases should increase more than those in LZ77, the amount of bits needed to encode pointers is considerably reduced since the number of possible positions to be encoded is much smaller. Then, we show that for any stationary finite state source, the modified LZ77 code is asymptotically optimal with the convergence rate O(log log M / log M) where M is the size of a sliding window.
Sequential decision algorithms are investigated, under a family of additive performance criteria, for individual data sequences, with various application areas in information theory and signal processing. Simple unive...
详细信息
Sequential decision algorithms are investigated, under a family of additive performance criteria, for individual data sequences, with various application areas in information theory and signal processing. Simple universal sequential schemes are known, under certain conditions, to approach optimality uniformly as fast as n-1 log n, where n is the sample size. For the case of finite-alphabet observations, the class of schemes that can be implemented by finite-state machines (FSM's), is studied. It is shown that Markovian machines with sufficiently long memory exist that are asymptotically nearly as good as any given FSM (deterministic or randomized) for the purpose of sequential decision. For the continuous-valued observation case, a useful class of parametric schemes is discussed with special attention to the recursive least squares (RLS) algorithm.
A new notion of empirical informational divergence (relative entropy) between two individual sequences is introduced. If the two sequences are independent realizations of two finite-order, finite alphabet, stationary ...
详细信息
A new notion of empirical informational divergence (relative entropy) between two individual sequences is introduced. If the two sequences are independent realizations of two finite-order, finite alphabet, stationary Markov processes, the empirical relative entropy converges to the relative entropy almost surely. This new empirical divergence is based on a version of the lempel-ziv data compression algorithm. A simple universal classification algorithm for individual sequences into a finite number of classes which is based on the empirical divergence, is introduced. It discriminates between the classes whenever they are distinguishable by some finite-memory classifier, for almost every given training sets and almost any test sequence from these classes. It is universal in the sense of being independent of the unknown sources.
A new application of symbolic substitution is presented for string matching with the goal of data compression. A temporal sequence of input symbols is mapped onto a two-dimensional array that contains a tree structure...
详细信息
A new application of symbolic substitution is presented for string matching with the goal of data compression. A temporal sequence of input symbols is mapped onto a two-dimensional array that contains a tree structure, which in turn is mapped into another array for string generation. A symbolic substitution system and the necessary rules are developed to implement the mapping of the input mapping and the generation of an output sequence. A nonadaptive scheme of compression and decompression is described first, followed by an adaptive scheme with additional rules for the dynamic adaptation process.
The problem of predicting the next outcome of an individual binary sequence using finite memory, is considered. The finite-state predictability of an infinite sequence is defined as the minimum fraction of prediction ...
详细信息
The problem of predicting the next outcome of an individual binary sequence using finite memory, is considered. The finite-state predictability of an infinite sequence is defined as the minimum fraction of prediction errors that can be made by any finite-state (FS) predictor. It is proved that this FS predictability can be attained by universal sequential prediction schemes. Specifically, an efficient prediction procedure based on the incremental parsing procedure of the lempel-ziv data compression algorithm is shown to achieve asymptotically the FS predictability. Finally, some relations between compressibility and predictability are pointed out, and the predictability is proposed as an additional measure of the complexity of a sequence.
The problem of estimating the number of states of a finite-alphabet, finite-state source is investigated. An estimator is developed that asymptotically attains the minimum probability of underestimating the number of ...
详细信息
The problem of estimating the number of states of a finite-alphabet, finite-state source is investigated. An estimator is developed that asymptotically attains the minimum probability of underestimating the number of states, among all estimators with a prescribed exponential decay rate of overestimation probability. The proposed estimator relies on the lempel-ziv data compression algorithm in an intuitively appealing manner.
Recently, there has been an interest in increasing the capacity of storage systems through the use of information lossless data compression. Several algorithms are being investigated and implemented. One such algorith...
详细信息
Recently, there has been an interest in increasing the capacity of storage systems through the use of information lossless data compression. Several algorithms are being investigated and implemented. One such algorithm is the lempel-ziv data compression algorithm. A new data compression algorithm is presented based on the original lempel-ziv algorithm and bounds are derived showing the improved lempel-ziv algorithm's performance as compared to the original lempel-ziv algorithm's performance. In addition, both algorithms are implemented in software and are compared with each other as well as with the lempel-ziv-Welch data compression algorithm.
暂无评论