A multidimensional incremental parsing algorithm (MDIP) for multidimensional discrete sources, as a generalization of the Lempel-Ziv coding algorithm, is investigated. It consists of three essential component schemes: maximum decimation matching, a hierarchical structure of multidimensional source coding, and dictionary augmentation. As a counterpart of the longest match search in the Lempel-Ziv algorithm, two classes of maximum decimation matching are studied. The underlying behavior of the dictionary augmentation scheme for estimating the source statistics is also examined. For an m-dimensional source, m augmentative patches are appended to the dictionary at each coding epoch, thus requiring the transmission of a substantial amount of information to the decoder. The hierarchical structure of the source coding algorithm resolves this issue by successively incorporating lower dimensional coding procedures in the scheme. In regard to universal lossy source coders, we propose two distortion functions, the local average distortion and the local minimax distortion with a set of threshold levels for each source symbol. For performance evaluation, we implemented three image compression algorithms based upon the MDIP; one is lossless and the others are lossy. The lossless image compression algorithm does not perform better than Lempel-Ziv-Welch coding, but experimentally shows efficiency in capturing the source structure. The two lossy image compression algorithms are implemented using the two distortion functions, respectively. The algorithm based on the local average distortion is efficient at minimizing the signal distortion, while the images produced with the local minimax distortion show good perceptual fidelity compared with other compression algorithms. Our insights inspire future research on feature extraction of multidimensional discrete sources.
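For orientation, the one-dimensional incremental parsing that MDIP generalizes can be sketched as below. This is a minimal LZ78-style illustration, not the multidimensional algorithm of the abstract; the function names `lz78_parse` and `lz78_decode` are hypothetical and chosen only for this example.

```python
def lz78_parse(sequence):
    """Incrementally parse `sequence` into (phrase index, next symbol) pairs."""
    dictionary = {"": 0}   # phrase -> index; index 0 is the empty phrase
    output = []
    phrase = ""
    for symbol in sequence:
        if phrase + symbol in dictionary:
            phrase += symbol                       # extend the longest match
        else:
            output.append((dictionary[phrase], symbol))
            dictionary[phrase + symbol] = len(dictionary)  # dictionary augmentation
            phrase = ""
    if phrase:
        # The leftover phrase is already in the dictionary; emit it with no new symbol.
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the sequence by replaying the same dictionary growth."""
    phrases = [""]
    out = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


pairs = lz78_parse("abababab")
print(pairs)
print(lz78_decode(pairs))
```

Each emitted pair names the longest dictionary phrase that matches and the one symbol that extends it, which is the 1D analogue of the longest match search and dictionary augmentation steps named above.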
ISBN: 9781479951864 (print)
We present a novel lossless universal source coding algorithm that uses parallel computational units to increase the throughput. The length-N input sequence is partitioned into B blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of B, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) source underlying the entire input, and then encode each of the B blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is O(N/B). Its redundancy is approximately B log(N/B) bits above Rissanen's lower bound on universal coding performance, with respect to any tree source whose maximal depth is at most log(N/B).
A fixed-rate universal lossy coding scheme is introduced for independent and identically distributed (i.i.d.) sources. It is shown for finite alphabet sources and arbitrary single-letter distortion measures that, as the sample size n grows, the expected distortion obtained using this universal scheme converges to Shannon's distortion-rate function D(R) at a rate O(log n/n). The scheme can be extended to universal quantization of real i.i.d. sources subject to a squared error criterion. It is shown in this case that the per-letter distortion converges to D(R) at a rate O(sqrt(log n/n)) both in expectation and almost surely for any real-valued bounded i.i.d. source.
The problem of universal source coding for binary trees is considered. Zhang, Yang, and Kieffer derived upper bounds on the average-case redundancy of codes based on directed acyclic graph (DAG) compression for binary tree sources with certain properties. In this paper, a natural class of binary tree sources is presented such that the demanded properties are fulfilled. Moreover, for both subclasses considered in the paper of Zhang, Yang, and Kieffer, their result is improved by deriving bounds on the maximal pointwise redundancy (or worst-case redundancy) instead of the average-case redundancy. Finally, using context-free tree grammars instead of DAGs, upper bounds on the maximal pointwise redundancy for certain binary tree sources are derived. This yields universal codes for new classes of binary tree sources.
Deep neural networks have shown incredible performance for inference tasks in a variety of domains. Unfortunately, most current deep networks are enormous cloud-based structures that require significant storage space, which limits scaling of deep learning as a service (DLaaS) and use for on-device intelligence. This work is concerned with finding universal lossless compressed representations of deep feedforward networks with synaptic weights drawn from discrete sets, and directly performing inference without full decompression. The basic insight that allows less rate than naive approaches is recognizing that the bipartite graph layers of feedforward networks have a kind of permutation invariance to the labeling of nodes, in terms of inferential operation. We provide efficient algorithms to dissipate this irrelevant uncertainty and then use arithmetic coding to nearly achieve the entropy bound in a universal manner. We also provide experimental results of our approach on several standard datasets.
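The permutation invariance mentioned above can be checked on a toy network. The sketch below is only an illustration under stated assumptions: the weights, the ReLU activation, and the helper names (`relu`, `matvec`, `forward`) are made up for this example and are not taken from the paper. Relabeling the hidden nodes, i.e. permuting the rows of the first weight matrix and the columns of the second in the same way, leaves the output unchanged.

```python
def relu(v):
    return [max(0, x) for x in v]


def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(W1, W2, x):
    # Two-layer feedforward pass: input -> hidden (ReLU) -> output.
    return matvec(W2, relu(matvec(W1, x)))


# Hypothetical discrete-valued weights: 2 inputs, 3 hidden units, 2 outputs.
W1 = [[1, -1], [0, 2], [-2, 1]]
W2 = [[1, 0, -1], [2, 1, 0]]

perm = [2, 0, 1]                                   # a relabeling of the hidden nodes
W1_perm = [W1[i] for i in perm]                    # permute rows of W1
W2_perm = [[row[i] for i in perm] for row in W2]   # permute columns of W2 to match

x = [3, 1]
print(forward(W1, W2, x), forward(W1_perm, W2_perm, x))
```

Because every one of the relabelings of a hidden layer describes the same inference function, a code that distinguishes between them spends bits on information the decoder does not need.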
The technique of lossless compression via substring enumeration (CSE) is a kind of enumerative code and uses a probabilistic model built from the circular string of an input source for encoding a one-dimensional (1D) source. CSE is applicable to two-dimensional (2D) sources, such as images, by treating a line of pixels of a 2D source as a symbol of an extended alphabet. At the initial step of the CSE encoding process, we need to output the number of occurrences of all symbols of the extended alphabet, so the time complexity increases exponentially as the size of the source becomes large. To reduce computational time, we can rearrange the pixels of a 2D source into a 1D source string along a space-filling curve such as a Hilbert curve. However, information on adjacent cells in a 2D source may be lost in the conversion. To reduce the time complexity and compress a 2D source without converting it to a 1D source, we propose a new CSE which can encode a 2D source in a block-by-block fashion instead of in a line-by-line fashion. The proposed algorithm uses the flat torus of an input 2D source as a probabilistic model instead of the circular string of the source. Moreover, we prove the asymptotic optimality of the proposed algorithm for 2D general sources.
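The Hilbert-curve rearrangement mentioned above as the baseline (not the proposed block-by-block CSE) can be sketched as follows. The index-to-coordinate conversion is the standard iterative construction; the function names and the sample image are hypothetical, and the side length is assumed to be a power of two.

```python
def hilbert_d2xy(n, d):
    """Map position d along the Hilbert curve to (x, y) on an n-by-n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def flatten_along_hilbert(image):
    """Read a square 2D array (list of rows) into a 1D list along the curve."""
    n = len(image)
    return [image[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # made-up 4x4 "image"
print(flatten_along_hilbert(image))
```

Consecutive positions on the curve are always neighbors in the grid, but the converse does not hold: many grid neighbors end up far apart in the 1D string, which is the loss of adjacency information the abstract refers to.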
Motivated by the fact that universal source coding on countably infinite alphabets (infinity-alphabets) is not feasible, this work introduces the notion of "almost lossless source coding". Analogous to the weak variable-length source coding problem studied by Han (IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1217-1226, Jul. 2000), almost lossless source coding aims at relaxing the lossless block-wise assumption to allow an average per-letter distortion that vanishes asymptotically as the block-length tends to infinity. In this setup, we show on the one hand that Shannon entropy characterizes the minimum achievable rate (similarly to the case of finite alphabet sources), and on the other that almost lossless universal source coding becomes feasible for the family of finite-entropy stationary memoryless sources with infinity-alphabets. Furthermore, we study a stronger notion of almost lossless universality that demands uniform convergence of the average per-letter distortion to zero, where we establish a necessary and sufficient condition for the so-called family of "envelope distributions" to achieve it. Remarkably, this condition is the same necessary and sufficient condition needed for the existence of a strongly minimax (lossless) universal source code for the family of envelope distributions. Finally, we show that an almost lossless coding scheme offers a faster rate of convergence for the (minimax) redundancy compared to the well-known information radius developed for the lossless case, at the expense of tolerating a non-zero distortion that vanishes as the block-length grows. This shows that even when lossless universality is feasible, an almost lossless scheme can offer different regimes on the rates of convergence of the (worst case) redundancy versus the (worst case) distortion.
Universal fixed-to-variable lossless source coding for memoryless sources is studied in the finite blocklength and higher order asymptotic regimes. Optimal three-term fixed-error asymptotic expressions are derived for general fixed-to-variable codes and for prefix codes. It is shown that the non-prefix Type Size code, in which codeword lengths are chosen in ascending order of type class size, achieves the optimal third-order term, and outperforms classical two-stage codes. Converse results are proved making use of a result on the distribution of the empirical entropy and Laplace's approximation. Finally, the fixed-to-variable coding problem without a prefix constraint is shown to be essentially the same as the universal guessing problem.
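The ordering idea behind the Type Size code can be illustrated on short binary sequences. The sketch below is a reading of the one-sentence description above, not the paper's construction in full: the binary alphabet, the lexicographic tie-breaking, and the function name `type_size_code` are assumptions made for this example.

```python
from itertools import product
from math import comb


def type_size_code(n):
    """Assign a one-to-one (non-prefix) binary codeword to every binary n-sequence."""
    sequences = list(product((0, 1), repeat=n))
    # A sequence with k ones belongs to a type class of size C(n, k).
    # Sort by that size; ties are broken lexicographically (an arbitrary choice here).
    sequences.sort(key=lambda s: (comb(n, sum(s)), s))
    code = {}
    for rank, seq in enumerate(sequences, start=1):
        # rank-th string in the list "", "0", "1", "00", "01", ...:
        # the binary expansion of rank with its leading 1 removed.
        code[seq] = bin(rank)[3:]
    return code


code = type_size_code(4)
for seq in [(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 0, 1), (0, 1, 0, 1)]:
    print(seq, repr(code[seq]))
```

Sequences from small type classes (here the all-zeros and all-ones strings) receive the shortest codewords, and no knowledge of the source distribution is needed to build the table, which is what makes the scheme universal. The codewords are distinct but not prefix-free, matching the "non-prefix" qualifier in the abstract.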