ISBN:
(Print) 9784885523090
In this paper, we introduce a new concept called context-free bipartite grammar (CFBG) and present a framework wherein large sparse binary matrices can be compactly represented by CFBGs. Similar to the traditional concept of context-free grammar (CFG), a CFBG consists of a set of production rules. Unlike CFGs, however, the right member of each production rule in a CFBG is a labeled bipartite graph with each edge labeled either as a variable or terminal symbol. Given a CFBG, start with its initial variable and repeatedly expand each variable-labeled edge by first deleting that edge and then inserting in some manner all edges contained in the right member of that variable. The CFBG is admissible if the edge expansion process leads to a unique bipartite graph containing only terminal-symbol-labeled edges, in which case the CFBG is said to represent the matrix equal to the biadjacency matrix of the unique graph. Two bipartite grammar transforms, a sequential D-neighborhood pairing transform and an iterative pairing transform (IPT), are further presented to convert any binary matrix into a CFBG representing it. Experiments show that compared with popular sparse matrix storage methods such as compressed row storage and quadtree, CFBGs obtained by IPT can reduce the storage of sparse matrices significantly (by a factor of as much as 68). Index Terms: grammar-based coding, sparse matrices, bipartite graphs, context-free grammars, bipartite grammars.
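To make the edge-expansion process concrete, here is a minimal Python sketch. The `rules` table, the block-offset placement convention, and the symbol names are illustrative assumptions, not the paper's formalism (the paper allows edges of a right member to be inserted "in some manner", which is more general than pasting fixed sub-blocks at an offset):

```python
# Toy CFBG: each variable names a small bipartite graph whose edges carry
# either the terminal symbol '1' (a matrix entry) or another variable (a
# sub-block to be expanded in turn). The rule set below is hypothetical.
rules = {
    'S': {'shape': (4, 4), 'edges': [(0, 0, 'A'), (2, 2, 'A')]},
    'A': {'shape': (2, 2), 'edges': [(0, 0, '1'), (1, 1, '1')]},
}

def expand(var, rules):
    """Expand a variable into the biadjacency matrix it represents by
    repeatedly replacing variable-labeled edges with their right members."""
    h, w = rules[var]['shape']
    m = [[0] * w for _ in range(h)]
    for r, c, label in rules[var]['edges']:
        if label == '1':                  # terminal edge -> a 1-entry
            m[r][c] = 1
        else:                             # variable edge -> paste the sub-block
            for i, row in enumerate(expand(label, rules)):
                for j, v in enumerate(row):
                    if v:
                        m[r + i][c + j] = 1
    return m

print(expand('S', rules))  # the 4x4 identity matrix, stored by a 2-rule CFBG
```

Because the variable `A` is referenced twice but stored once, the grammar grows more slowly than the matrix it represents, which is the source of the compression gain the abstract reports.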
In this paper, we investigate the performance of grammar-based codes for sources with countably infinite alphabets. Let Lambda denote an arbitrary class of stationary, ergodic sources with a countably infinite alphabet. It is shown that grammar-based codes can be modified so that they are universal with respect to Lambda if and only if there exists a universal code for Lambda. Moreover, upper bounds on the worst case redundancies of grammar-based codes among large sets of length-n individual sequences from a countably infinite alphabet are established. Depending upon the conditions satisfied by length-n individual sequences, these bounds range from O(log log n/log n) to O(1/log^(1-alpha) n) for some 0 < alpha < 1. These results complement the previous universality and redundancy results in the literature on the performance of grammar-based codes for sources with finite alphabets.
The compression performance of grammar-based codes is revisited from a new perspective. Previously, the compression performance of grammar-based codes was evaluated against that of the best arithmetic coding algorithm with finite contexts. In this correspondence, we first define semifinite-state sources and finite-order semi-Markov sources. Based on the definitions of semifinite-state sources and finite-order semi-Markov sources, and the idea of run-length encoding (RLE), we then extend traditional RLE algorithms to context-based RLE algorithms: RLE algorithms with k contexts and RLE algorithms of order k, where k is a nonnegative integer. For each individual sequence x, let r(sr,k)*(x) and r(sr/k)*(x) be the best compression rate given by RLE algorithms with k contexts and by RLE algorithms of order k, respectively. It is proved that for any x, r(sr,k)*(x) is no greater than the best compression rate among all arithmetic coding algorithms with k contexts. Furthermore, it is shown that there exist stationary, ergodic semi-Markov sources for which the best RLE algorithms without any context outperform the best arithmetic coding algorithms with any finite number of contexts. Finally, we show that the worst case redundancies of grammar-based codes against r(sr,k)*(x) and r(sr/k)*(x) among all length-n individual sequences x from a finite alphabet are upper-bounded by d(1) log log n/log n and d(2) log log n/log n, respectively, where d(1) and d(2) are constants. This redundancy result is stronger than all previous corresponding results.
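The distinction between plain RLE and context-based RLE can be sketched as follows. Grouping each run by the symbols that precede it is a simplified stand-in for the abstract's r(sr,k)*(x) and r(sr/k)*(x) quantities, which this toy does not reproduce:

```python
from itertools import groupby
from collections import defaultdict

def runs(x):
    """Classical run-length parsing: a list of (symbol, run length) pairs."""
    return [(s, len(list(g))) for s, g in groupby(x)]

def runs_by_context(x, k=1):
    """Group the runs of x by the k symbols immediately preceding each run.
    A context-based RLE coder could then model run lengths separately per
    context, analogous to what an arithmetic coder with k contexts exploits."""
    table = defaultdict(list)
    pos = 0
    for s, n in runs(x):
        ctx = tuple(x[max(0, pos - k):pos])   # empty tuple at the start of x
        table[ctx].append((s, n))
        pos += n
    return dict(table)

print(runs("aaabbba"))             # [('a', 3), ('b', 3), ('a', 1)]
print(runs_by_context("aaabbba"))  # the same runs, keyed by preceding symbol
```

Conditioning the run-length statistics on a context can only sharpen the per-run model, which is in the spirit of the comparison the abstract draws against context-based arithmetic coding.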
In this paper, the multilevel pattern matching (MPM) code for compression of one-dimensional (1-D) sequences is first generalized to compress two-dimensional (2-D) images, resulting in a 2-D multilevel pattern matching (MPM) code. It is shown that among all images of n pixels, the worst case redundancy of the 2-D MPM code against any finite-template-based arithmetic code is O(1/sqrt(log n)). This result contrasts unfavorably with the fact that among all 1-D sequences of length n, the MPM code has a worst case redundancy of O(1/log n) against any finite-state arithmetic code; this is caused by the so-called 2-D boundary effect. To alleviate the 2-D boundary effect, we extend the 2-D MPM code to the case of context modeling, yielding a context-dependent 2-D MPM code. It is shown that among all images of n pixels, the context-dependent 2-D MPM code has an O(1/log n) worst case redundancy against any finite-template-based arithmetic code satisfying a mild condition; this redundancy is better than that of the 2-D MPM code without context models. Experimental results demonstrate that the context-dependent 2-D MPM code significantly outperforms the 2-D MPM code without context models for bi-level images. It is also demonstrated that, in terms of compression rates, the context-dependent 2-D MPM code performs significantly better than the progressive coding mode of JBIG1 for both textual and bi-level images, and better than or comparably to the sequential coding mode of JBIG1 and JBIG2. In addition to its excellent compression performance, the context-dependent 2-D MPM code allows progressive transmission of images.
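The pattern-matching idea behind the 2-D MPM code, namely that repeated blocks need be represented only once per level, can be illustrated with a toy multilevel decomposition. The real code parses only distinct blocks further and entropy-codes the resulting parsing, both of which this sketch omits:

```python
def mpm_levels(img, min_size=1):
    """Toy multilevel view of a square bi-level image: at each level, tile the
    image with square blocks of halving size and count the distinct patterns.
    Repeated blocks are what a 2-D MPM code would represent only once."""
    n = len(img)
    levels, size = [], n
    while size >= min_size:
        seen = set()
        for r in range(0, n, size):
            for c in range(0, n, size):
                block = tuple(tuple(img[i][c:c + size])
                              for i in range(r, r + size))
                seen.add(block)
        levels.append((size, len(seen)))  # (block size, distinct patterns)
        size //= 2
    return levels

# A hypothetical 4x4 image with heavy repetition across blocks:
img = [[1, 0, 1, 0],
       [0, 1, 0, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 1]]
print(mpm_levels(img))  # [(4, 1), (2, 1), (1, 2)]
```

At block size 2 the four tiles are all the same pattern, so one representative suffices; the boundary effect the abstract mentions arises because, unlike in 1-D, the perimeter of a 2-D block grows with its area only as a square root.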
The concept of context-free grammar (CFG)-based coding is extended to the case of countable-context models, yielding context-dependent grammar (CDG)-based coding. Given a countable-context model, a greedy CDG transform is proposed. Based on this greedy CDG transform, two universal lossless data compression algorithms, an improved sequential context-dependent algorithm and a hierarchical context-dependent algorithm, are then developed. It is shown that these algorithms are all universal in the sense that they can achieve asymptotically the entropy rate of any stationary, ergodic source with a finite alphabet. Moreover, it is proved that these algorithms' worst case redundancies among all individual sequences of length n from a finite alphabet are upper-bounded by d log log n/log n, as long as the number of distinct contexts grows with the sequence length n in the order of O(n^a), where 0 < a < 1 and d are positive constants. It is further shown that for some nonstationary sources, the proposed context-dependent algorithms can achieve better expected redundancies than any existing CFG-based codes, including the Lempel-Ziv algorithm, the multilevel pattern matching algorithm, and the context-free algorithms in Part I of this series of papers.
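The abstract does not spell out the greedy CDG transform itself. As a rough, context-free stand-in, a greedy grammar transform of the pair-substitution flavor repeatedly replaces the most frequent adjacent pair with a fresh variable; the context-dependent version described above would, in addition, condition each production on its context:

```python
def greedy_grammar(seq):
    """Minimal greedy grammar transform (pair-substitution style): repeatedly
    replace the most frequent adjacent pair with a fresh variable. This is an
    illustrative sketch of grammar-based coding in general, not the paper's
    greedy CDG transform."""
    rules, next_var = {}, 0
    seq = list(seq)
    while True:
        counts = {}
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:          # no pair repeats; nothing left to factor out
            break
        var = f"V{next_var}"; next_var += 1
        rules[var] = pair     # new production: var -> pair
        out, i = [], 0
        while i < len(seq):   # rewrite the sequence using the new variable
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(var); i += 2
            else:
                out.append(seq[i]); i += 1
        seq = out
    return seq, rules

print(greedy_grammar("ababab"))  # (['V1', 'V0'], {'V0': ('a', 'b'), 'V1': ('V0', 'V0')})
```

The final sequence plus the rule set is what a grammar-based code would entropy-code; repetition in the input shows up as a short sequence and a small grammar.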