A new general formulation of Huffman tree construction is presented which has broad application. Recall that the Huffman algorithm forms a tree, in which every node has some associated weight, by specifying at every step of the construction which nodes are to be combined to form a new node with a new combined weight. We characterize a wide class of weight-combination functions, the quasilinear functions, for which the Huffman algorithm produces optimal trees under correspondingly wide classes of cost criteria. In addition, known results about Huffman tree construction and related concepts from information theory and from the theory of convex functions are tied together. Suggestions for possible future applications are given.
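The step-by-step construction the abstract recalls can be sketched as follows. This is an illustrative sketch only: the function names and the tuple-based tree representation are our own, and ordinary addition stands in for one particular (quasilinear) weight-combination function.

```python
import heapq
import itertools

def huffman_tree(weights, combine=lambda a, b: a + b):
    """Build a Huffman tree by repeatedly combining the two lightest
    nodes; `combine` is the weight-combination function (the classical
    case is addition).  Leaves are (weight,) tuples, internal nodes
    are (left_subtree, right_subtree) pairs."""
    counter = itertools.count()  # tie-breaker so heapq never compares trees
    heap = [(w, next(counter), (w,)) for w in weights]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (combine(w1, w2), next(counter), (t1, t2)))
    return heap[0][2]

def depths(tree, d=0):
    """Return (weight, depth) pairs for the leaves; the depth of a
    leaf is the codeword length assigned to that symbol."""
    if len(tree) == 1:  # a leaf: (weight,)
        return [(tree[0], d)]
    return depths(tree[0], d + 1) + depths(tree[1], d + 1)
```

Swapping in a different `combine` function (e.g. `lambda a, b: max(a, b) + 1` for minimax-style costs) is exactly the generalization the paper studies; whether the greedy construction remains optimal is what the quasilinearity condition decides.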
ISBN:
(print) 9781509003600
In digital communications, it is necessary to compress data for faster and more reliable transmission. As such, the data should undergo source encoding, also known as data compression, the process by which data are compressed into fewer bits before transmission. Source encoding is also essential to limit file sizes for data storage. Two of the most common and widely used source encoding techniques are the Huffman algorithm and the Lempel-Ziv algorithm. The main objective of this research is to identify which technique is better for text, image, and audio compression applications. The files for each data type were converted into bit streams using an analog-to-digital converter and pulse code modulation. The bit streams underwent compression through both algorithms, and the efficiency of each algorithm is quantified by measuring its compression ratio for each data type.
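The efficiency metric used in the comparison can be made concrete. A minimal sketch, assuming the common convention that compression ratio is original size divided by compressed size (the abstract does not state which convention the authors use):

```python
def compression_ratio(original_bits, compressed_bits):
    """Compression ratio under the original/compressed convention:
    values above 1 mean the encoder shrank the data, and higher
    values mean stronger compression."""
    return original_bits / compressed_bits

# e.g. a 1000-bit stream compressed to 400 bits
ratio = compression_ratio(1000, 400)  # -> 2.5
```

Note that some texts define the ratio the other way round (compressed/original, where smaller is better), so reported numbers are only comparable once the convention is fixed.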
ISBN:
(print) 9783662450482
The conflict between ever-increasing volumes of microscan imager logging data and limited cable transmission bandwidth is intensifying day by day. In this paper, an improved lossless data compression algorithm is proposed. Specifically, according to the characteristics of micro-resistivity imaging logging data, it is proved that hexadecimal character encoding has better compressibility than decimal character encoding. It is then shown that the traditional quaternary Huffman algorithm is not fully applicable to microscan imager logging data. Lastly, an improved quaternary Huffman algorithm is employed for logging data compression so as to enhance the compression ratio. Experimental comparisons show that, compared with the conventional quaternary algorithm and the improved compressed Huffman encoding, both elapsed time and compression ratio are greatly improved.
Learning text representation forms a core for numerous natural language processing applications. Word embedding is a type of text representation that allows words with similar meaning to have similar representations. Word embedding techniques categorize semantic similarities between linguistic items based on their distributional properties in large samples of text data. Although these techniques are very efficient, handling semantic and pragmatic ambiguity with high accuracy is still a challenging research task. In this article, we propose a new feature, a semantic score, which handles ambiguities between words. We use external knowledge bases and the Huffman coding algorithm to compute this score, which depicts the semantic relatedness between all fragments composing a given text. We combine this feature with word embedding methods to improve text representation. We evaluate our method on a hashtag recommendation system for Twitter, where text is noisy and short. The experimental results demonstrate that, compared with state-of-the-art algorithms, our method achieves good results.
An improved Huffman algorithm is proposed in order to take up as little storage space as possible when transmitting data remotely and to transmit more useful information within the limited channel capacity of a 120 emergency treatment ambulance terminal. A multi-channel simultaneous data acquisition and compression system based on DSP and FPGA technology is designed, composed of a multi-channel data acquisition module, a DSP data processing module, and an ambulance interface module. The research results indicate that the presented Huffman algorithm can reduce the power and bandwidth requirements of data compression and raise communication efficiency compared with the LZW algorithm. Moreover, the multi-channel data acquisition and transmission rates are improved through the DSP-embedded data compression algorithm, and the system is safe and reliable for the 120 first-aid dispatch and command system.
A new type of sufficient condition is provided for a probability distribution on the nonnegative integers to be given an optimal D-ary prefix code by a Huffman-type algorithm. In the justification of our algorithm, we introduce two new (essentially one) concepts as the definition of the "optimality" of a D-ary prefix code, which are shown to be equivalent to the one defined in the traditional way. These new concepts of optimality are meaningful even in the case where the Shannon entropy H(P) diverges.
Based on a rearrangement inequality by Hardy, Littlewood, and Pólya, we define two-operator algebras for independent random variables. These algebras are called Huffman algebras since the Huffman algorithm on these algebras produces an optimal binary tree that minimizes the weighted lengths of the leaves. Many examples of such algebras are given. For the case of random leaf weights, we prove the optimality of the tree constructed by the power-of-2 rule, i.e., the Huffman algorithm assuming identical weights, when the weights of the leaves are independent and identically distributed.
Forests constructed by the binary Huffman and Hu-Tucker algorithms solve parallelized search problems. Bounds on the resulting minimum average search lengths for items occurring with given probabilities are established.
Let P = {p_i} be a measure of strictly positive probabilities on the set of nonnegative integers. Although the countably infinite number of inputs prevents use of the Huffman algorithm, there are nontrivial P for which known methods find a source code that is optimal in the sense of minimizing expected codeword length. For some applications, however, a source code should instead minimize one of a family of nonlinear objective functions, the beta-exponential means, those of the form log_a ( Σ_i p_i a^{n_i} ), where n_i is the length of the ith codeword and a is a positive constant. Applications of such minimizations include a novel problem of maximizing the chance of message receipt in single-shot communications (a < 1) and a previously known problem of minimizing the chance of buffer overflow in a queueing system (a > 1). This paper introduces methods for finding codes optimal for such exponential means. One method applies to geometric distributions, while another applies to distributions with lighter tails. The latter algorithm is applied to Poisson distributions, and both are extended to alphabetic codes, as well as to minimizing maximum pointwise redundancy. The aforementioned application of minimizing the chance of buffer overflow is also considered.
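The beta-exponential mean objective is easy to evaluate for a candidate code. A small sketch (the function name is ours; the formula is the one quoted in the abstract):

```python
import math

def exponential_mean_length(probs, lengths, a):
    """beta-exponential mean of codeword lengths:
    log_a( sum_i p_i * a**n_i ).
    For a > 1 it penalizes long codewords more than the ordinary
    expected length does; as a -> 1 it approaches sum_i p_i * n_i."""
    return math.log(sum(p * a**n for p, n in zip(probs, lengths)), a)

# lengths (1, 2, 2) for probabilities (1/2, 1/4, 1/4):
# expected length is 1.5, but the a=2 exponential mean is log2(3) ~ 1.585
val = exponential_mean_length([0.5, 0.25, 0.25], [1, 2, 2], a=2)
```

The gap between the two values in the example is exactly why a code minimizing expected length need not minimize an exponential mean, which is what motivates the paper's specialized algorithms.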
We propose a lossless compression method based on a chain code composed of only three symbols. The method is applicable to compressing 2-D binary object shapes, and it consists of representing the orthogonal direction changes of the discrete contour corresponding to each object's binary shape with a three-symbol chain code. According to our experimental results, we find that this method is suitable for the representation of bilevel images. The results are about 25% more efficient in compression than the Freeman chain code method, and an average of 29% better than the Joint Bilevel Image Experts Group (JBIG) compressor. (c) 2005 Society of Photo-Optical Instrumentation Engineers.
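The idea of coding direction *changes* rather than absolute directions can be illustrated as follows. This is a sketch under our own assumptions: the symbol names 'S'/'L'/'R' and the direction numbering are hypothetical, and the paper's exact three-symbol alphabet may differ.

```python
# Absolute directions along a 4-connected contour: 0=E, 1=N, 2=W, 3=S
TURN = {0: 'S', 1: 'L', 3: 'R'}  # straight, left turn, right turn

def relative_chain_code(directions):
    """Re-express a sequence of absolute orthogonal directions as
    relative turns between consecutive moves.  Assumes the contour
    never reverses on itself (a 180-degree turn has no symbol in
    this three-letter alphabet)."""
    symbols = []
    for prev, cur in zip(directions, directions[1:]):
        turn = (cur - prev) % 4
        if turn == 2:
            raise ValueError("contour reversal cannot be coded with 3 symbols")
        symbols.append(TURN[turn])
    return symbols
```

Because the relative alphabet has only three symbols (versus four or eight for absolute Freeman codes) and their frequencies are typically skewed, a subsequent entropy coder can compress the stream well, which is consistent with the gains the abstract reports.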