This paper presents a wavelet-based compression algorithm for electrocardiograms. Subband coding, an important feature of wavelet transforms, is exploited in the proposed method to enhance the compression ratio. The algorithm can be regarded as adaptive because several threshold levels are applied to the same signal. The results are accurate, and the proposed compression system can be used to build a real-time remote patient monitoring system.
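The abstract does not include code, but the subband-thresholding idea can be illustrated with a short sketch. The following Python fragment uses PyWavelets to decompose a signal and apply a separate hard threshold to each detail band; the db4 wavelet, decomposition level, and quantile-based threshold rule are illustrative assumptions, not the paper's actual parameters.

```python
# Minimal sketch of subband-wise (adaptive) wavelet thresholding for ECG
# compression, using PyWavelets. Wavelet, level and threshold rule are
# assumptions for illustration, not the paper's settings.
import numpy as np
import pywt

def compress_ecg(signal, wavelet="db4", level=4, keep_ratio=0.05):
    # Subband coding: decompose the signal into approximation + detail bands.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    out = [coeffs[0]]                      # keep the approximation band intact
    for band in coeffs[1:]:
        # A different threshold per detail band makes the scheme "adaptive":
        # only the largest coefficients of each band are kept.
        thr = np.quantile(np.abs(band), 1.0 - keep_ratio)
        out.append(pywt.threshold(band, thr, mode="hard"))
    return out   # sparse coefficients; entropy-code these in practice

def reconstruct_ecg(coeffs, wavelet="db4"):
    return pywt.waverec(coeffs, wavelet)
```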
This paper presents an experimental study of TCPHC, a novel algorithm for compressing the Transmission Control Protocol (TCP) header to reduce its overhead in IPv6-enabled Low-power and Lossy Networks (6LoWPANs). Results show that TCPHC outperforms standard TCP in both low-loss and high-loss networks. In fact, TCPHC reduces the TCP header to 6 bytes in more than 95% of cases. Moreover, experimental results show that our TCP header compression algorithm reduces energy consumption by up to 15%.
One of the main challenges of modern computer systems is to overcome the increasingly prominent limitations of disk I/O and memory bandwidth, which today are thousands of times slower than computational speeds. In this paper, we investigate reducing memory bandwidth and overall I/O and memory access times by using multithreaded compression and decompression of large datasets. Since the goal is a significant overall speedup of I/O, both the level of compression achieved and the efficiency of the compression and decompression algorithms are important. Several compression methods for efficient disk access to large seismic datasets are implemented and empirically tested on several modern CPUs and GPUs, including the Intel i7 and the NVIDIA C2050 GPU. To reduce I/O time, both lossless and lossy symmetrical compression algorithms, as well as hardware alternatives, are tested. Results show that I/O speedup may double when using an SSD instead of an HDD on larger seismic datasets. Lossy methods investigated include variations of DCT-based methods in several dimensions, combined with lossless compression methods such as RLE (run-length encoding) and Huffman encoding. Our best compression rate (0.16%) and speedups (6x for HDD and 3.2x for SSD) are achieved by using DCT in 3D combined with a modified RLE for the lossy methods, with an average error of 0.46%, which is very acceptable for seismic applications. A simple predictive model for the execution time is also developed and shows a maximum error of 5% versus our measured results; it should thus be a good tool for predicting when to take advantage of multithreaded compression. This model and the other techniques developed in this paper should also be applicable to several other data-intensive applications.
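As a hedged illustration of the lossy pipeline named above (3-D DCT followed by run-length coding), the sketch below transforms a seismic block with scipy, quantizes the coefficients, and run-length encodes zero runs; the block size, quantization step, and RLE format are assumptions and do not reproduce the paper's modified RLE.

```python
# Sketch of a 3-D DCT + quantization + zero-run RLE pipeline for a seismic
# block. Quantization step q and the RLE format are illustrative assumptions.
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, q=50.0):
    coeffs = dctn(block, norm="ortho")
    quant = np.round(coeffs / q).astype(np.int32).ravel()
    # Run-length encode runs of zeros, which dominate after quantization.
    out, i = [], 0
    while i < quant.size:
        if quant[i] == 0:
            run = 1
            while i + run < quant.size and quant[i + run] == 0:
                run += 1
            out.append(("Z", run))
            i += run
        else:
            out.append(("V", int(quant[i])))
            i += 1
    return out, block.shape

def decompress_block(encoded, shape, q=50.0):
    vals = []
    for kind, v in encoded:
        vals.extend([0] * v if kind == "Z" else [v])
    coeffs = np.array(vals, dtype=np.float64).reshape(shape) * q
    return idctn(coeffs, norm="ortho")
```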
This paper introduces a new Lossless Segment Based DNA compression (LSBD) method for compressing DNA sequences. It stores the position of each individual gene in the compressed file. Since the LSBD method performs gene-wise compression, further processing of the compressed data reduces memory usage. The biggest advantage of this algorithm is that it enables part-by-part decompression and can work on data of any size. The method identifies individual gene locations and then constructs triplets that are mapped to an eight-bit number. The information for each gene is stored in a pointer table, with a pointer to the corresponding location in the compressed file. The LSBD technique appropriately compresses non-base characters and performs well on repeating sequences.
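A minimal sketch of the triplet-to-byte mapping mentioned above: three DNA bases give 4^3 = 64 combinations, which fit in a single 8-bit value. The exact code table, the handling of non-base characters, and the pointer-table layout of LSBD are not specified in the abstract, so the details below are illustrative only.

```python
# Illustrative triplet packing in the spirit of LSBD: three bases map to one
# byte. Non-base handling and the gene pointer table are not shown.
BASES = "ACGT"
B2I = {b: i for i, b in enumerate(BASES)}

def pack_triplets(seq):
    out = bytearray()
    cut = len(seq) - len(seq) % 3
    for i in range(0, cut, 3):
        a, b, c = (B2I[x] for x in seq[i:i + 3])
        out.append(a * 16 + b * 4 + c)   # one byte per triplet
    return bytes(out), seq[cut:]         # leftover tail kept raw

def unpack_triplets(packed, tail=""):
    chars = []
    for byte in packed:
        chars += [BASES[byte // 16], BASES[(byte // 4) % 4], BASES[byte % 4]]
    return "".join(chars) + tail
```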
In this paper we propose a BWT-based LZW algorithm to reduce both the compressed size and the compression time. BWT and MTF can expose potential redundancies in a given input and thereby significantly improve the compression ratio of LZW. To avoid LZW's poor matching speed on long runs of the same character, we propose a variant of RLE named RLE-N. RLE-N does not affect the compression ratio, but it noticeably helps LZW reduce execution time. Experimental results show that our algorithm performs well on typical files.
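For context, the sketch below shows a naive BWT + MTF front end of the kind the pipeline above relies on; RLE-N is the authors' own variant and is not reproduced here, and the O(n^2 log n) rotation sort is for illustration only.

```python
# Naive BWT + MTF front end (illustration only; real implementations use
# suffix arrays). The output is runs of identical characters and then mostly
# small indices, which LZW (or RLE-N) can exploit.
def bwt(data, sentinel="\x00"):
    s = data + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def mtf(data):
    alphabet = sorted(set(data))
    out = []
    for ch in data:
        idx = alphabet.index(ch)
        out.append(idx)                      # recently seen symbols get small indices
        alphabet.insert(0, alphabet.pop(idx))
    return out

print(mtf(bwt("banana")))
```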
The usual way of ensuring the confidentiality of compressed data is to encrypt it with a standard encryption algorithm such as AES. However, encryption not only adds computational complexity but also removes the flexibility to perform pattern matching on the compressed data, which is an active research topic in stringology. In this study, we investigate secure compression solutions and propose a practical method to keep the contents of the compressed data hidden. The method is based on the Burrows-Wheeler transform (BWT): a randomly selected permutation of the input symbols is used as the lexicographical ordering during construction. The motivation is the observation that, on the BWT of an input, it is not possible to perform a successful search or to reconstruct any part of the data without correct knowledge of the character ordering. The proposed method is an elegant alternative to standard encryption approaches, with the advantage of supporting compressed pattern matching while still preserving confidentiality. When the input data is homophonic, i.e. the symbol frequencies are flat and the alphabet is sufficiently large, the proposed technique makes it possible to unify compression and security in a single framework instead of the two-level compress-then-encrypt paradigm.
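A minimal sketch of the core idea, under the assumption that the secret is a seeded random permutation of the alphabet used as the sort order when building the BWT; key management and the compressed-pattern-matching machinery are omitted.

```python
# Keyed BWT sketch: rotations are sorted under a secret symbol ordering.
# Without the permutation (the key), the rotation order is unknown, so
# inverting the transform or searching it is not straightforward.
import random

def keyed_bwt(text, alphabet, seed, sentinel="\x00"):
    order = list(alphabet)
    random.Random(seed).shuffle(order)           # the secret ordering (the key)
    rank = {ch: i for i, ch in enumerate([sentinel] + order)}
    s = text + sentinel
    rotations = sorted((s[i:] + s[:i] for i in range(len(s))),
                       key=lambda r: [rank[c] for c in r])
    return "".join(r[-1] for r in rotations)

print(keyed_bwt("banana", "abn", seed=42))
```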
To reduce the storage space of finite automata for regular expression matching, this paper studies the main idea of the delayed-input DFA algorithm based on bounded default paths and analyses the problems that arise when the default-path length bound is small. We then propose optimized algorithms based on a weight-first principle and a node-first principle and evaluate them on an actual rule set. The results show that the optimized algorithms can effectively improve the compression ratio when the default path is bounded to a small length.
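The abstract does not detail the optimizations, but the underlying default-transition idea of delayed-input DFAs can be illustrated briefly: when a state shares most outgoing transitions with another state, only the differing transitions need to be stored, plus one default pointer. The toy sketch below is illustrative and does not implement the paper's weight-first or node-first heuristics.

```python
# Toy illustration of default-transition compression for DFAs.
def compress_state(dfa, s, default):
    """Keep only the transitions of state s that differ from its default state."""
    return {c: nxt for c, nxt in dfa[s].items() if dfa[default].get(c) != nxt}

# Example: a state that agrees with its default state on 254 of 256 byte
# values needs only 2 stored transitions plus one default pointer.
dfa = {0: {c: 0 for c in range(256)},
       1: {**{c: 0 for c in range(256)}, ord("a"): 2, ord("b"): 3},
       2: {c: 0 for c in range(256)},
       3: {c: 0 for c in range(256)}}
print(len(compress_state(dfa, 1, 0)))   # -> 2 instead of 256
```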
Data compression algorithms are usually designed to process data symbol by symbol. The input symbols of these algorithms are typically taken from the ASCII table, i.e. the input alphabet contains 256 symbols representable by 8-bit numbers. Other techniques have also been developed: syllable-based compression, which uses the syllable as the basic compression symbol, and word-based compression, which uses words as basic symbols. These three approaches are strictly separated and no overlap is allowed. This can be a problem, because it may be helpful to allow an overlap between them and to augment a character-based approach with a few symbols that stand for sequences of characters. This paper describes an algorithm that looks for the optimal alphabet for different text files. The alphabet may contain characters and 2-grams.
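As a hedged sketch of a mixed character/2-gram alphabet, the fragment below selects frequent bigrams by a simple greedy frequency rule and tokenizes text against the resulting alphabet; the paper's actual optimality criterion is not reproduced.

```python
# Greedy construction of a mixed alphabet (characters + frequent 2-grams)
# and a tokenizer that prefers 2-grams. Selection rule is an assumption.
from collections import Counter

def mixed_alphabet(text, max_bigrams=64):
    chars = sorted(set(text))
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    # Keep bigrams frequent enough to deserve a code word of their own;
    # each occurrence replaces two symbols with one.
    frequent = [bg for bg, n in bigrams.most_common(max_bigrams) if n > 1]
    return chars + frequent

def tokenize(text, alphabet):
    bigram_set = {a for a in alphabet if len(a) == 2}
    tokens, i = [], 0
    while i < len(text):
        if text[i:i + 2] in bigram_set:      # prefer a 2-gram when available
            tokens.append(text[i:i + 2]); i += 2
        else:
            tokens.append(text[i]); i += 1
    return tokens
```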
Currently, a large number of web sites are generated from web templates to improve the productivity of web site construction. However, the prevalence of web templates has a negative impact on the efficiency of search engines in many respects, including the relevance judgments of web IR and the resource usage of analysis tools. In this paper, we present a direct and fast method to detect pages generated from the same template using DOM tree characteristics. After analyzing and compressing the DOM tree nodes of an HTML page, our method generates a hash-value digest, also called a fingerprint, for each page to identify its DOM structure. In addition, we introduce several other page features to aid in judging the page's template type. Through experimental evaluations over thirty thousand sub-domains, we show that our approach obtains the analysis results rapidly with a high accuracy rate, above 95 percent.
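A minimal sketch of a DOM-structure fingerprint consistent with the description above: the tag skeleton of a page (attributes and text dropped) is hashed into a fixed-size digest, so pages generated from the same template collapse to the same value. The auxiliary page features mentioned in the abstract are not included.

```python
# Hash the tag skeleton of an HTML page into a fingerprint of its DOM shape.
import hashlib
from html.parser import HTMLParser

class TagSkeleton(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append("<" + tag)          # attributes and text are ignored
    def handle_endtag(self, tag):
        self.tags.append(">" + tag)

def dom_fingerprint(html_text):
    parser = TagSkeleton()
    parser.feed(html_text)
    skeleton = "".join(parser.tags)
    return hashlib.sha1(skeleton.encode("utf-8")).hexdigest()

# Two pages rendered from the same template but with different content yield
# identical skeletons and therefore identical fingerprints.
```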
ISBN (print): 9781937284114
In this paper, we propose a novel approach to the automatic generation of aspect-oriented summaries from multiple documents. We first develop an event-aspect LDA model to cluster sentences into aspects. We then use an extended LexRank algorithm to rank the sentences in each cluster, and Integer Linear Programming for sentence selection. Key features of our method include automatic grouping of semantically related sentences and sentence ranking based on an extension of the random walk model. We also implement a new sentence compression algorithm that uses dependency trees instead of parse trees. We compare our method with four baseline methods. Quantitative evaluation based on the ROUGE metric demonstrates the effectiveness and advantages of our method.
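As a hedged illustration of the ranking step, the sketch below implements a plain LexRank-style power iteration over a cosine-similarity sentence graph built from bag-of-words vectors; it is a simplification, not the authors' extended LexRank, and the LDA clustering, ILP selection, and dependency-tree compression stages are omitted.

```python
# Plain LexRank-style sentence ranking: cosine-similarity graph + PageRank
# power iteration. Damping, threshold and iteration count are assumptions.
import numpy as np
from collections import Counter

def rank_sentences(sentences, damping=0.85, iters=50, threshold=0.1):
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = np.zeros((len(sentences), len(vocab)))
    for i, s in enumerate(sentences):
        for w, n in Counter(s.lower().split()).items():
            vecs[i, index[w]] = n
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    sim = (vecs @ vecs.T) / (norms @ norms.T + 1e-12)
    adj = (sim >= threshold).astype(float)           # sparsify the graph
    row = adj / adj.sum(axis=1, keepdims=True)       # row-stochastic matrix
    scores = np.full(len(sentences), 1.0 / len(sentences))
    for _ in range(iters):
        scores = (1 - damping) / len(sentences) + damping * row.T @ scores
    return scores  # higher score = more central sentence in its cluster
```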