ISBN:
(Print) 9781665414906
Approximate set-membership data structures (ASMDSs), such as the Bloom filter and the cuckoo filter, provide constant-time set-membership testing. They produce false positives because bits are lost during compression. However, if all potential false positives are known (or can be enumerated), filter cascades can be used to eliminate them collectively. The application of the filter-cascading algorithm to the Bloom filter was originally proposed to optimize memory usage and is now an integral part of CRLite. The recently proposed cuckoo filter functions similarly to a Bloom filter but uses cuckoo hashing; it incurs comparatively lower storage overhead and additionally supports efficient deletions. Applying the cascading algorithm to the cuckoo filter should therefore also yield lower storage overhead than cascading Bloom filters, and the cuckoo filter's support for deletions enables efficient updates to the cascade. In this paper, we present the design and analysis of cascading cuckoo filters, a potentially more space-optimal ASMDS than cascading Bloom filters. A novel contribution of this paper is the application of the filter-cascading algorithm to the cuckoo filter, which, to the best of our knowledge, has not been proposed before.
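To make the cascading idea concrete, here is a minimal Python sketch (not the paper's implementation): a toy Bloom filter plus a cascade builder in which each level encodes the items the previous level falsely accepts. The names (`BloomFilter`, `build_cascade`, `query`) and all parameters are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter over an integer bitmask (illustrative only)."""
    def __init__(self, m, k, seed=0):
        self.m, self.k, self.seed, self.bits = m, k, seed, 0

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{self.seed}:{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

def build_cascade(members, non_members, m=512, k=3):
    """Level 0 encodes the members; each subsequent level encodes the
    items the previous level falsely accepts, until none remain."""
    levels, include, exclude = [], set(members), set(non_members)
    while include:
        f = BloomFilter(m, k, seed=len(levels))  # fresh hashes per level
        for x in include:
            f.add(x)
        levels.append(f)
        # survivors of this level become the next level's contents
        include, exclude = {x for x in exclude if x in f}, include
    return levels

def query(levels, item):
    """Exact membership for items drawn from the known universe."""
    for depth, f in enumerate(levels):
        if item not in f:
            return depth % 2 == 1  # rejected at an odd depth => member
    return len(levels) % 2 == 1
```

Because every potential false positive is enumerated during construction, queries over the known universe are exact; the filters' one-sided error is cancelled level by level.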
ISBN:
(Print) 9781509042975
A relational table over a set of attributes can be mapped onto a multi-dimensional array and stored as such. This conceptual view of relations lends itself to easy formulations of numerous analytical algorithms, and it is the view taken when representing relations in data warehousing to support On-Line Analytical Processing (OLAP). The main drawback of such a storage scheme is that the equivalent array is typically a highly sparse multi-dimensional array dominated by null entries, and so requires a storage scheme with high compression that retains the significant non-null elements. We introduce, analyse and compare the performance of several storage schemes for Multi-Dimensional Sparse Arrays (MDSAs). We first describe a previously known method, Bit Encoded Sparse Storage (BESS), and then introduce four new storage schemes: Patricia trie compressed storage (PTCS), extended compressed row storage (xCRS), bit encoded compressed row storage (BxCRS), and a hybrid scheme (Hybrid) that combines BESS and xCRS. The schemes are compared with respect to their compression ratios and their computational efficiency in accessing an element, retrieving sub-array elements, and computing aggregate and other analytic functions; we focus primarily on summation over sub-array elements. The results show that xCRS, BxCRS, Hybrid and BESS achieve compression ratios below 40% for MDSAs with more than 80% sparsity. BESS gives the best performance in computing multi-dimensional aggregates across varying sparsity and dimensionality. The key virtue of PTCS is that it is the only scheme that allows insertions and deletions without reorganising the entire previously allocated storage.
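As a rough illustration of the BESS idea (the encoding details here are assumptions, not the paper's exact layout): each dimension index is packed into a fixed-width bit field, so a non-null cell is stored as a single integer key plus its value, and aggregates such as sub-array summation are computed by decoding keys.

```python
from math import ceil, log2

def _bits(n):
    """Bit width needed to represent indices 0..n-1."""
    return max(1, ceil(log2(n)))

def bess_encode(index, shape):
    """Pack a multi-dimensional index into one integer key."""
    key = 0
    for i, n in zip(index, shape):
        key = (key << _bits(n)) | i
    return key

def bess_decode(key, shape):
    idx = []
    for n in reversed(shape):
        idx.append(key & ((1 << _bits(n)) - 1))
        key >>= _bits(n)
    return tuple(reversed(idx))

def bess_compress(entries, shape):
    """entries: (index_tuple, value) pairs for the non-null cells only."""
    return sorted((bess_encode(ix, shape), v) for ix, v in entries)

def subarray_sum(store, shape, lo, hi):
    """Sum the stored values with lo[d] <= index[d] < hi[d] in every dimension."""
    return sum(v for key, v in store
               if all(l <= i < h
                      for i, l, h in zip(bess_decode(key, shape), lo, hi)))
```

Only the non-null cells consume storage, so the cost scales with the number of significant elements rather than with the full array volume.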
ISBN:
(Print) 9781665432191
Cache compression algorithms must abide by hardware constraints; thus, their efficiency ends up being low, and most cache lines end up barely compressed. Moreover, schemes that compress relatively well often decompress slowly, and vice versa. This paper proposes a compression scheme that achieves a high compaction ratio together with fast decompression. The key observation is that by further subdividing the chunks of data being compressed, the algorithms can be tailored to each sub-chunk. This concept is orthogonal to most existing compressors and reduces their average compressed size. In particular, we leverage it to boost a single-cycle-decompression compressor to a compressibility level competitive with state-of-the-art proposals. Normalized against the best state-of-the-art compressors with long decompression latencies, the proposed ideas further enhance average cache capacity by 2.7% (geometric mean) while retaining short decompression latency.
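A small sketch of the subdivision idea, with a base+delta coder standing in for a real hardware compressor (the coder, the split depth, and the one-byte split metadata are assumptions for illustration, not the paper's actual scheme): splitting a line lets each sub-chunk pick the base value that fits its own data.

```python
def bdi_size(words):
    """Bytes for a base+delta encoding of 64-bit words: one 8-byte base
    plus a signed 1-byte delta per word, or raw storage if any delta
    does not fit in one byte."""
    base = words[0]
    if all(-128 <= w - base < 128 for w in words):
        return 8 + len(words)
    return 8 * len(words)

def best_size(words, max_splits=2):
    """Encode the run whole, or split it in half recursively and encode
    the halves independently, keeping whichever is smaller."""
    whole = bdi_size(words)
    if max_splits == 0 or len(words) < 2:
        return whole
    mid = len(words) // 2
    halves = (best_size(words[:mid], max_splits - 1)
              + best_size(words[mid:], max_splits - 1))
    return min(whole, halves + 1)  # + 1 byte of assumed split metadata
```

For a line whose two halves cluster around different bases, whole-line base+delta fails (64 bytes raw), while the subdivided encoding succeeds on each half independently.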
Streaming media on the Internet can be unreliable. Services such as audio-on-demand drastically increase the load on networks; therefore, new, robust, and highly efficient coding algorithms are necessary. One method overlooked to date, which can work alongside existing audio compression schemes, takes into account the semantics and natural repetition of music. Similarity detection within polyphonic audio has presented problematic challenges in the field of music information retrieval. One approach to bursty errors is to use self-similarity to replace missing segments. Many systems handle packet loss and replacement at the network level, but none attempt to repair large dropouts of 5 seconds or more. Music exhibits standard structures that can be used as a forward error correction (FEC) mechanism; FEC addresses packet loss by placing the onus of repair as much as possible on the listener's device. We have developed a server-client framework (SoFI) for the automatic detection and replacement of large packet losses on wireless networks when receiving time-dependent streamed audio. Whenever dropouts occur, SoFI swaps the audio presented to the listener between the live stream and previous sections of the audio stored locally. Objective and subjective evaluations of SoFI, in which subjects were presented with alternative simulated approaches to audio repair and with simulated replacements of varying lengths, give positive results.
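The replacement step can be sketched as follows, on a toy model over per-frame feature values (the context length and the scoring are assumptions, not SoFI's actual matching): the frames just before the dropout are compared against the frames before every earlier candidate segment, and the best-matching candidate fills the gap.

```python
def repair_dropout(frames, start, length, ctx=4):
    """Replace frames[start:start+length] with the best earlier segment,
    chosen so that the audio just before the candidate is most similar
    to the audio just before the dropout (a crude self-similarity proxy)."""
    target = frames[start - ctx:start]
    best, best_score = None, float("inf")
    for cand in range(ctx, start - length + 1):  # segment ends before dropout
        prefix = frames[cand - ctx:cand]
        score = sum((a - b) ** 2 for a, b in zip(prefix, target))
        if score < best_score:
            best_score, best = score, cand
    repaired = list(frames)
    repaired[start:start + length] = frames[best:best + length]
    return repaired
```

On strongly repetitive material the context before a repeat matches exactly, so the gap is filled with the corresponding earlier passage; real audio would use perceptual features rather than raw values.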
Enjoyment of audio has now become about flexibility and personal freedom. Digital audio content can be acquired from many sources, and wireless networking allows digital media devices and associated peripherals to be unencumbered by wires. However, despite recent improvements in capacity and quality of service, wireless networks are inherently unreliable channels for streaming audio, being susceptible to the effects of range, interference, and occlusion. This time-varying reliability introduces data corruption and loss, with unpleasant audible effects that can be profound and prolonged. Traditional communications techniques for error mitigation perform poorly, and in a bandwidth-inefficient manner, in the presence of such large-scale defects in a digital audio stream. A novel solution that can complement existing techniques takes account of the semantics and natural repetition of music: through self-similarity metadata, missing or damaged audio segments can be seamlessly replaced with similar undamaged segments that have already been received. We propose a technology to generate relevant self-similarity metadata for arbitrary audio material and to utilize this metadata within a wireless audio receiver to provide sophisticated, real-time correction of large-scale errors. The primary objectives are to match the current section of a song being received against previous sections, to identify incomplete sections, and to determine replacements from previously received portions of the song. This article outlines our approach to Forward Error Correction (FEC) technology used to "repair" a bursty dropout when listening to time-dependent media on a wireless network. Using self-similarity analysis of a music file, we can automatically repair the dropout with a similar portion of the music already received, thereby minimizing the listener's discomfort.
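Generating such metadata can be illustrated with a toy self-similarity matrix over per-segment feature vectors (the features and the squared-Euclidean distance are assumptions; production systems typically use chroma or spectral features):

```python
def self_similarity(features):
    """N x N matrix of squared Euclidean distances between per-segment
    feature vectors; small off-diagonal entries mark repeated sections."""
    return [[sum((a - b) ** 2 for a, b in zip(fi, fj)) for fj in features]
            for fi in features]

def best_match(features, i):
    """Index of the earlier segment most similar to segment i --
    the candidate a receiver would substitute if segment i is lost."""
    row = self_similarity(features)[i]
    return min(range(i), key=row.__getitem__)
```

The matrix can be computed once by the sender and shipped alongside the stream, so the receiver's lookup at repair time is a cheap row scan.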
Sequence data repositories archive and disseminate fastq data in compressed format. Despite its relatively lower compression efficiency, data repositories continue to prefer GZIP over the available specialized fastq compression algorithms, owing to its ease of deployment, high processing speed, and portability. This study presents FQC, a fastq compression method that provides significantly higher compression gains over GZIP while incorporating the features necessary for universal adoption by data repositories and end-users. It also proposes a novel archival strategy that allows sequence repositories to simultaneously store and disseminate lossless as well as (multiple) lossy variants of fastq files without requiring additional storage. For academic users, Linux, Windows, and Mac implementations (both 32- and 64-bit) of FQC are freely available for download at: https://***/compression/FQC.
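One way a lossy variant can be derived on the fly, sketched here with assumed bin boundaries (not FQC's actual scheme): map each Phred quality score to a bin representative, which shrinks the quality-string alphabet and makes it far more compressible while leaving the reads themselves untouched.

```python
# Assumed bins: (low, high, representative) -- illustrative only.
BINS = ((0, 9, 6), (10, 19, 15), (20, 29, 25), (30, 41, 37))

def bin_quality(quals, bins=BINS):
    """Lossy transform: replace each Phred score with its bin's
    representative value."""
    out = []
    for q in quals:
        for lo, hi, rep in bins:
            if lo <= q <= hi:
                out.append(rep)
                break
    return out

def bin_quality_string(qstring, offset=33, bins=BINS):
    """Same transform on a fastq quality line (Phred+33 ASCII encoding)."""
    return "".join(chr(rep + offset)
                   for rep in bin_quality(ord(c) - offset for c in qstring))
```

Because the binning is deterministic, a repository can keep only the lossless archive and materialize any lossy variant at download time, which is how simultaneous lossless/lossy dissemination can avoid extra storage.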
ISBN:
(Print) 9781509042982
A relational table over a set of attributes can be mapped onto a multi-dimensional array and stored as such. This conceptual view of relations lends itself to easy formulations of numerous analytical algorithms, and it is the view taken when representing relations in data warehousing to support On-Line Analytical Processing (OLAP). The main drawback of such a storage scheme is that the equivalent array is typically a highly sparse multi-dimensional array dominated by null entries, and so requires a storage scheme with high compression that retains the significant non-null elements. We introduce, analyse and compare the performance of several storage schemes for Multi-Dimensional Sparse Arrays (MDSAs). We first describe a previously known method, Bit Encoded Sparse Storage (BESS), and then introduce four new storage schemes: Patricia trie compressed storage (PTCS), extended compressed row storage (xCRS), bit encoded compressed row storage (BxCRS), and a hybrid scheme (Hybrid) that combines BESS and xCRS. The schemes are compared with respect to their compression ratios and their computational efficiency in accessing an element, retrieving sub-array elements, and computing aggregate and other analytic functions; we focus primarily on summation over sub-array elements. The results show that xCRS, BxCRS, Hybrid and BESS achieve compression ratios below 40% for MDSAs with more than 80% sparsity. BESS gives the best performance in computing multi-dimensional aggregates across varying sparsity and dimensionality. The key virtue of PTCS is that it is the only scheme that allows insertions and deletions without reorganising the entire previously allocated storage.