In Spark, a massive amount of intermediate data inevitably leads to excessive I/O overhead. To mitigate this issue, Spark incorporates four compression algorithms to reduce the size of the data for better performance. However, compression and decompression constitute only a portion of the overall logical flow of Spark applications, which indicates a potentially considerable interaction between compression algorithms and Spark applications in terms of performance. Consequently, identifying the factors that significantly impact the performance of compression algorithms in Spark, and subsequently determining the actual performance benefits these algorithms provide to Spark applications, remains a significant challenge. To address this challenge, this paper presents a monitoring framework, named PAC, for conducting in-depth and systematic performance analysis of compression algorithms in Spark. As the pioneer of such monitoring frameworks, PAC is built on top of the Spark core and collaborates with multiple monitors to collect various types of performance metrics from compressors; its data transformer then correlates and integrates these metrics into structured tuples. This makes it easier to diagnose the factors that significantly influence the performance of compression algorithms in Spark. Using PAC, our experiments reveal new determinants beyond the traditional ones: the input/output data sizes and types of compression/decompression invocations, CPU consumption when compressing massive amounts of data, and hardware utilization. Moreover, these experiments demonstrate that ZSTD is more susceptible to performance issues when compressing and decompressing small data items, even when the overall input and output data are huge. In terms of performance, LZ4 serves as a viable alternative to ZSTD. These findings not only benefit researchers and developers in making more informed decisions when configuring and tuning Spark execution environments but also su…
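Neither PAC's interfaces nor its code are given in this listing, so the following is only a minimal Python sketch of the kind of structured tuple such a monitor might collect per compressor invocation (operation type, codec, input/output sizes, CPU time); CompressionRecord and monitored_compress are hypothetical names, not PAC's API.

    import time
    import zlib
    from dataclasses import dataclass

    @dataclass
    class CompressionRecord:
        op: str            # "compress" or "decompress"
        codec: str
        in_bytes: int
        out_bytes: int
        cpu_seconds: float

    records = []

    def monitored_compress(codec_name, compress_fn, data):
        # One structured tuple per invocation, ready for later correlation.
        t0 = time.process_time()
        out = compress_fn(data)
        records.append(CompressionRecord("compress", codec_name,
                                         len(data), len(out),
                                         time.process_time() - t0))
        return out

    payload = b"spark shuffle block " * 1000
    monitored_compress("zlib", zlib.compress, payload)
    print(records[0])

Correlating such tuples across many invocations is what lets patterns on small inputs (like the ZSTD behavior reported above) stand out.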
Parallel transmission (pTX) is a versatile solution for enabling UHF MRI of the human body, where radiofrequency (RF) field inhomogeneity is very challenging. Today, state-of-the-art monitoring of local SAR in pTX consists in evaluating the RF power deposition on specific SAR matrices called Virtual Observation Points (VOPs). It essentially relies on accurate electromagnetic simulations able to return the local SAR distribution inside the body in response to any applied pTX RF waveform. To reduce the number of SAR matrices to a value compatible with real-time SAR monitoring (≪ 10³), a VOP set is obtained by partitioning the SAR model into clusters and associating a so-called dominant SAR matrix with every cluster. More recently, a clustering-free compression method was proposed, allowing a significant reduction in the number of SAR matrices. Its concept and derivation, however, assumed static RF shims, and their extension to dynamic pTX is not straightforward, casting doubt on the strict validity of the compression approach for these more complicated RF waveforms. In this work, we provide the mathematical framework to tackle this problem and find a rigorous justification of this criterion in light of convex optimization theory. Our analysis leads to a variant of the clustering-free compression approach exploiting convex optimization. This new compression algorithm offers computational gains for large SAR models and for high-channel-count pTX RF coils.
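The abstract states the criterion only informally; the following is a hedged sketch, in generic notation, of why a Loewner-order (positive semidefinite) VOP dominance proved for static shims also bounds dynamic pTX waveforms. Here $Q_i$ are the model's SAR matrices, $V_j$ the VOPs, and $w_t$ the waveform samples; these symbols are not taken from the paper.

    Static shims: if for every $Q_i$ there is a $V_j$ with $V_j - Q_i \succeq 0$, then
    $$ w^{H} Q_i w \;\le\; \max_j \, w^{H} V_j w \quad \text{for any shim } w. $$
    Dynamic pTX: local SAR is the time average over the waveform samples,
    $$ \mathrm{SAR}_i \;=\; \frac{1}{T} \sum_{t=1}^{T} w_t^{H} Q_i w_t , $$
    and since $V_j - Q_i \succeq 0$ gives $w_t^{H} Q_i w_t \le w_t^{H} V_j w_t$ at every $t$,
    $$ \frac{1}{T} \sum_{t} w_t^{H} Q_i w_t \;\le\; \max_j \frac{1}{T} \sum_{t} w_t^{H} V_j w_t . $$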
Today's industry is flooded with tracking data originating from vessels across the globe that transmit their positions at frequent intervals. These voluminous, high-speed streams of data have led researchers to develop novel ways to compress them in order to speed up processing without losing valuable information. To this end, several algorithms have been developed that compress streams of vessel tracking data without compromising their spatio-temporal and kinematic features. In this paper, we present a wide range of well-known trajectory compression algorithms and evaluate their performance on data originating from vessel trajectories. The trajectory compression algorithms included in this research are suitable for either historical data (offline compression) or real-time data streams (online compression). The performance evaluation is three-fold: each algorithm is evaluated in terms of compression ratio, execution speed, and information loss. Experiments demonstrated that each algorithm has its own benefits and limitations and that the choice of a suitable compression algorithm is application-dependent. Finally, considering all assessed aspects, the Dead-Reckoning algorithm not only presented the best performance, but also works over streaming data, which constitutes an important criterion in maritime surveillance.
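As a rough illustration of the online principle behind Dead-Reckoning (not the paper's implementation): a position report is kept only when the location extrapolated from the last kept point's velocity drifts beyond a tolerance. The Python sketch below assumes a hypothetical record layout (t seconds, x/y meters, vx/vy m/s).

    import math

    def dead_reckoning(points, threshold_m):
        # Online compression: one pass, O(1) state per stream.
        points = iter(points)
        last = next(points)
        kept = [last]
        for p in points:
            t, x, y, _, _ = p
            t0, x0, y0, vx0, vy0 = last
            dt = t - t0
            # Propagate the last kept velocity to predict the current position.
            px, py = x0 + vx0 * dt, y0 + vy0 * dt
            if math.hypot(x - px, y - py) > threshold_m:
                kept.append(p)
                last = p
        return kept

    track = [(0, 0, 0, 10, 0), (10, 100, 1, 10, 0), (20, 200, 80, 10, 8)]
    print(dead_reckoning(track, threshold_m=25))  # keeps only the turning point

The single-pass, constant-state behavior is what makes the approach suitable for streaming, matching the maritime-surveillance criterion above.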
ISBN (digital): 9798331540043
ISBN (print): 9798331540050
Image compression has always been a hot topic in the field of aerospace. The onboard processing unit, because it operates in deep space, has severely limited data storage and data transmission capacity. Satellite remote sensing images are evolving toward larger sizes, higher frame rates, and more frequency bands, placing ever greater demands on storage space and processing power. There is an urgent need for efficient and lightweight image compression hardware and software. Therefore, in this work, we improve the Context-Based Adaptive Lossless Image Compression (CALIC) algorithm, including its error quantization and entropy coding, to meet the compression-time and compression-ratio requirements of on-orbit processing. At the same time, targeting the on-orbit processing scenario, we chose a Field-Programmable Gate Array (FPGA) as the deployment platform for the improved CALIC and parallelized the Av value calculation of the arithmetic coder on this basis, which greatly reduces processing time on the hardware side. We carried out simulation verification of our image compression algorithm on the KODAK24 and SZTAKI_Airchange datasets. The experimental results show that our method strikes a good balance between compression ratio and compression speed compared with typical compression algorithms, making it suitable for on-orbit processing scenarios.
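The abstract does not spell out the modified stages; for orientation only, here is a Python sketch of the gradient-adjusted predictor (GAP) at the core of classic CALIC, with the standard 80/32/8 edge thresholds. The improved error quantization and entropy coding described above are not reproduced here.

    def gap_predict(W, N, NW, NE, WW, NN, NNE):
        # Neighbor naming: W = west, N = north, etc., relative to the
        # pixel being predicted in raster-scan order.
        d_h = abs(W - WW) + abs(N - NW) + abs(N - NE)    # horizontal gradient
        d_v = abs(W - NW) + abs(N - NN) + abs(NE - NNE)  # vertical gradient
        if d_v - d_h > 80:      # sharp horizontal edge: copy west neighbor
            return W
        if d_h - d_v > 80:      # sharp vertical edge: copy north neighbor
            return N
        pred = (W + N) / 2 + (NE - NW) / 4
        if d_v - d_h > 32:
            pred = (pred + W) / 2
        elif d_v - d_h > 8:
            pred = (3 * pred + W) / 4
        elif d_h - d_v > 32:
            pred = (pred + N) / 2
        elif d_h - d_v > 8:
            pred = (3 * pred + N) / 4
        return pred

    print(gap_predict(W=100, N=120, NW=110, NE=125, WW=95, NN=118, NNE=124))

The prediction residuals, not the raw pixels, are what the quantization and entropy-coding stages (the parts improved in this work) operate on.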
ISBN (digital): 9798350385878
ISBN (print): 9798350385885
Heterogeneous datasets are prevalent in big-data domains. However, compressing such datasets with a single algorithm results in suboptimal compression ratios. This paper investigates how machine-learning techniques can help by predicting an effective compression algorithm for each file in a heterogeneous dataset. In particular, we show how to train a very simple model using nothing but the compression ratios of a few algorithms as features. We named this technique "MLcomp". Despite its simplicity, it is very effective as our evaluation on nearly 9,000 files from a heterogeneous dataset and a library of over 100,000 compression algorithms demonstrates. Using MLcomp to pick one lossless algorithm from this library for each file yields an average compression ratio that is 97.8% of the best possible.
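MLcomp's exact features and model are described only at a high level here; the toy below, a sketch under those stated assumptions, trains a decision tree whose only features are the compression ratios of two cheap probe codecs, and whose label is the best codec from a tiny three-codec stand-in for the paper's 100,000-algorithm library.

    import bz2
    import zlib
    import lzma
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Stand-in "library" of target codecs (the paper's has >100,000 entries).
    LIBRARY = {"zlib9": lambda b: zlib.compress(b, 9),
               "bz2": bz2.compress,
               "lzma": lzma.compress}
    # Cheap probes whose compression ratios are the only features.
    PROBES = [lambda b: zlib.compress(b, 1), lambda b: bz2.compress(b, 1)]

    def features(blob):
        return [len(blob) / len(p(blob)) for p in PROBES]

    def label(blob):
        return max(LIBRARY, key=lambda n: len(blob) / len(LIBRARY[n](blob)))

    # Synthetic heterogeneous "dataset": incompressible vs. repetitive blobs.
    rng = np.random.default_rng(0)
    blobs = [rng.bytes(4096) for _ in range(40)] + \
            [bytes(rng.integers(0, 4, 4096, dtype=np.uint8)) for _ in range(40)]
    model = DecisionTreeClassifier(max_depth=3).fit(
        [features(b) for b in blobs], [label(b) for b in blobs])
    print(model.predict([features(blobs[-1])]))

At scale, the point is that probing a few codecs per file is far cheaper than running the entire library on every file.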
ISBN (digital): 9798350370249
ISBN (print): 9798350370270
This paper evaluates innovation in two of the most widely used compression algorithms, LZ77 and LZ78, for data compression and source coding applications. A detailed assessment of these algorithms provides insights into their design philosophy, technical characteristics, and performance metrics. Additionally, the study offers a comprehensive overview of their applications and implementations. The paper investigates the principal considerations in implementing compression algorithms and analyzes their relative advantages and disadvantages. An exploration of the constraints of each algorithm is likewise undertaken. Furthermore, the research examines distinct design elements of the algorithms, such as data structures, the compression engine, and algorithmic optimization. The paper then assesses the algorithms by examining the trade-offs between their performance and complexity. This study helps practitioners and developers understand the implications of using these algorithms for diverse compression and source coding applications and can assist them in identifying the best compression algorithm for their specific needs. Both LZ77 and LZ78 are designed to take a given sequence of characters (or bits) and encode it into a space-efficient representation. The innovation evaluation provides valuable insight into how the algorithms improve compression ratios and save space, and it discusses the effect of parameters such as window length, repeat cycle, and buffer size on compression efficiency. Finally, the paper discusses the challenges for future research in this area and the potential applications of these algorithms.
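To ground the parameter discussion, here is a deliberately naive Python LZ77 sketch emitting (offset, length, next_byte) triples; window and max_len stand in for the window-length and buffer-size parameters mentioned above, and production encoders replace the brute-force search with hash chains.

    def lz77_encode(data: bytes, window: int = 4096, max_len: int = 15):
        i, out = 0, []
        while i < len(data):
            best_off, best_len = 0, 0
            # Brute-force longest-match search over the sliding window.
            for j in range(max(0, i - window), i):
                l = 0
                while (l < max_len and i + l < len(data)
                       and data[j + l] == data[i + l]):
                    l += 1
                if l > best_len:
                    best_off, best_len = i - j, l
            nxt = data[i + best_len] if i + best_len < len(data) else 0
            out.append((best_off, best_len, nxt))
            i += best_len + 1
        return out

    print(lz77_encode(b"abcabcabcx"))
    # [(0, 0, 97), (0, 0, 98), (0, 0, 99), (3, 6, 120)]

Note how the final triple encodes six bytes at once: a larger window raises the chance of finding such long matches, at the cost of a more expensive search.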
ISBN (digital): 9798350341058
ISBN (print): 9798350341065
While cache side-channel attacks have been known for over a decade, attacks and defenses have mostly been limited to cryptographic algorithms. In this work, we analyze the security of compression algorithms and their susceptibility to cache side-channels. We design TaintChannel, a tool that automatically detects cache side-channel vulnerabilities, and apply it to compression software to study vulnerabilities in popular compression algorithms (LZ77, LZ78, BWT) and their mainstream implementations. We discover that the implementations of all of these algorithms leak some or all of their input data via cache side-channels. This is concerning, as compression algorithms are widely used in software that operates on sensitive data (e.g., HTTPS). We demonstrate the practicality of these vulnerabilities via two end-to-end attacks on Bzip2, which work in two different threat models and use different attack techniques. Our first attack targets compression within an SGX enclave using the Prime+Probe cache attack technique and extracts the entire input while it is being compressed, with an accuracy greater than 99%. Because existing cache attack techniques fall short when targeting applications with larger memory footprints, such as compression software, we develop new attack techniques for larger buffers. Our second attack works in the threat model in which one application attacks another; it allows the attacker to identify which file, out of multiple options, is being compressed.
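None of the paper's tooling or attacks can be reproduced from this abstract; the Python toy below merely illustrates the root cause: in an LZ77-style matcher, the sequence of hash-table buckets touched while searching for matches is a function of the input bytes, so an attacker who can observe which cache lines the table occupies (for example via Prime+Probe) learns about the data being compressed. The hash function is an invented stand-in.

    def touched_buckets(data: bytes, table_bits: int = 8):
        # Secret-dependent memory accesses: each 3-byte window selects a bucket.
        mask = (1 << table_bits) - 1
        return [((data[i] * 33 + data[i + 1]) * 33 + data[i + 2]) & mask
                for i in range(len(data) - 2)]

    print(touched_buckets(b"secret!!"))
    print(touched_buckets(b"other..."))  # different input, different access trace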
New compression algorithms have been proposed as technology advances. However, there are no objective analysis procedures to guide the choice of an algorithm suited to the type of data in the system it is intended for. This paper introduces a statistical framework, based on the bootstrap method, for analyzing compression algorithms using an objective comparison parameter as the criterion. A case study using the compression ratio as the parameter and file samples of four different types was analyzed. The proposed scheme allowed us to infer which algorithm is best suited to each data type. RLE proved more suitable for image, audio, and video files, with Huffman obtaining comparable performance. For text files, LZW remarkably outperformed all other algorithms.
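A minimal sketch of the percentile-bootstrap comparison such a framework rests on, with invented per-file compression ratios: if the confidence interval for the mean ratio difference excludes zero, one algorithm can be declared better for that data type.

    import numpy as np

    def bootstrap_ci(samples, n_boot=10_000, alpha=0.05, seed=0):
        # Percentile bootstrap confidence interval for the mean of `samples`.
        rng = np.random.default_rng(seed)
        samples = np.asarray(samples)
        boot = [np.mean(rng.choice(samples, size=len(samples), replace=True))
                for _ in range(n_boot)]
        return tuple(np.quantile(boot, [alpha / 2, 1 - alpha / 2]))

    # Hypothetical per-file ratios for two algorithms on text files.
    rle = np.array([1.1, 1.0, 1.2, 0.9, 1.1, 1.0])
    lzw = np.array([2.4, 2.1, 2.6, 2.2, 2.5, 2.3])
    print(bootstrap_ci(lzw - rle))  # interval above 0 -> LZW better here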
A standard interchangeable data format was recently developed through the Transportation Pooled-Fund Study TPF-5(299) to meet different requirements and needs for two- and three-dimensional (2D/3D) pavement image data storage and exchange. Considering that pavement images occupy large amounts of storage space, efficient compression algorithms for both 8-bit 2D and higher-bit 3D images are a key component of a successful implementation of the standard. Highway agencies and industry vendors exclusively use the Joint Photographic Experts Group (JPEG) standard or its successor, JPEG 2000, for 2D image compression, while the compression of 3D 16-bit images remains proprietary. The goal of this paper is to evaluate suitable algorithms for compressing 3D pavement images. An overview of state-of-the-art pavement image data collection technologies was performed, and the relevant image data formats were identified. Subsequently, five compression algorithms suitable for 16-bit image data were selected based on results from a comprehensive survey of the TPF-5(299) technical community. Compression of several representative field images was implemented and evaluated in terms of computational efficiency and retention of image quality. The expected benefits of this study include facilitating standard and efficient pavement image data collection and accelerating the development of consistent pavement condition evaluation.
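The five selected algorithms are not named in this abstract, so the sketch below substitutes generic lossless codecs from the Python standard library, applied to a synthetic 16-bit array standing in for a 3D pavement range image, and reports one of the study's two evaluation axes (computation time) alongside compression ratio.

    import bz2
    import lzma
    import time
    import zlib
    import numpy as np

    # Synthetic 16-bit "range image": flat surface plus sensor noise.
    rng = np.random.default_rng(1)
    img = rng.normal(30000, 200, (512, 512)).astype(np.uint16)
    raw = img.tobytes()

    for name, fn in [("zlib", zlib.compress), ("bz2", bz2.compress),
                     ("lzma", lzma.compress)]:
        t0 = time.perf_counter()
        out = fn(raw)
        dt = time.perf_counter() - t0
        print(f"{name}: ratio={len(raw) / len(out):.2f}, time={dt * 1000:.0f} ms")

Being lossless, these codecs trivially preserve image quality; the study's quality criterion matters once lossy candidates enter the comparison.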
ISBN (print): 9789532330984
With the ever-growing presence of IoT devices in everyday applications, the volume of data exchanged is increasing rapidly. The bandwidth of the communication channels used in these applications often presents a bottleneck and limits the frequency at which data can be exchanged, making compression beforehand a viable approach. In this paper, we analyze some well-known lossless compression algorithms capable of seamless integration as software modules on embedded devices used in (near) real-time applications. The low processing and memory resources available on such devices, as well as the unsteadiness of the aforementioned communication channels, were taken into consideration in the algorithm selection. The attained compression ratio is evaluated on data harvested from smart meters, while the time and energy consumed during processing were analyzed within a measurement framework purpose-built for this research. We show that considerable gains can be attained in the volume of data that can be sent from devices based on commonly used microcontrollers, with only a modest toll on time and overall energy consumed.
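The evaluated algorithms and the measurement framework are not listed in this abstract; as a desktop-side sketch of the trade-off being measured, the Python snippet below compresses an invented smart-meter-like payload with DEFLATE constrained to a 512-byte window and the minimum memory level, roughly approximating microcontroller limits, and reports ratio and time.

    import time
    import zlib

    # Invented smart-meter payload: periodic readings with slowly varying values.
    payload = b"".join(b"meter=42;kwh=%06d;" % (100000 + i) for i in range(200))

    # wbits=-9 -> raw DEFLATE with a 2^9 = 512-byte window; memLevel=1 is minimal.
    comp = zlib.compressobj(level=6, wbits=-9, memLevel=1)
    t0 = time.perf_counter()
    out = comp.compress(payload) + comp.flush()
    dt = time.perf_counter() - t0
    print(f"ratio={len(payload) / len(out):.2f}, time={dt * 1e6:.0f} us")

Energy, which the paper measures on real hardware, has no desktop analogue here; time serves only as a rough proxy.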