ISBN: 9798350307924 (print)
Error-bounded lossy compression is becoming increasingly important for data-movement-intensive applications that must handle large datasets efficiently in HPC environments. Using it effectively often requires knowing the compressibility of a dataset before compressing it. However, off-the-shelf state-of-the-art lossy compressors are typically driven by error bounds, so the compression ratio cannot be known until the compression operation has completed. In this paper, we propose a lightweight, robust, easy-to-train model that accurately estimates the compressibility of datasets for different lossy compressors. Our approach combines novel predictors that measure various notions of the spatial correlation and smoothness exploited by lossy compressors, implemented efficiently on the GPU, in a framework that uses mixture model regression to improve robustness and conformal prediction to provide bounds on the estimates. We then use these models, together with a detailed analysis of speedup, to understand the tradeoffs between high speed, consistent speed, and accuracy of the methods on real applications. We evaluate our approach in the context of three key applications in which compression ratio estimation is essential.
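To illustrate how conformal prediction can attach bounds to an estimated compression ratio, here is a minimal sketch of generic split conformal prediction. It is not the paper's implementation: the function names, the calibration values, and the use of absolute residuals as the nonconformity score are all assumptions made for illustration. Given predictions and measured ratios on a held-out calibration set, it widens a new point estimate by a quantile of the calibration errors.

```python
import math


def conformal_quantile(residuals, alpha):
    """Return the split-conformal quantile of absolute residuals.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)) and returns
    infinity when the calibration set is too small for the requested level.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(residuals)[rank - 1]


def conformal_interval(prediction, calib_pred, calib_true, alpha=0.1):
    """Symmetric interval around `prediction` at nominal level 1 - alpha."""
    residuals = [abs(p - t) for p, t in zip(calib_pred, calib_true)]
    q = conformal_quantile(residuals, alpha)
    return prediction - q, prediction + q


# Hypothetical calibration data: estimated vs. measured compression ratios.
calib_pred = [10.0, 12.0, 8.0, 15.0, 9.0, 11.0, 14.0, 7.0, 13.0]
calib_true = [11.0, 11.5, 8.5, 13.0, 9.2, 10.0, 14.5, 7.1, 12.0]

# Bounds for a new point estimate of 12.5 at nominal 80% coverage.
low, high = conformal_interval(12.5, calib_pred, calib_true, alpha=0.2)
print(low, high)
```

Under the usual exchangeability assumption between calibration and test data, an interval built this way covers the true value with probability at least 1 - alpha, regardless of how the underlying point estimate was produced.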