Transform and entropy models are the two core components of deep image compression neural networks. Most existing learning-based image compression methods utilize convolution-based transforms, which lack the ability...
Despite their short history, neural image codecs have been shown to surpass classical image codecs in rate-distortion performance. However, most of them suffer from significantly longer decoding times, which hinders their practical application. This issue is especially pronounced when an effective yet time-consuming autoregressive context model is employed, since it increases entropy decoding time by orders of magnitude. In this paper, unlike most previous works that pursue optimal RD performance while largely overlooking coding complexity, we conduct a systematic investigation of rate-distortion-complexity (RDC) optimization in neural image compression. By quantifying decoding complexity as a factor in the optimization objective, we can precisely control the RDC trade-off and demonstrate how the rate-distortion performance of neural image codecs can adapt to various complexity demands. Going beyond this investigation, a variable-complexity neural codec is designed that exploits spatial dependencies adaptively according to industrial demands and supports fine-grained complexity adjustment by balancing the RDC trade-off. By implementing this scheme in a powerful base model, we demonstrate the feasibility and flexibility of RDC optimization for neural image codecs.
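A plausible form of such a rate-distortion-complexity objective, given here only as an illustrative sketch (the abstract does not spell out the exact formulation or how decoding complexity is measured), augments the usual rate-distortion Lagrangian with a weighted complexity term:

L_RDC = R + λ·D + γ·C

where R is the expected bitrate, D the reconstruction distortion, C a measure (or differentiable proxy) of decoding complexity, and λ, γ the weights that set the target point on the RDC trade-off; sweeping γ would then trade rate-distortion performance against decoding cost.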
ISBN (Print): 9798350364439; 9798350364422
Neural compression has the potential to revolutionize lossy image compression. Based on generative models, recent schemes achieve unprecedented compression rates at high perceptual quality, but they compromise semantic fidelity: details of decompressed images may appear optically flawless yet be semantically different from the originals, making compression errors difficult or impossible to detect. We explore the problem space and propose a provisional taxonomy of miscompressions. It defines three types of "what happens" and a binary "high impact" flag indicating miscompressions that alter symbols. We discuss how the taxonomy can facilitate risk communication and research into mitigations.
Image compression methods based on machine learning have achieved high rate-distortion performance. However, the reconstructions they produce suffer from blurring at extremely low bitrates (below 0.1 bpp), resulting in low perceptual quality. Although some methods attempt to reconstruct sharp images using Generative Adversarial Networks (GANs), reconstructing natural textures at low bitrates remains challenging. In this paper, we propose a novel image compression method that explicitly utilizes semantic information. Specifically, we send a semantic label map to the decoder, which takes it as input. This semantic information enables the decoder to reconstruct textures consistent with the corresponding semantic classes. Although semantic label maps can be compressed to relatively small sizes using common methods (e.g., PNG), their size is not negligible in an extremely low-rate setting. To address this problem, we propose simple yet effective label map compression strategies, including an autoregressive label map compressor. Our strategies significantly reduce the data size of the label map while preserving the critical semantic information that allows the decoder to reconstruct realistic and suitable textures. By utilizing this data-efficient semantic information, our method can reconstruct realistic images even at an extremely low bitrate. As a result, the proposed method outperformed existing models, including a GAN-based model designed for low-rate settings and a state-of-the-art semantically guided method, in both quantitative evaluations and user studies. Furthermore, we analyzed the effect of semantic information by switching the input label map, confirming that the model synthesizes textures appropriate to the given semantic labels.
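The following PyTorch-style sketch illustrates the general idea of conditioning a decoder on a semantic label map; the module and channel sizes (SemanticConditionedDecoder, latent_ch, num_classes, emb_ch) are hypothetical and do not reflect the paper's actual architecture or its autoregressive label map compressor.

import torch
import torch.nn as nn

class SemanticConditionedDecoder(nn.Module):
    # Hypothetical sketch: a synthesis network that consumes both the compressed
    # latent and an embedded semantic label map, so textures can be generated
    # consistently with the semantic class at each location.
    def __init__(self, latent_ch=192, num_classes=19, emb_ch=32):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, emb_ch)
        self.synthesis = nn.Sequential(
            nn.ConvTranspose2d(latent_ch + emb_ch, 128, 5, stride=2, padding=2, output_padding=1),
            nn.GELU(),
            nn.ConvTranspose2d(128, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, latent, label_map):
        # label_map: (B, H, W) integer class ids, assumed already downsampled
        # to the spatial resolution of the latent.
        emb = self.label_emb(label_map).permute(0, 3, 1, 2)  # (B, emb_ch, H, W)
        return self.synthesis(torch.cat([latent, emb], dim=1))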
ISBN (Print): 9798350358483; 9798350358490
Wavelet-like transforms based on convolutional neural networks (CNNs) are content-adaptive and have achieved remarkable results in end-to-end image compression. However, the subsequent sequential processing of each subband in the entropy module takes a relatively long decoding time, which is inconvenient for real-world applications. In this work, for lossy image compression, the wavelet-like transform is transplanted into the prevailing autoencoder structure to enhance the analysis and synthesis transforms, owing to its excellent decomposition capability. The obtained subbands of different frequencies undergo a hierarchical decorrelation architecture for subband fusion, called the cross fusing module. Specialized treatment is applied to different subbands according to their spatial resolution to attain a more compact latent representation. In addition, the proposed solution features an architecture that decouples the arithmetic decoding process from the sample prediction process, which significantly reduces decoding complexity. Experiments on the Kodak test set show that the proposed method achieves a -3.04% BD-rate compared with an existing decoupled end-to-end structure in terms of RGB Peak Signal-to-Noise Ratio (PSNR).
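As a rough illustration of what a CNN-based wavelet-like (lifting) transform looks like, the sketch below implements a single learned lifting step; it is an assumption-labelled toy, not the paper's transform, cross fusing module, or decoupled entropy architecture.

import torch
import torch.nn as nn

class LiftingStep(nn.Module):
    # Hypothetical sketch of one CNN-based lifting step: split the input along
    # the vertical axis, predict the odd rows from the even ones (detail/high
    # band), then update the even rows with the detail (approximation/low band).
    def __init__(self, channels=3):
        super().__init__()
        self.predict = nn.Conv2d(channels, channels, 3, padding=1)
        self.update = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Assumes an even number of rows.
        even, odd = x[:, :, 0::2, :], x[:, :, 1::2, :]
        high = odd - self.predict(even)   # high-frequency (detail) subband
        low = even + self.update(high)    # low-frequency (approximation) subband
        return low, high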
ISBN (Print): 9781728198354
Motivated by an efficiency investigation of the Transformer-based transform coding framework SwinT-ChARM, we first propose to enhance it with a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT). Current methods that still rely on ConvNet-based entropy coding are limited in modeling long-range dependencies due to their local connectivity and an increasing number of architectural biases and priors. In contrast, the proposed ICT can capture both global and local contexts from the latent representations and better parameterize the distribution of the quantized latents. Further, we leverage a learnable scaling module with a sandwich ConvNeXt-based pre-/post-processor to extract a more compact latent representation while reconstructing higher-quality images. Extensive experimental results on benchmark datasets show that the proposed adaptive image compression transformer (AICT) framework significantly improves the trade-off between coding efficiency and decoder complexity over the Versatile Video Coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM.
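To make the channel-wise auto-regressive idea concrete, here is a minimal sketch of how entropy parameters can be predicted chunk by chunk over channels; for simplicity it uses 1x1 convolutions where ICT/AICT use Transformer blocks, and all names and sizes (latent_ch, num_chunks, hyper_ch) are assumptions rather than the paper's configuration.

import torch
import torch.nn as nn

class ChannelWiseAutoregressivePrior(nn.Module):
    # Hypothetical sketch: the latent is split into channel chunks, and the
    # Gaussian parameters (mu, sigma) of each chunk are predicted from the
    # hyperprior features plus all previously decoded chunks, so decoding
    # proceeds chunk by chunk instead of pixel by pixel.
    def __init__(self, latent_ch=320, num_chunks=5, hyper_ch=192):
        super().__init__()
        self.chunk_ch = latent_ch // num_chunks
        self.param_nets = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(hyper_ch + i * self.chunk_ch, 224, 1),
                nn.GELU(),
                nn.Conv2d(224, 2 * self.chunk_ch, 1),
            )
            for i in range(num_chunks)
        ])

    def forward(self, y, hyper):
        chunks = torch.split(y, self.chunk_ch, dim=1)
        decoded, params = [], []
        for i, net in enumerate(self.param_nets):
            ctx = torch.cat([hyper] + decoded, dim=1)
            mu, sigma = net(ctx).chunk(2, dim=1)
            params.append((mu, sigma))
            decoded.append(chunks[i])  # at decode time: the dequantized chunk
        return params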
ISBN (Print): 9781665492577
End-to-end deep trainable models are about to exceed the performance of traditional handcrafted compression techniques on videos and images. The core idea is to learn a non-linear transformation, modeled as a deep neural network, that maps the input image into a latent space, jointly with an entropy model of the latent distribution. The decoder is also learned as a deep trainable network, and the distortion is measured on the reconstructed image. These methods enforce the latents to follow some prior distribution. Since these priors are learned by optimization over the entire training set, performance is optimal on average. However, they cannot fit every single new instance exactly, which hurts compression performance by enlarging the bitstream. In this paper, we propose a simple yet efficient instance-based parameterization method to reduce this amortization gap at a minor cost. The proposed method is applicable to any end-to-end compression method, improving the compression bitrate by 1% without any impact on reconstruction quality.
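The abstract does not detail the instance-based parameterization itself; the sketch below instead shows a closely related, commonly used way of shrinking the amortization gap, namely refining the latent of a single image by gradient descent on the same rate-distortion objective. The names (refine_latent, entropy_model, decoder) are placeholders, and entropy_model is assumed to return a differentiable rate estimate.

import torch

def refine_latent(y_init, decoder, entropy_model, x, lam=0.01, steps=100, lr=1e-3):
    # Hypothetical sketch: decoder and entropy_model stay frozen; only the
    # continuous latent y is optimized for this particular image, then
    # quantized before actual entropy coding.
    y = y_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        rate = entropy_model(y)                 # estimated bits for the latent
        dist = torch.mean((x - decoder(y)) ** 2)
        loss = rate + lam * dist
        loss.backward()
        opt.step()
    return torch.round(y.detach())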
ISBN (Print): 9798350353013; 9798350353006
While replacing Gaussian decoders with a conditional diffusion model enhances the perceptual quality of reconstructions in neural image compression, their lack of inductive bias for image data restricts their ability to achieve state-of-the-art perceptual levels. To address this limitation, we adopt a non-isotropic diffusion model at the decoder side. This model imposes an inductive bias aimed at distinguishing between frequency contents, thereby facilitating the generation of high-quality images. Moreover, our framework is equipped with a novel entropy model that accurately models the probability distribution of the latent representation by exploiting spatio-channel correlations in latent space, while accelerating the entropy decoding step. This channel-wise entropy model leverages both local and global spatial contexts within each channel chunk. The global spatial context is built upon a Transformer specifically designed for image compression tasks. The designed Transformer employs a Laplacian-shaped positional encoding, whose learnable parameters are adaptively adjusted for each channel cluster. Our experiments demonstrate that the proposed framework yields better perceptual quality than cutting-edge generative codecs, and that the proposed entropy model contributes notable bitrate savings. The code is available at https://***/Atefeh-Khoshtinat/Blur-dissipated-compression.
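Since the abstract only names the Laplacian-shaped positional encoding without giving its form, the sketch below shows one plausible realization: an additive attention bias that decays exponentially with spatial distance, with a learnable scale and bandwidth per head (standing in for the per-channel-cluster parameters mentioned above). All identifiers here are assumptions, not the released code.

import torch
import torch.nn as nn

class LaplacianPositionalBias(nn.Module):
    # Hypothetical sketch: produces an additive attention bias whose magnitude
    # decays Laplacian-style, exp(-distance / bandwidth), with the L1 distance
    # between token positions; scale and bandwidth are learnable.
    def __init__(self, num_heads=8):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_heads))
        self.bandwidth = nn.Parameter(torch.ones(num_heads))

    def forward(self, coords):
        # coords: (N, 2) float tensor of token positions on the latent grid.
        dist = torch.cdist(coords, coords, p=1)                 # (N, N)
        bw = self.bandwidth.abs().view(-1, 1, 1) + 1e-6         # keep positive
        bias = self.scale.view(-1, 1, 1) * torch.exp(-dist.unsqueeze(0) / bw)
        return bias                                             # (heads, N, N), added to attention logits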