ISBN (print): 9781665441155
Given the capabilities of massive GPU hardware, there has been a surge in the use of artificial neural networks (ANN) for still image compression. These compression systems usually consist of convolutional layers and can be regarded as non-linear transform coding. Notably, these ANNs follow an end-to-end approach in which the encoder determines a compressed version of the image as features. In contrast, existing image and video codecs employ a block-based architecture with signal-dependent encoder optimizations. A basic requirement for designing such optimizations is estimating the impact of the quantization error on the resulting bitrate and distortion. For non-linear, multi-layered neural networks, this is a difficult problem. This paper presents a performant auto-encoder architecture for still image compression which represents the compressed features at multiple scales. We then demonstrate how an algorithm that tests multiple feature candidates can reduce the Lagrangian cost and improve compression efficiency. The algorithm avoids multiple network executions by pre-estimating the impact of the quantization on the distortion with a higher-order polynomial.
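The candidate-search idea can be sketched as follows: instead of decoding every quantization candidate, the distortion change is pre-estimated from the quantization error with a fitted polynomial, and the candidate with the smallest Lagrangian cost J = D + λ·R is kept. The Python sketch below is only an illustration under assumed names (laplace_cdf, poly_coeffs, lam, best_candidate are placeholders) and is not the authors' implementation.

```python
import numpy as np

def laplace_cdf(x, mu, b):
    """Closed-form CDF of a Laplacian with location mu and scale b."""
    z = (x - mu) / b
    return np.where(z < 0.0, 0.5 * np.exp(z), 1.0 - 0.5 * np.exp(-z))

def estimate_distortion(q_error, poly_coeffs):
    """Pre-estimate the distortion increase caused by a quantization error
    with a higher-order polynomial D(e) ~ c1*e + c2*e^2 + ... instead of
    re-running the decoder network."""
    powers = np.array([q_error ** (k + 1) for k in range(len(poly_coeffs))])
    return float(np.dot(poly_coeffs, powers))

def rate_bits(q, mu, b):
    """Toy rate estimate: negative log2-probability of the integer bin
    under a Laplacian entropy model."""
    p = laplace_cdf(q + 0.5, mu, b) - laplace_cdf(q - 0.5, mu, b)
    return float(-np.log2(max(float(p), 1e-9)))

def best_candidate(y, mu, b, lam, poly_coeffs, search_range=1):
    """Test several integer candidates around round(y) and keep the one
    minimizing the Lagrangian cost J = D_est + lam * R."""
    base = np.round(y)
    best_q, best_cost = base, np.inf
    for offset in range(-search_range, search_range + 1):
        q = base + offset
        cost = estimate_distortion(q - y, poly_coeffs) + lam * rate_bits(q, mu, b)
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q, best_cost

# Example call with illustrative numbers.
q_hat, cost = best_candidate(y=1.3, mu=0.0, b=2.0, lam=0.05,
                             poly_coeffs=[0.0, 1.0, 0.0, 0.2])
```

In practice the polynomial coefficients would presumably be fitted per layer or channel from measured decoder distortions, and the rate term would come from the learned entropy model rather than a fixed Laplacian.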
ISBN (digital): 9781665496209
ISBN (print): 9781665496209
The performance of variational auto-encoders (VAE) for image compression has steadily grown in recent years, thus becoming competitive with advanced visual data compression technologies. These neural networks transform the source image into a latent space with a channel-wise representation. In most works, the latents are scalar quantized before being entropy coded. Vector quantizers, on the other hand, generally achieve denser packings of high-dimensional data regardless of the source distribution; hence, low-complexity variants of these quantizers are implemented in the compression standards JPEG 2000 and Versatile Video Coding. In this paper, we demonstrate coding gains by using trellis-coded quantization (TCQ) instead of scalar quantization. To optimize the networks for TCQ, we employ a specific noisy representation of the features during the training stage. For variable-rate VAEs, we obtained average BD-rate savings of 7.7% on the Kodak images by using TCQ over scalar quantization. When separate networks are optimized per target bitrate, we report a relative coding gain of 2.4% due to TCQ.
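The "noisy representation during training" refers to replacing the non-differentiable quantizer with a differentiable proxy while training; the most common variant in learned compression is additive uniform noise, shown below as a hedged PyTorch sketch. The paper uses a TCQ-specific noisy representation, so this block only illustrates the general surrogate principle; function and tensor names are illustrative.

```python
import torch

def quantize(latents: torch.Tensor, training: bool) -> torch.Tensor:
    if training:
        # Differentiable proxy: add i.i.d. noise from U(-0.5, 0.5) so gradients
        # can flow through the "quantizer" during end-to-end training.
        noise = torch.empty_like(latents).uniform_(-0.5, 0.5)
        return latents + noise
    # Inference: actual quantization (plain rounding here; TCQ would instead
    # run a trellis search over the latent tensor based on the entropy model).
    return torch.round(latents)

y = torch.randn(1, 192, 16, 16, requires_grad=True)  # example latent tensor
y_train = quantize(y, training=True)    # noisy, differentiable surrogate
y_infer = quantize(y, training=False)   # hard-quantized for entropy coding
```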
ISBN (print): 9798350349405; 9798350349399
The continuous improvements in image compression with variational autoencoders have led to learned codecs competitive with conventional approaches in terms of rate-distortion efficiency. Nonetheless, taking the quantization into account during the training process remains a problem, since it produces zero derivatives almost everywhere and needs to be replaced with a differentiable approximation that allows end-to-end optimization. Though there are different methods for approximating the quantization, none of them model the quantization noise correctly and thus result in suboptimal networks. Hence, we propose an additional finetuning training step: after conventional end-to-end training, parts of the network are retrained on quantized latents obtained at the inference stage. For entropy-constrained quantizers like trellis-coded quantization, the impact of the quantizer is particularly difficult to approximate by rounding or adding noise, as the quantized latents are chosen interdependently through a trellis search based on both the entropy model and a distortion measure. We show that retraining on correctly quantized data consistently yields additional coding gain for both uniform scalar and especially for entropy-constrained quantization, without increasing inference complexity. For the Kodak test set, we obtain average savings between 1% and 2%, and for the Tecnick test set up to 2.2% in terms of Bjontegaard-Delta bitrate.
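A minimal sketch of the described finetuning step, assuming a frozen analysis transform (encoder) and a synthesis transform (decoder) retrained on latents quantized exactly as at inference time. Hard rounding stands in for the trellis search here, and encoder, decoder, and loader are placeholder names, not the authors' code.

```python
import torch

def finetune_decoder(encoder, decoder, loader, epochs=1, lr=1e-5):
    """Retrain only the decoder on inference-time quantized latents."""
    encoder.eval()                      # keep the analysis transform fixed
    decoder.train()
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    mse = torch.nn.MSELoss()
    for _ in range(epochs):
        for x in loader:
            with torch.no_grad():
                y = encoder(x)
                y_hat = torch.round(y)  # actual quantization, no training surrogate
            x_hat = decoder(y_hat)      # only the synthesis transform sees gradients
            loss = mse(x_hat, x)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return decoder
```

The design choice is that inference complexity stays unchanged: the encoder and the quantizer are untouched, and only modules that operate on already-quantized latents are updated.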