ISBN:
(Print) 9781728180687
Intra prediction has been an integral part of image and video coders for a long time. A predominant method is angular prediction, which extends the reference area into the block at a certain angle. Recently, many deep-learning-based methods have been proposed. Since intra prediction uses multiple modes, this usually requires training a large number of networks. With a conditional autoencoder we are able to generate an arbitrary number of modes with only one network. In this paper we introduce a novel loss function enforcing a spatially correlated latent space and extend the network structure to the same end. Thereby we are able to propose a simple spatial mode prediction scheme using most-probable-mode lists. By replacing matrix-based intra prediction in VVC with our method, we obtain average rate savings of 0.84% with peak gains of 2.37%.
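The core idea of serving many modes from one network can be sketched as follows. This is an illustrative toy, not the paper's architecture: the "decoder" is a single shared weight matrix, the mode is injected as a one-hot condition vector, and all dimensions and names are hypothetical.

```python
import numpy as np

# Toy sketch: one decoder produces arbitrarily many intra-prediction
# modes by conditioning on a mode index (all sizes are illustrative).
rng = np.random.default_rng(0)

NUM_MODES = 8    # number of modes, all served by the same network
LATENT_DIM = 4   # latent code derived from the reference samples
BLOCK = 4        # 4x4 prediction block

# A single weight matrix shared across all modes.
W = rng.standard_normal((LATENT_DIM + NUM_MODES, BLOCK * BLOCK))

def predict_block(latent, mode):
    """Decode a BLOCK x BLOCK prediction from a latent code and a mode index."""
    cond = np.zeros(NUM_MODES)
    cond[mode] = 1.0                    # one-hot mode condition
    z = np.concatenate([latent, cond])  # conditioning by concatenation
    return np.tanh(z @ W).reshape(BLOCK, BLOCK)

latent = rng.standard_normal(LATENT_DIM)
preds = [predict_block(latent, m) for m in range(NUM_MODES)]
```

Because the mode enters only as a condition vector, adding modes changes one input dimension rather than requiring a new network per mode.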
ISBN:
(Print) 9781665492577
The rise of variational autoencoders for image and video compression has opened the door to many elaborate coding techniques. One example here is the possibility of conditional interframe coding. Here, instead of transmitting the residual between the original frame and the predicted frame (often obtained by motion compensation), the current frame is transmitted under the condition of knowing the prediction signal. In practice, conditional coding can be straightforwardly implemented using a conditional autoencoder, which has also shown good results in recent works. In this paper, we provide an information theoretical analysis of conditional coding for inter frames and show in which cases gains compared to traditional residual coding can be expected. We also show the effect of information bottlenecks which can occur in practical video coders in the prediction signal path due to the network structure, as a consequence of the data-processing theorem or due to quantization. We demonstrate that conditional coding has theoretical benefits over residual coding but that there are cases in which the benefits are quickly canceled by small information bottlenecks of the prediction signal.
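The theoretical advantage of conditional over residual coding rests on the identity H(X | X̂) = H(X − X̂ | X̂) ≤ H(X − X̂): conditioning on the prediction can never cost more rate than coding the residual. A small numerical illustration (a toy discrete source, not the paper's analysis) makes the gap visible when the residual statistics depend on the prediction:

```python
import numpy as np
from collections import Counter

# Toy discrete source: the prediction error distribution depends on the
# prediction signal, so H(X | Xhat) < H(X - Xhat).
rng = np.random.default_rng(1)

xhat = rng.integers(0, 4, size=100_000)          # prediction signal
noise = rng.integers(-1, 2, size=xhat.size)      # error in {-1, 0, 1}
noise[xhat == 0] = 0                             # error statistics depend on xhat
x = xhat + noise                                 # current frame sample

def entropy(samples):
    """Empirical entropy in bits of a list of hashable samples."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

h_residual = entropy(list(x - xhat))             # rate of residual coding
h_joint = entropy(list(zip(x, xhat)))
h_cond = h_joint - entropy(list(xhat))           # H(X | Xhat), conditional coding
```

Here residual coding must pay for the mixture distribution of the error, while conditional coding exploits that the error is deterministic whenever the prediction is 0; the abstract's point is that this gap can vanish if an information bottleneck degrades the prediction signal.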
ISBN:
(Digital) 9781665496209
ISBN:
(Print) 9781665496209
This paper introduces AIVC, an end-to-end neural video codec. It is based on two conditional autoencoders, MNet and CNet, for motion compensation and coding. AIVC learns to compress videos under any coding configuration through a single end-to-end rate-distortion optimization. Furthermore, it offers performance competitive with the recent video coder HEVC under several established test conditions. A comprehensive ablation study is performed to evaluate the benefits of the different modules composing AIVC. The implementation is made available at https://***/AIVC/.
ISBN:
(Print) 9781665492577
Recently, image compression codecs based on Neural Networks (NN) have outperformed state-of-the-art classic ones such as BPG, an image format based on HEVC intra. However, the typical NN codec has high complexity and limited options for parallel data processing. In this work, we propose a conditional separation principle that aims to improve parallelization and lower the computational requirements of an NN codec. We present a Conditional Color Separation (CCS) codec which follows this principle. The color components of an image are split into primary and non-primary ones. The processing of each component is done separately, by jointly trained networks. Our approach allows parallel processing of each component, flexibility to select different channel numbers, and an overall complexity reduction. The CCS codec uses over 40% less memory and has 2x faster encoding and 22% faster decoding speed, with only 4% BD-rate loss in RGB PSNR compared to our baseline model over BPG.
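The separation idea can be sketched in a few lines. This is a hypothetical toy, not the CCS networks: coarse uniform quantization stands in for the learned coders, green is arbitrarily chosen as the primary component, and the non-primary path is conditioned on the reconstructed primary.

```python
import numpy as np

# Toy sketch of conditional color separation: code the primary component
# alone, then code the non-primary components conditioned on its
# reconstruction. Quantization stands in for the learned autoencoders.
rng = np.random.default_rng(2)
img = rng.random((8, 8, 3))              # toy RGB image in [0, 1]

primary = img[..., 1:2]                  # e.g. green as primary component
non_primary = img[..., [0, 2]]           # red and blue

def code_primary(p):
    """Stand-in for the primary-component coder (coarse quantization)."""
    return np.round(p * 16) / 16

def code_non_primary(c, cond):
    """Stand-in for the conditional coder: only the part not explained
    by the condition is transmitted, then the condition is added back."""
    residual = c - cond                  # conditioning on the primary
    return np.round(residual * 16) / 16 + cond

p_hat = code_primary(primary)
np_hat = code_non_primary(non_primary, p_hat)
recon = np.concatenate([np_hat[..., 0:1], p_hat, np_hat[..., 1:2]], axis=-1)
```

Once the primary reconstruction is available, each non-primary component can be decoded independently, which is where the parallelism claimed in the abstract comes from.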
ISBN:
(Print) 9781665432870
Recently, deep learning has demonstrated impressive performance in image compression. Methods that match and even outperform conventional codecs are continually emerging. However, most of them need to train and deploy separate networks for rate adaptation. This is impractical and expensive in terms of memory cost and power consumption, especially for broad bitrate ranges. Further, methods that consider the semantically important structure of the image are extremely sparse. This leads to non-optimized bit allocation for the eye-catching foreground details that have to be preserved for almost all computer vision applications. Towards this end, we establish an end-to-end multi-rate deep semantic image compression scheme with a quantized conditional autoencoder. It includes two neural networks, for semantic analysis and image compression, respectively. The semantic analysis network extracts the essential semantic regions of the input image and calculates the Semantic-Important Structural SIMilarity (SI-SSIM) index for each of them. The compression network is then trained to optimize a multi-loss function based on SI-SSIM and conditioned on the activation bitwidths. The performance of our model is evaluated on the JPEG AI dataset with objective and perceptual quality metrics. The obtained results show that our method outperforms the JPEG, JPEG 2000 and HEVC intra baselines and is competitive with VVC intra.
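The rate-adaptation mechanism — one network covering several rate points by conditioning on the activation bitwidth — can be illustrated with a minimal sketch. Uniform quantization here is a stand-in, not the paper's learned quantizer, and all names are hypothetical.

```python
import numpy as np

# Toy sketch of multi-rate operation from a single model: the same
# activations are quantized at different bitwidths, trading rate for
# distortion without training a separate network per rate point.
rng = np.random.default_rng(3)
latent = rng.random(64)                   # toy latent activations in [0, 1]

def quantize(values, bits):
    """Uniform quantization of activations to the requested bitwidth."""
    levels = 2 ** bits - 1
    return np.round(values * levels) / levels

# Higher bitwidth -> finer quantization -> lower distortion, higher rate.
errors = {bits: float(np.abs(quantize(latent, bits) - latent).max())
          for bits in (2, 4, 8)}
```

Training the compression network conditioned on the bitwidth, as the abstract describes, lets this single quantization knob sweep the rate-distortion curve at inference time.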