ISBN: 9781665492584 (print)
An autoregressive image generative model that estimates the conditional probability distributions of image signals pel-by-pel is a promising tool for lossless image coding. In this paper, a generative model based on a convolutional neural network (CNN) was combined with a locally trained adaptive predictor to improve its accuracy. Furthermore, sets of parameters that adjust the estimated probability distribution were numerically optimized for each image to minimize the resulting coding rate. Simulation results indicate that the proposed method improves the coding efficiency obtained by the CNN-based model for most of the tested images.
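As a rough illustration of why better pel-wise probability estimates lower the coding rate, the sketch below computes the ideal code length from the probabilities a model assigned to the pels that actually occurred. The `sharpen` function is a hypothetical stand-in for "parameters that adjust the estimated probability distribution"; the abstract does not say which adjustment the authors use, so the power-law form here is an assumption.

```python
import math

def code_length_bits(probs_of_actual):
    """Ideal code length in bits, given the probability the model
    assigned to each symbol that actually occurred."""
    return -sum(math.log2(p) for p in probs_of_actual)

def sharpen(dist, gamma):
    """Hypothetical adjustment (not from the paper): raise each
    probability to the power gamma, then renormalise to sum to 1."""
    powered = [p ** gamma for p in dist]
    total = sum(powered)
    return [p / total for p in powered]

# Two pels whose true values were assigned probabilities 1/2 and 1/4
# cost 1 + 2 = 3 bits under an ideal entropy coder.
bits = code_length_bits([0.5, 0.25])
```

Per-image optimization of a parameter such as `gamma` would then amount to searching for the value that minimizes `code_length_bits` over the whole image.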
Compressed sensing (CS) has been demonstrated to be an effective method for robust image coding. However, in existing CS-based image coding schemes, recovery performance drops rapidly at high packet loss rates (PLRs) because the received CS measurements are insufficient for stable recovery. To solve this problem, we propose a novel robust image coding scheme that uses CS with measurement completion. The original image is divided into many down-sampled images by interweaving permutation (IP), and each is sampled with scrambled 2D CS (2DCS) to obtain its CS measurement vector. Since the CS measurement vectors preserve the correlation of the down-sampled images, they are also highly correlated with each other. By exploiting this correlation, we propose a measurement completion strategy that recovers, at the decoder side, many of the CS measurements lost to packet loss. Simulation results show that the proposed scheme significantly outperforms previous CS-based image coding schemes at high PLRs in terms of rate-distortion (R-D) performance. This advantage makes the proposed scheme a good candidate for image communication systems that must transmit image data reliably over channels with high PLRs.
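The splitting step can be pictured as a polyphase decomposition, in which every s-th pixel in each direction goes to the same sub-image. The sketch below assumes that this is what interweaving permutation amounts to; the paper's exact permutation may differ, and the function names are hypothetical. Because neighbouring pixels land in different sub-images, the sub-images (and hence their measurements) look alike.

```python
import numpy as np

def interleaved_subimages(img, s):
    """Split img into s*s down-sampled images by taking every s-th
    pixel, one sub-image per (row offset, column offset) pair."""
    return [img[i::s, j::s] for i in range(s) for j in range(s)]

def reassemble(subs, s):
    """Inverse of interleaved_subimages for equally sized sub-images."""
    h, w = subs[0].shape
    out = np.empty((h * s, w * s), dtype=subs[0].dtype)
    for idx, sub in enumerate(subs):
        i, j = divmod(idx, s)
        out[i::s, j::s] = sub
    return out

img = np.arange(16).reshape(4, 4)
subs = interleaved_subimages(img, 2)  # four 2x2 sub-images
```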
ISBN: 9781665492584 (print)
Learnt image coding (LIC) methods have recently offered state-of-the-art efficiency by training separate models for individual bitrates, which is impractical. Variable-rate coding with a single or very few LIC models has since emerged, and is mostly implemented to process a whole image directly (e.g., a single rate-distortion control factor $\lambda$ for a given image to approach the target rate). This work provides a novel block-level rate control by applying UnEqual Rate Allocation (UERA) to non-overlapping image blocks, which exploits the spatial heterogeneity of the underlying content. Such block-level UERA is enabled by modeling the rate-distortion (R-D) function of each block, by which we optimize block-wise $\lambda$s to maximize the overall R-D performance. Experiments show that our method can accurately adapt to a wide range of bitrates with a single model, and provides performance almost identical to that of solutions using multiple rate-specific models. Additionally, such block-level LIC significantly reduces peak running memory consumption and computational complexity, which is attractive for practical implementations.
In the past years, learned image compression (LIC) has achieved remarkable performance. The recent LIC methods outperform VVC in both PSNR and MS-SSIM. However, the low bit-rate reconstructions of LIC suffer from arti...
ISBN: 9781665492584 (print)
Textual content is becoming increasingly important in video conferencing, while existing screen content encoding tools still produce a high bitrate in text regions. The main coding tool, Intra Block Copy (IBC), inherits the MV prediction mechanism of inter-frame coding, but adjacent text characters typically have unrelated MVs, making it inefficient to predict an MV from neighboring MVs alone. To solve this problem, we propose Position-based Motion Vector Prediction, which caches IBC AMVP PU positions as predictors. A character can then find a previously encoded position from which to construct a good MV prediction. Experimental results show the effectiveness of the proposed prediction scheme.
ISBN: 9781665492584 (print)
With the continuous improvement of computer vision technology, more and more image information is consumed by machines rather than humans. Image coding for machines (ICM) compresses image data so that it can be sent more efficiently to the receiver side, where machines conduct visual analysis. A typical deep learning-based ICM structure contains one codec network, which compresses and transmits images through the Internet, and one semantic analysis task network, such as image classification or object recognition. In the codec part, the side information is the hyper-prior, or hierarchical layers of hyper-priors, used to compress the image latent representations. In this paper, we propose a Side Information Driven Image Coding (SIIC) framework based on deep learning. It compresses and transmits only the side information to the receiver for image classification tasks. We obtain a top-1 accuracy of 70.38% on the ImageNet1K dataset at 0.046 bits per pixel.
ISBN: 9781665484855 (digital), 9781665484862 (print)
Learning-based image coding has shown promising results for coding of natural images compared to traditional block-based coding schemes. However, improvements are needed for screen content coding. Most of the popular learning-based coding approaches are based on variational autoencoders employing Convolutional Neural Networks (CNNs), which are trained end-to-end on a training dataset. The receptive field area of the latents in these architectures increases with the down-sampling ratio and the kernel size used in each convolution layer. The latents coded from the last layer therefore have a large receptive field, which may not be optimal for coding image sources such as screen content or mixed content containing text, logos and small edges. This paper proposes new methods to adaptively fuse and code the latents from different layers. This enables a novel multi-level receptive field based latent coding architecture that achieves better coding performance for a diverse set of contents. Additionally, Multi-Mixture distribution based entropy modeling of latent features and content adaptive latent refinements in the encoder are proposed to bring further coding gains. The experimental results show that the approach can significantly improve the coding efficiency for screen content, with average bitrate savings of 36%.
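The growth of the receptive field with kernel size and down-sampling ratio follows a standard recurrence, sketched below. The four 5x5, stride-2 layers in the usage line are an assumed configuration chosen for illustration, not the architecture from the paper.

```python
def receptive_field(layers):
    """Receptive field size (in input pixels, along one axis) after
    each layer, for a list of (kernel_size, stride) pairs."""
    rf, jump = 1, 1  # jump = input-pixel spacing between adjacent outputs
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra tap reaches `jump` pixels further
        jump *= stride             # striding spreads later taps further apart
        sizes.append(rf)
    return sizes

# Assumed analysis transform: four 5x5 convolutions, each with stride 2.
sizes = receptive_field([(5, 2)] * 4)  # [5, 13, 29, 61]
```

Under this assumed configuration a last-layer latent sees a 61-pixel-wide window while a first-layer latent sees only 5 pixels, which is the kind of gap that motivates coding latents from several layers for content with small text and sharp edges.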
ISBN: 9781665492584 (print)
In recent years, there has been a sharp increase in the transmission of images to remote servers specifically for the purpose of computer vision. In many applications, such as surveillance, images are mostly transmitted for automated analysis and rarely seen by humans. Using traditional compression for this scenario has been shown to be inefficient in terms of bit-rate, likely due to its focus on human-based distortion metrics. Thus, it is important to create specific image coding methods for joint use by humans and machines. One way to create the machine side of such a codec is to perform feature matching at some intermediate layer of a Deep Neural Network performing the machine task. In this work, we explore the effects of the layer choice used in training a learnable codec for humans and machines. We prove, using the data processing inequality, that matching features from deeper layers is preferable in the rate-distortion sense. Next, we confirm our findings empirically by re-training an existing model for scalable human-machine coding. In our experiments we show the trade-off between the human and machine sides of such a scalable model, and discuss the benefit of using deeper layers for training in that regard.
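For reference, the data processing inequality invoked in this abstract can be stated as follows. The notation is assumed here: $X$ is the input image, $Y_1$ the features at a shallower layer, and $Y_2$ the features at a deeper layer, computed from $Y_1$ alone. How the authors turn this into their rate-distortion result is not given in the abstract.

```latex
X \rightarrow Y_1 \rightarrow Y_2
\quad\Longrightarrow\quad
I(X; Y_2) \le I(X; Y_1)
```

Informally, a deeper layer cannot carry more information about the input than the layer it is computed from, so there is no more to convey when matching deeper features.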
ISBN: 9781665482097 (print)
In human action recognition, action data collected through video or photos is easily affected by factors such as perspective and lighting, and its features are not easy to describe and extract. To solve this problem, we studied human skeletal joint data together with a convolutional neural network (CNN). The joint data was converted into a PNG image by image coding. In addition, we proposed three descriptions of the data arrangement order for grayscale image coding. Combined with four coding methods and RGB image coding, the coding scheme was expanded to 16 kinds, and a nine-layer CNN model was used to conduct comparative experiments on the 16 coding schemes. The influence of the data arrangement order and the coding methods was then discussed on the basis of the action recognition results. The experimental results show that the “Zhi” font coding method under data arrangement order Case 2 makes actions easier to classify, with a test-set accuracy of 96%.
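A minimal sketch of turning joint data into a grayscale image is given below. The layout (one row per frame, with the joint coordinates laid out along the columns) and the function name are assumptions made for illustration; they are not one of the three arrangement orders or four coding methods from the paper, which the abstract does not describe.

```python
import numpy as np

def joints_to_gray(seq):
    """Map a (frames, joints, 3) array of joint coordinates to an
    8-bit grayscale image with one row per frame (assumed layout)."""
    seq = np.asarray(seq, dtype=np.float64)
    frames = seq.reshape(seq.shape[0], -1)  # flatten joints x coords per frame
    lo, hi = frames.min(), frames.max()
    scale = 255.0 / (hi - lo) if hi > lo else 0.0  # guard against constant input
    return np.round((frames - lo) * scale).astype(np.uint8)

# Toy sequence: 2 frames, 2 joints, (x, y, z) per joint.
seq = np.zeros((2, 2, 3))
seq[1, 1, 2] = 1.0
image = joints_to_gray(seq)
```

The resulting array could then be written out as a PNG with an imaging library and fed to a CNN classifier.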
Recently, convolutional auto-encoders (CAE) were introduced for image coding. They achieved performance improvements over the state-of-the-art JPEG2000 method. However, these performances were obtained using massive C...