ISBN (digital): 9781665496209
ISBN (print): 9781665496209
We consider perceptual quality optimization in image coding through adaptive quantization. A differential contrast model is proposed to measure visual sensitivity to quantization distortions, from which a spatially adaptive quantization strategy is derived. A complementary quantitative approach is provided to compute the proposed differential contrast model efficiently. The resulting visual quality improvement is demonstrated experimentally.
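As a rough illustration of spatially adaptive quantization of this kind, the sketch below scales a base quantization step by a per-block contrast measure. The block standard-deviation-over-mean proxy is only a stand-in for the paper's differential contrast model, and all names and parameters are illustrative assumptions.

```python
# Hypothetical sketch: spatially adaptive quantization driven by a local
# contrast map. The contrast proxy (block std / block mean) stands in for
# the paper's differential contrast model, whose exact form is not given here.
import numpy as np

def block_contrast(img, block=8, eps=1e-6):
    """Per-block contrast proxy: standard deviation over mean luminance."""
    h, w = img.shape
    hb, wb = h // block, w // block
    blocks = img[:hb * block, :wb * block].reshape(hb, block, wb, block)
    return blocks.std(axis=(1, 3)) / (blocks.mean(axis=(1, 3)) + eps)

def adaptive_qstep(img, base_q=16.0, strength=0.5, block=8):
    """Coarser quantization where contrast (visual masking) is high,
    finer quantization in smooth, visually sensitive regions."""
    c = block_contrast(img, block)
    c_norm = c / (c.mean() + 1e-6)        # normalize around 1.0
    return base_q * c_norm ** strength    # per-block quantization step

if __name__ == "__main__":
    img = np.random.rand(64, 64) * 255.0
    q = adaptive_qstep(img)
    print(q.shape, float(q.min()), float(q.max()))
```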
Due to the huge data volume of high-resolution remote sensing imagery (RSI) and limited transmission bandwidth, RSIs are typically compressed for efficient transmission and storage. However, most existing compression algorithms are optimized for human perception, which makes them ill-suited to remote sensing applications where RSIs are mainly used for machine interpretation tasks, such as semantic segmentation for ground-object recognition. In this article, we propose an image coding for machines (ICM) paradigm based on contrastive learning in a fully supervised manner to boost semantic segmentation of compressed RSIs. Specifically, we build an end-to-end compression framework that makes full use of global semantic information by clustering intracategory projected embeddings and spacing intercategory embeddings apart, to compensate for the loss of feature discriminability during compression and to reconstruct the decision boundaries between different categories. Compared with state-of-the-art image compression methods, the proposed method significantly improves semantic segmentation performance on remote sensing labeling benchmark datasets.
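A minimal sketch of the supervised contrastive term described above: projected embeddings of the same semantic class are pulled together and other classes are pushed apart. The function name, temperature, and the weighting against the codec's rate-distortion objective are assumptions, not the authors' implementation.

```python
# Hypothetical supervised contrastive loss over projected pixel/region embeddings.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """embeddings: (N, D) projected features; labels: (N,) semantic class ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                       # (N, N) similarities
    mask_self = torch.eye(len(labels), dtype=torch.bool, device=z.device)
    mask_pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self

    # log-softmax over all other samples, then average over same-class positives
    logits = sim.masked_fill(mask_self, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = mask_pos.sum(dim=1).clamp(min=1)
    loss = -log_prob.masked_fill(~mask_pos, 0.0).sum(dim=1) / pos_count
    return loss.mean()

# usage sketch: total = rate + lam_d * distortion
#                     + lam_c * supervised_contrastive_loss(z, y)
```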
ISBN (print): 9781665492577
In recent years, there has been a sharp increase in the transmission of images to remote servers specifically for the purpose of computer vision. In many applications, such as surveillance, images are mostly transmitted for automated analysis and rarely seen by humans. Using traditional compression in this scenario has been shown to be inefficient in terms of bit-rate, likely due to its focus on human-oriented distortion metrics. It is therefore important to create image coding methods designed for joint use by humans and machines. One way to create the machine side of such a codec is to match features of some intermediate layer of a deep neural network performing the machine task. In this work, we explore the effect of the layer chosen when training a learnable codec for humans and machines. We prove, using the data processing inequality, that matching features from deeper layers is preferable in the rate-distortion sense. We then confirm our findings empirically by re-training an existing model for scalable human-machine coding. Our experiments show the trade-off between the human and machine sides of such a scalable model, and we discuss the benefit of using deeper layers for training in that regard.
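A sketch of the layer-choice feature-matching idea: the machine-side distortion is the discrepancy between features of a frozen task network, cut at a chosen depth, computed on the original and the decoded image. The Sequential task network, the layer index, and the MSE form of the match are placeholders, not the paper's exact training setup.

```python
# Hypothetical feature-matching distortion at a chosen layer of a frozen task net.
import torch
import torch.nn as nn

class FeatureMatchingLoss(nn.Module):
    def __init__(self, task_net: nn.Sequential, layer_idx: int):
        super().__init__()
        # keep only layers up to (and including) the chosen depth, frozen
        self.trunk = nn.Sequential(*list(task_net.children())[: layer_idx + 1])
        for p in self.trunk.parameters():
            p.requires_grad_(False)

    def forward(self, x_orig, x_decoded):
        with torch.no_grad():
            target = self.trunk(x_orig)       # reference features
        return nn.functional.mse_loss(self.trunk(x_decoded), target)

# A larger layer_idx corresponds to the deeper-layer matching argued for above;
# in a scalable human-machine codec this term would train the base (machine) layer.
```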
To accelerate the fractal decoding process, a fast fractal decoding method based on a minimum domain block set (MDBS) is proposed. In the fractal encoding process, it is found that there exists an MDBS that can provide the best-matched domain blocks for all range blocks. In the decoding process, the MDBS is first identified before the first iteration. Then, only the range blocks inside the MDBS are reconstructed in each of the first to penultimate iterations, so the computations for reconstructing the remaining range blocks outside the MDBS are saved, speeding up decoding. Finally, all range blocks are reconstructed in the last iteration to obtain the decoded image. Experimental results show that about 5%-17% of the total decoding computations can be saved.
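The iteration structure of the MDBS-accelerated decoder can be sketched as below. The fractal transform itself and the test for membership in the MDBS are abstracted behind callables, since the abstract does not specify their exact form.

```python
# Hypothetical skeleton of MDBS-accelerated fractal decoding.
def mdbs_fast_decode(range_params, image, n_iters, apply_transform, inside_mdbs):
    """range_params: per-range-block fractal parameters (domain index, scaling, offset).
    inside_mdbs(range_block) -> True if the block lies within the minimum
    domain block set identified before the first iteration."""
    # iterations 1 .. n_iters-1: refine only range blocks inside the MDBS,
    # since only those pixels feed the domain blocks used by the next pass
    for _ in range(n_iters - 1):
        for rb, params in range_params.items():
            if inside_mdbs(rb):
                image = apply_transform(image, rb, params)

    # last iteration: reconstruct every range block to obtain the decoded image
    for rb, params in range_params.items():
        image = apply_transform(image, rb, params)
    return image
```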
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
In the past years, learned image compression (LIC) has achieved remarkable performance. Recent LIC methods outperform VVC in both PSNR and MS-SSIM. However, low bit-rate reconstructions from LIC suffer from artifacts such as blurring, color drifting, and missing texture. Moreover, these varied artifacts make image quality metrics correlate poorly with human perceptual quality. In this paper, we propose PO-ELIC, i.e., Perception-Oriented Efficient Learned Image Coding. Specifically, we adapt ELIC, one of the state-of-the-art LIC models, with adversarial training techniques. We apply a mixture of losses, including a hinge-form adversarial loss, a Charbonnier loss, and a style loss, to finetune the model towards better perceptual quality. Experimental results demonstrate that our method achieves perceptual quality comparable to HiFiC at a much lower bitrate.
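A hedged sketch of such a loss mixture is given below: a Charbonnier distortion, a Gram-matrix style loss on externally extracted features, and the generator side of a hinge adversarial loss. The weights, the feature extractor, and the omitted rate term are assumptions rather than the paper's exact configuration.

```python
# Hypothetical perceptual finetuning objective mixing Charbonnier, style, and
# hinge-form adversarial terms; `feat` and `disc` are placeholder networks.
import torch

def charbonnier(x, y, eps=1e-6):
    return torch.sqrt((x - y) ** 2 + eps).mean()

def gram(feat):                      # (B, C, H, W) -> (B, C, C)
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(feat_x, feat_y):
    return (gram(feat_x) - gram(feat_y)).pow(2).mean()

def hinge_g_loss(d_fake):            # generator side of the hinge GAN loss
    return -d_fake.mean()

def perceptual_total(x, x_hat, feat, disc,
                     lam_char=1.0, lam_style=40.0, lam_adv=0.01):
    # the codec's rate term would be added on top of this mixture
    return (lam_char * charbonnier(x_hat, x)
            + lam_style * style_loss(feat(x_hat), feat(x))
            + lam_adv * hinge_g_loss(disc(x_hat)))
```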
Resource-constrained, camera-integrated Visual Sensor Networks (VSNs) support numerous vision-aided services, from visual surveillance to habitat monitoring. A VSN is capable of sensing, processing, and communicating visual data wirelessly. These networks are built from inexpensive, low-power sensor motes with a lightweight processor, limited storage, and limited bandwidth. The large amount of redundancy present in images makes processing and communication consume more energy than expected, so the number of bits must be reduced with energy-efficient compression techniques for efficient transmission. Low computational and communication energy are always favored for an increased lifetime of the wireless sensor network, and the highly sensitive, self-descriptive nature of images makes security in a VSN even more critical. In this work, we propose an energy-efficient, low-bitrate, secured image coder for resource-constrained VSNs; lightweight design protocols are essential for secured image transmission over a VSN. We also propose a novel chaotic map based on Pascal's triangle. The system follows a unique interleaved compression-and-encryption process to consume fewer computational resources. A series of tests was carried out to validate the secured image coder's robustness and its suitability for VSNs, and its performance and strength are evaluated with compression-efficiency and cryptanalysis tests. Simulations were carried out on Atmel's ATmega128 processor for energy consumption analysis. The energy consumed by the proposed system for compression, encryption, and transmission of a 512 x 512 image is 109.364 mJ, which is only 4.57% of the energy consumed by raw image transmission. In addition, the system is implemented on a real-time image sensor platform based on an Arduino Due board integrated with an OV7670 camera module for real-time verification, and the experimental result
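The interleaved compression-and-encryption pattern can be sketched as follows. A standard logistic map serves here only as a stand-in for the paper's Pascal's-triangle-based chaotic map, and zlib stands in for the actual low-bitrate coder; seed, block size, and quantization of the map state are illustrative.

```python
# Hypothetical interleaved compress-then-encrypt loop for a sensor node.
import zlib

def compress_encrypt_blocks(image_bytes, block_size=1024, x0=0.3141, r=3.99):
    """Compress each block, then XOR it with bytes drawn from a chaotic map,
    keeping the map state across blocks so the two stages stay interleaved
    and per-block memory use stays small."""
    x, cipher_blocks = x0, []
    for i in range(0, len(image_bytes), block_size):
        comp = zlib.compress(image_bytes[i:i + block_size])
        ks = bytearray()
        for _ in range(len(comp)):
            x = r * x * (1.0 - x)             # chaotic iteration (logistic map)
            ks.append(int(x * 256) & 0xFF)    # quantize state to one keystream byte
        cipher_blocks.append(bytes(a ^ b for a, b in zip(comp, ks)))
    return cipher_blocks
```

A receiver would need the same seed and the per-block cipher lengths to regenerate the keystream, undo the XOR, and decompress each block.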
Fast-growing intelligent media processing applications demand efficient processing throughout the chain from the edge to the cloud, and the complexity bottleneck usually lies in the parallel decoding of multi-channel compressed bitstreams before analysis. This occurs because the traditional media coding scheme generates a binary stream without a semantic structure, which cannot be operated on directly at the bitstream level to support tasks such as classification, recognition, and detection. Therefore, in this article, we propose a learning-based semantically structured image coding (SSIC) framework to generate a semantically structured bitstream (SSB), where each part of the bitstream represents a specific object and can be used directly for the aforementioned intelligent tasks. Specifically, we integrate an object location extraction module into the compression framework to locate and align objects in the feature domain. Each object, together with the background, is then compressed separately and reorganized to form a structured bitstream, enabling the analysis or reconstruction of specific objects directly from a partial bitstream. Furthermore, in contrast to existing learning-based compression schemes that train a separate model for each bitrate, we share most of the model parameters among various bitrates to significantly reduce the model size for variable-rate compression. The experimental results demonstrate the effectiveness of the proposed coding scheme, whose compression performance is comparable to existing image coding schemes, while intelligent tasks such as classification and pose estimation can be performed directly on a partial bitstream without performance degradation, significantly reducing the complexity of analysis tasks.
Authors: Chen, Yihao; Tan, Bin; Wu, Jun; Zhang, Zhifeng; Ren, Haoqi
Affiliations: Tongji Univ, Coll Elect & Informat Engn, Shanghai 201804, Peoples R China; Tongji Univ, Key Lab Embedded Syst & Serv Comp, Minist Educ, Shanghai 201804, Peoples R China; Jinggangshan Univ, Coll Elect & Informat Engn, Jian 343009, Jiangxi, Peoples R China; Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China; Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
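As a rough illustration of the semantically structured bitstream described in the SSIC abstract above, the sketch below packs per-object sub-streams behind a small index so that a single object (or the background) can be extracted without touching the rest. The header layout is invented for illustration and is not the SSIC format.

```python
# Hypothetical SSB-style container: [count][label,size]*count followed by payloads.
import struct

def pack_ssb(parts):
    """parts: list of (label_id, payload_bytes), e.g. background plus objects."""
    header = struct.pack('<I', len(parts))
    for label, payload in parts:
        header += struct.pack('<II', label, len(payload))
    return header + b''.join(p for _, p in parts)

def extract_object(ssb, wanted_label):
    """Return the sub-stream for one semantic label without parsing the rest."""
    n, = struct.unpack_from('<I', ssb, 0)
    offset_index, offset_payload = 4, 4 + 8 * n
    for _ in range(n):
        label, size = struct.unpack_from('<II', ssb, offset_index)
        if label == wanted_label:
            return ssb[offset_payload:offset_payload + size]
        offset_index += 8
        offset_payload += size
    return None
```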
This paper provides a method to build a deep learning image coding system based on solving an inverse problem: a suitable measurement operator is chosen to reduce the amount of information transmitted at the sender, and the original image is reconstructed by tackling the inverse problem at the receiver. Unlike most compressed sensing (CS) methods, the proposed coding scheme does not rely on sparsity but uses the structural priors of generative adversarial networks (GANs) to solve the inverse problem. The proposed model trains the GAN to learn a mapping from the latent space to the sample space formed by correlated images on the cloud. The measurements are then used to localize the optimal latent variable in the representation space that corresponds to the original image in the sample space. The proposed method encodes and transmits the measurements instead of the original image, which greatly reduces the transmission cost while ensuring the quality of the reconstructed image at high compression ratios. To the best of our knowledge, this is the first work to introduce GAN-based inverse-problem solving into deep image coding. The experimental results show that the visual quality of the images generated by the proposed scheme is better than that of the traditional coding scheme JPEG2000. Especially at extremely high compression ratios, the proposed scheme can still maintain good performance.
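The decoding step can be pictured as latent-space optimization: given the received measurements and a pretrained generator, search for the latent code whose generated image reproduces the measurements. The generator G, measurement operator A, latent size, and optimizer settings below are placeholders, not the paper's configuration.

```python
# Hypothetical GAN-prior decoding by latent optimization: min_z || A(G(z)) - y ||^2.
import torch

def decode_by_latent_search(y, G, A, latent_dim=128, steps=500, lr=0.05):
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(A(G(z)), y)   # fit the measurements
        loss.backward()
        opt.step()
    with torch.no_grad():
        return G(z)        # reconstructed image from the optimized latent code
```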
The Joint Photographic Experts Group (JPEG) committee has been standardizing next-generation image compression, called JPEG XL, to meet the specific needs of a responsive web, wide color gamut, and high dynamic range. JPEG XL supports lossy and lossless compression. A variable-sized discrete cosine transform (DCT) block is used for lossy compression, and the block partitioning method is a critical function for the performance of JPEG XL. The current DCT block partitioning method used in JPEG XL is highly dependent on the compression rate and tends to assign small DCT blocks to homogeneously textured regions (HTRs) having similar or regular patterns. We propose a region-adaptive DCT block partitioning method that assigns larger blocks to HTRs. The proposed method identifies HTRs using a combined metric employing a sum-modified Laplacian, zero-crossing, and colorfulness measure of region homogeneity. Objective, subjective, and visual comparison evaluations on the ten images recommended by the JPEG working group show the improvement in coding performance. The proposed method demonstrates its superiority in terms of compression efficiency evaluated with six objective metrics, subjective tests with 15 participants, visual comparison improvements in HTRs, and gains in execution time.
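The three homogeneity cues named above can be computed roughly as in the sketch below. The weighting and thresholds used by the proposed partitioner are not given in the abstract, so the combination shown is only illustrative.

```python
# Hypothetical per-region homogeneity cues: sum-modified Laplacian,
# zero-crossing rate of the Laplacian, and a Hasler-Suesstrunk-style colorfulness.
import numpy as np

def sum_modified_laplacian(gray):
    c = gray[1:-1, 1:-1]
    ml = (np.abs(2 * c - gray[:-2, 1:-1] - gray[2:, 1:-1])
          + np.abs(2 * c - gray[1:-1, :-2] - gray[1:-1, 2:]))
    return ml.mean()

def zero_crossing_rate(gray):
    lap = (gray[:-2, 1:-1] + gray[2:, 1:-1] + gray[1:-1, :-2]
           + gray[1:-1, 2:] - 4 * gray[1:-1, 1:-1])
    sign_h = np.signbit(lap[:, :-1]) != np.signbit(lap[:, 1:])
    sign_v = np.signbit(lap[:-1, :]) != np.signbit(lap[1:, :])
    return (sign_h.mean() + sign_v.mean()) / 2

def colorfulness(rgb):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg, yb = r - g, 0.5 * (r + g) - b
    return np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())

def homogeneity_score(rgb, w=(1.0, 1.0, 1.0)):
    gray = rgb.mean(axis=-1)
    # low SML, low zero-crossing rate, and low colorfulness -> homogeneous region,
    # i.e. a candidate for a larger DCT block
    return -(w[0] * sum_modified_laplacian(gray)
             + w[1] * zero_crossing_rate(gray)
             + w[2] * colorfulness(rgb))
```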
The past decades have witnessed the rapid development of image and video coding techniques in the era of big data. However, the signal-fidelity-driven design of existing image/video coding frameworks limits their ability to serve both machine and human vision. In this paper, we propose a novel face image coding framework that leverages both compressive and generative models to support machine vision and human perception tasks jointly. Given an input image, feature analysis is first applied, and a generative model is then employed to reconstruct the image from compact structure and color features, where sparse edges are extracted to connect both kinds of vision and a key reference pixel selection method determines the priorities of the reference color pixels for scalable coding. The compact edge map serves as the base layer for machine vision tasks, and the reference pixels act as an enhancement layer to guarantee signal fidelity for human vision. By introducing advanced generative models, we train a decoding network to reconstruct images from compact structure and color representations, which flexibly accepts inputs in a scalable way and controls the imagery effect of the outputs between signal fidelity and visual realism. Experimental results and comprehensive performance analysis over a face image dataset demonstrate the superiority of our framework in both human vision and machine vision tasks, providing useful evidence for the emerging standardization efforts on MPEG VCM (Video Coding for Machines).
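A minimal sketch of the scalable layering idea: the compact edge map alone serves as the machine-vision base layer, and sparse reference color pixels are concatenated as an enhancement layer to steer a generative decoder toward signal fidelity. The tensor layout and decoder_net are placeholders for the paper's trained decoding network.

```python
# Hypothetical scalable decode: base layer (edges) with an optional
# enhancement layer (sparse reference pixels plus validity mask).
import torch

def scalable_decode(edge_map, decoder_net, ref_pixels=None):
    """edge_map: (1, 1, H, W) binary edges (base layer).
    ref_pixels: (1, 4, H, W) sparse RGB values plus a validity mask
    (enhancement layer), or None to decode from the base layer alone."""
    if ref_pixels is None:
        ref_pixels = torch.zeros(1, 4, *edge_map.shape[-2:])   # empty guidance
    return decoder_net(torch.cat([edge_map, ref_pixels], dim=1))
```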