Neural network-based image coding has developed rapidly since its inception. By 2022, its performance had surpassed that of the best-performing traditional image coding framework, H.266/VVC. Witnessing such suc...
ISBN (electronic): 9789464593617
ISBN (print): 9798331519773
Overfitted image codecs offer compelling compression performance and low decoder complexity by overfitting a lightweight decoder to each image. Such codecs include Cool-chic, which achieves image coding performance on par with VVC while requiring around 2000 multiplications per decoded pixel. This paper proposes to reduce Cool-chic's encoding and decoding complexity. Encoding complexity is reduced by shortening Cool-chic training, up to the point where no overfitting is performed at all. It is also shown that a tiny neural decoder with 300 multiplications per pixel still outperforms HEVC. A near real-time CPU implementation of this decoder is made available at https://***/Cool-Chic/.
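The complexity figures above are counted in multiplications per decoded pixel. As a rough illustration of how such a count is obtained, the sketch below tallies the multiplications of a per-pixel stack of fully connected layers. The layer widths are hypothetical and are not Cool-chic's actual architecture; biases, activations, and any upsampling or entropy-model cost are ignored.

```python
def mults_per_pixel(layer_widths):
    """Count multiplications per pixel for a stack of dense layers.

    layer_widths lists the input width followed by each layer's output width,
    e.g. [8, 32, 3] means an 8 -> 32 layer followed by a 32 -> 3 layer.
    Bias additions and ReLU-style activations need no multiplications.
    """
    total = 0
    for n_in, n_out in zip(layer_widths[:-1], layer_widths[1:]):
        total += n_in * n_out  # one multiplication per weight, applied once per pixel
    return total


# Hypothetical configurations, chosen only to show the arithmetic.
large = mults_per_pixel([8, 32, 32, 3])  # 256 + 1024 + 96 = 1376
tiny = mults_per_pixel([8, 16, 3])       # 128 + 48 = 176
print(large, tiny)
```

Multiplying the per-pixel count by the image height and width gives the total decoding cost, which is why the per-pixel figure serves as a resolution-independent complexity measure.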
ISBN (electronic): 9798331529543
ISBN (print): 9798331529550
We present a new image compression paradigm that achieves "intelligently coding for machine" by leveraging the common sense of Large Multimodal Models (LMMs). We are motivated by evidence that large language/multimodal models are powerful general-purpose semantic predictors for understanding the real world. Unlike traditional image compression, which is typically optimized for human eyes, the image coding for machines (ICM) framework we focus on requires the compressed bitstream to better serve different downstream intelligent analysis tasks. To this end, we employ an LMM to "tell codec what to compress": 1) we first use the LMM's semantic understanding capability, covering object grounding, identification, and importance ranking via prompts, to disentangle image content before compression; and 2) based on these semantic priors, we then encode and transmit the image's objects in order within a structured bitstream. In this way, diverse vision benchmarks, including image classification, object detection, and instance segmentation, can be well supported by such a semantically structured bitstream. We dub our method "SDComp" for "Semantically Disentangled Compression" and compare it with state-of-the-art codecs on a wide variety of vision tasks. The SDComp codec provides more flexible reconstruction, assured decoded visual quality, and more generic and satisfactory support for intelligent tasks.
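The second step, transmitting objects in order within a structured bitstream, can be illustrated with a toy container format. The sketch below is a hypothetical layout, not SDComp's actual bitstream syntax: each object's already-compressed payload is written with an object ID and a length prefix, objects are sorted by an importance score (which in SDComp would come from the LMM's ranking but is supplied by hand here), and a decoder can stop after the first few objects.

```python
import struct

HEADER = ">H"   # number of objects (unsigned 16-bit, big-endian)
RECORD = ">HI"  # object ID (uint16) + payload length in bytes (uint32)


def pack_objects(objects):
    """objects: iterable of (object_id, importance, payload_bytes)."""
    ordered = sorted(objects, key=lambda obj: obj[1], reverse=True)
    stream = bytearray(struct.pack(HEADER, len(ordered)))
    for object_id, _importance, payload in ordered:
        stream += struct.pack(RECORD, object_id, len(payload))
        stream += payload
    return bytes(stream)


def unpack_objects(stream, max_objects=None):
    """Read objects in transmitted order, optionally stopping early."""
    (count,) = struct.unpack_from(HEADER, stream, 0)
    if max_objects is not None:
        count = min(count, max_objects)
    offset = struct.calcsize(HEADER)
    decoded = []
    for _ in range(count):
        object_id, length = struct.unpack_from(RECORD, stream, offset)
        offset += struct.calcsize(RECORD)
        decoded.append((object_id, stream[offset:offset + length]))
        offset += length
    return decoded


# Hypothetical objects; payloads stand in for separately compressed regions.
objects = [(1, 0.2, b"background"), (2, 0.9, b"person"), (3, 0.5, b"car")]
stream = pack_objects(objects)
print(unpack_objects(stream, max_objects=2))  # [(2, b'person'), (3, b'car')]
```

Ordering by importance means a truncated or partially received stream still carries the objects a downstream task is most likely to need.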
ISBN (electronic): 9798350387254
ISBN (print): 9798350387261
As image recognition models become more prevalent, scalable coding methods for machines and humans are gaining importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, scalable coding is effective because the tasks occasionally require humans to check the images. Existing image compression methods for humans and machines meet these requirements to some extent; however, they are effective only for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a second compression model that provides additional information to facilitate image decoding for humans. The features of these two compression models are fused by a feature fusion network to achieve efficient image compression. The additional-information compression model is designed with fewer parameters, which is made possible by allowing the feature fusion network to combine features of different sizes. Our experiments confirm that the feature fusion network efficiently combines the image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating image compression performance in terms of decoded image quality and bitrate. Code is available at https://***/final-0/ICM-v1.
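Fusing features of different spatial sizes can be sketched with NumPy. In this illustration, which is an assumption rather than the authors' network, the smaller additional-information feature map is upsampled by nearest-neighbour repetition, concatenated with the machine-oriented features along the channel axis, and mixed by a 1x1 convolution written as a matrix product over channels. In the actual method the fusion network is learned, so the random `weight` below stands in for trained parameters.

```python
import numpy as np


def upsample_nearest(features, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return features.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_features(f_machine, f_extra, weight):
    """Fuse a full-size (C1, H, W) map with a smaller (C2, H/s, W/s) map.

    weight has shape (C_out, C1 + C2) and acts as a 1x1 convolution.
    """
    factor = f_machine.shape[1] // f_extra.shape[1]
    same_factor = (f_extra.shape[1] * factor == f_machine.shape[1]
                   and f_extra.shape[2] * factor == f_machine.shape[2])
    if not same_factor:
        raise ValueError("spatial sizes must differ by the same integer factor")
    stacked = np.concatenate([f_machine, upsample_nearest(f_extra, factor)], axis=0)
    # 1x1 convolution: mix channels independently at every spatial position.
    return np.einsum("oc,chw->ohw", weight, stacked)


rng = np.random.default_rng(0)
f_machine = rng.standard_normal((4, 8, 8))  # hypothetical machine-task features
f_extra = rng.standard_normal((2, 4, 4))    # hypothetical additional-info features
weight = rng.standard_normal((3, 6))        # would be learned in practice
fused = fuse_features(f_machine, f_extra, weight)
print(fused.shape)  # (3, 8, 8)
```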
ISBN (electronic): 9798350388534
ISBN (print): 9798350388541
The filter bank implementation of the standard Discrete Wavelet Transform (DWT) suffers from the coefficient expansion problem: for a finite-length signal, it produces more transform coefficients than there are input samples. In this paper, a methodology is proposed to address this problem. With this methodology, a non-expansive implementation is achieved without increasing the computational load or memory requirements. This non-expansive filter bank realization of the DWT does suffer from boundary artifacts, but they are confined to a few coefficients at the boundaries. An optimal filter is required to reduce these boundary artifacts. The proposed non-expansive DWT is well suited to real-time wavelet-based image coding systems.
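The expansion can be seen by counting outputs. With linear (zero-padded) convolution, a length-N signal filtered by a length-L filter yields N + L - 1 samples, so each subband keeps ceil((N + L - 1)/2) coefficients after downsampling. The sketch below contrasts this with periodic (circular) extension, a well-known non-expansive alternative that keeps exactly N/2 coefficients per subband for even N. It is not necessarily the methodology proposed in the paper, and the Daubechies-2 (db2) filters are an illustrative choice. Periodic extension also shows where boundary artifacts come from: the two ends of the signal are treated as neighbours, which affects only the coefficients near the boundaries.

```python
import numpy as np

# Daubechies-2 (db2) orthonormal analysis filters, chosen for illustration.
SQRT3 = np.sqrt(3.0)
H = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * np.sqrt(2.0))
# Quadrature-mirror high-pass filter: g[n] = (-1)^n * h[L-1-n]
G = np.array([(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))])


def expansive_dwt(x):
    """One DWT level with linear (zero-padded) convolution: output grows."""
    low = np.convolve(x, H)[::2]   # length ceil((N + L - 1) / 2)
    high = np.convolve(x, G)[::2]
    return low, high


def circular_conv(x, h):
    """Circular convolution y[n] = sum_k h[k] * x[(n - k) mod N]."""
    y = np.zeros(len(x))
    for k, hk in enumerate(h):
        y += hk * np.roll(x, k)  # np.roll(x, k)[n] == x[(n - k) % N]
    return y


def periodic_dwt(x):
    """One DWT level with periodic extension: exactly N/2 + N/2 outputs."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("periodic_dwt expects an even-length signal")
    return circular_conv(x, H)[::2], circular_conv(x, G)[::2]


x = np.arange(8, dtype=float)
a_e, d_e = expansive_dwt(x)
a_p, d_p = periodic_dwt(x)
print(len(a_e) + len(d_e), len(a_p) + len(d_p))  # 12 vs 8 coefficients
```

Because db2 is orthonormal, the periodic version is an orthogonal transform of the 8-sample signal, so the combined energy of `a_p` and `d_p` equals that of `x`.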
ISBN (electronic): 9798331529192
ISBN (print): 9798331529208
Accurate estimation of the state of health (SOH) of lithium-ion batteries is key to guaranteeing their service reliability in complex operating environments. Transforming one-dimensional time-series data into two-dimensional images for battery degradation feature extraction can improve the accuracy of SOH evaluation while reducing both the complexity of the evaluation model and the amount of test data required. Although existing studies have applied image coding techniques to enhance the degradation features of the original data, the advantages and disadvantages of different image coding methods have not been systematically compared. Therefore, in this work, five commonly used image coding methods, namely recurrence plots, the Gramian angular summation field, the Gramian angular difference field, the relative position matrix, and time-series data folding, are selected and comprehensively compared. First, the original one-dimensional voltage signal is encoded into a two-dimensional image; this image is then fed into a CNN-GRU-based SOH prediction model, which finally outputs the future battery SOH value. The experimental results show that different coding methods suit different stages and conditions, so each must be matched to its specific application scenario; this is the next research direction.
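Three of the encodings compared above have compact, commonly used definitions. The NumPy sketch below follows the usual formulation of the Gramian angular fields (rescale the series to [-1, 1], map each value to an angle phi = arccos(x), then take cos(phi_i + phi_j) for the summation field and sin(phi_i - phi_j) for the difference field) and of a thresholded recurrence plot, where R_ij = 1 if |x_i - x_j| <= eps. The voltage samples and the threshold `eps` are made-up values, and the paper's exact preprocessing may differ.

```python
import numpy as np


def rescale_to_unit_interval(x):
    """Min-max rescale a 1-D series to [-1, 1] (a constant series maps to 0)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)
    # Clip guards against tiny floating-point overshoot before arccos.
    return np.clip((2.0 * x - hi - lo) / (hi - lo), -1.0, 1.0)


def gramian_angular_fields(x):
    """Return (GASF, GADF) images for a 1-D series."""
    phi = np.arccos(rescale_to_unit_interval(x))  # polar-angle encoding
    gasf = np.cos(phi[:, None] + phi[None, :])    # summation field
    gadf = np.sin(phi[:, None] - phi[None, :])    # difference field
    return gasf, gadf


def recurrence_plot(x, eps):
    """Binary recurrence plot: 1 where two samples lie within eps of each other."""
    x = np.asarray(x, dtype=float)
    distance = np.abs(x[:, None] - x[None, :])
    return (distance <= eps).astype(np.uint8)


# Made-up voltage samples (V) from one discharge segment.
voltage = [4.1, 4.0, 3.9, 3.8, 3.7, 3.6]
gasf, gadf = gramian_angular_fields(voltage)
rp = recurrence_plot(voltage, eps=0.15)
print(gasf.shape, gadf.shape, rp.shape)  # (6, 6) (6, 6) (6, 6)
```

Each resulting matrix can be treated as a single-channel image (and resized or stacked with the others) before being passed to a CNN front end such as the CNN-GRU model described above.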
Recent progress in generative compression technology has significantly improved the perceptual quality of compressed data. However, these advancements primarily focus on producing high-frequency details, often overloo...
With the increasing number of images and videos consumed by computer vision algorithms, compression methods are evolving to consider both perceptual quality and performance in downstream tasks. Traditional codecs can ...
Image coding for machines (ICM) aims to reduce the bitrate required to represent an image while minimizing the drop in machine vision analysis accuracy. In many use cases, such as surveillance, it is also important ...
ISBN (electronic): 9798350349399
ISBN (print): 9798350349405
Image coding for machines (ICM) aims to compress images for analysis by recognition models rather than for human vision. Hence, in ICM, it is important for the encoder to recognize and compress the information necessary for the machine recognition task. There are two main approaches in learned ICM: optimizing the compression model with a task loss, and Region of Interest (ROI)-based bit allocation. Both approaches give the encoder recognition capability. However, optimization with a task loss becomes difficult when the recognition model is deep, and ROI-based methods often incur extra overhead during evaluation. In this study, we propose a novel training method for learned ICM models that applies an auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance. Our method achieves Bjøntegaard Delta rate improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, respectively, compared to the conventional training method.
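The Bjøntegaard Delta rate quoted above compares two rate-performance curves at equal quality. A minimal sketch of the classic calculation is shown below: fit log-rate as a cubic polynomial of the quality metric (for ICM, usually a task metric such as mAP rather than PSNR), integrate both fits over the overlapping quality range, and convert the average log-rate difference into a percentage. Negative values mean the test method needs fewer bits for the same quality. The rate and quality points in the example are synthetic.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjøntegaard Delta rate (%) of the test curve relative to the anchor.

    Each curve needs at least four (rate, quality) points for the cubic fit.
    """
    log_ra, log_rt = np.log(rate_anchor), np.log(rate_test)
    # Fit log-rate as a cubic function of quality for each curve.
    poly_a = np.polyfit(quality_anchor, log_ra, 3)
    poly_t = np.polyfit(quality_test, log_rt, 3)
    # Integrate only over the quality interval covered by both curves.
    q_lo = max(np.min(quality_anchor), np.min(quality_test))
    q_hi = min(np.max(quality_anchor), np.max(quality_test))
    int_a, int_t = np.polyint(poly_a), np.polyint(poly_t)
    area_a = np.polyval(int_a, q_hi) - np.polyval(int_a, q_lo)
    area_t = np.polyval(int_t, q_hi) - np.polyval(int_t, q_lo)
    avg_log_diff = (area_t - area_a) / (q_hi - q_lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


# Synthetic example: the test codec uses 80% of the anchor's bits at every quality.
quality = [30.0, 32.0, 34.0, 36.0]      # e.g. mAP in percent
rate_anchor = [0.10, 0.20, 0.40, 0.80]  # bits per pixel
rate_test = [0.8 * r for r in rate_anchor]
print(round(bd_rate(rate_anchor, quality, rate_test, quality), 2))  # -20.0
```

Working in the log domain makes the result a relative (percentage) saving, so a uniform 20% bit reduction yields -20% regardless of the absolute bitrates.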