ISBN (print): 9798350349405; 9798350349399
Among the technical approaches to machine vision coding, image coding for machine (ICM) stands out for its ability to serve both human perception and machine vision simultaneously. However, it is often criticized for its limited rate-analytics efficiency. In this paper, we propose an Appearance Redundancy Reduction (ARR) module, designed as a plug-in for existing ICM frameworks, that further improves rate-analytics coding efficiency without any change to the ICM codec itself. Specifically, our work exploits the intrinsic correlation between low-level image structure and high-level vision analytics, and proposes a novel colour quantization mechanism to squeeze out redundant appearance information that does not contribute to analytics. Moreover, a differentiable soft quantization operation is derived to enable end-to-end training within the ICM framework. Extensive experimental results show that integrating the proposed ARR module yields substantial improvements in rate-analytics performance, even surpassing the feature coding paradigm, while maintaining generalizability across different tasks and an acceptable perceptual representation.
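The abstract does not specify how the differentiable soft quantization is built. As a minimal sketch of one common relaxation, assuming RGB values in [0, 1] and a fixed palette, each pixel can be replaced by a softmax-weighted blend of palette colours. The names `soft_quantize`, `hard_quantize`, `palette`, and `temperature` are hypothetical and not taken from the paper.

```python
import math

def soft_quantize(pixel, palette, temperature=1.0):
    """Softmax-weighted blend of palette colours for one pixel (illustrative only)."""
    # Logit for each palette colour: negative squared distance, scaled by temperature.
    logits = [-sum((p - c) ** 2 for p, c in zip(pixel, colour)) / temperature
              for colour in palette]
    # Subtract the maximum logit for numerical stability before exponentiating.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Blend the palette colours channel by channel.
    return tuple(sum(w * colour[k] for w, colour in zip(weights, palette))
                 for k in range(len(pixel)))

def hard_quantize(pixel, palette):
    """Non-differentiable reference: pick the nearest palette colour."""
    return min(palette,
               key=lambda colour: sum((p - c) ** 2 for p, c in zip(pixel, colour)))
```

As `temperature` approaches zero the blend approaches the nearest-colour assignment of `hard_quantize`, while larger values keep the mapping smooth so that gradients can flow during end-to-end training.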
ISBN (print): 9798331529543; 9798331529550
We present a new image compression paradigm that achieves "intelligently coding for machine" by leveraging the common sense of Large Multimodal Models (LMMs). We are motivated by the evidence that large language/multimodal models are powerful general-purpose semantics predictors for understanding the real world. Unlike traditional image compression, which is typically optimized for human eyes, the image coding for machines (ICM) framework we focus on requires the compressed bitstream to comply more closely with different downstream intelligent analysis tasks. To this end, we employ an LMM to tell the codec what to compress: 1) we first use the semantic understanding capability of LMMs with respect to object grounding, identification, and importance ranking, via prompts, to disentangle image content before compression; 2) based on these semantic priors, we then encode and transmit the objects of the image in order within a structured bitstream. In this way, diverse vision benchmarks, including image classification, object detection, and instance segmentation, can be well supported by such a semantically structured bitstream. We dub our method "SDComp" for "Semantically Disentangled Compression", and compare it with state-of-the-art codecs on a wide variety of vision tasks. The SDComp codec leads to more flexible reconstruction results, promising decoded visual quality, and a more generic and satisfactory ability to support intelligent tasks.
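The idea of transmitting objects in importance order within a structured bitstream can be illustrated with a toy container format. This is only a sketch under stated assumptions: the segment layout (a 2-byte rank and a 4-byte length before each payload) and the names `pack_objects` and `unpack_objects` are hypothetical and do not describe the actual SDComp bitstream.

```python
import struct

HEADER = ">HI"  # assumed layout: 2-byte importance rank, 4-byte payload length

def pack_objects(objects):
    """objects: list of (importance_rank, payload_bytes); lower rank is more important."""
    ordered = sorted(objects, key=lambda item: item[0])
    stream = bytearray()
    for rank, payload in ordered:
        # Each segment is a fixed-size header followed by the object's coded bytes.
        stream += struct.pack(HEADER, rank, len(payload)) + payload
    return bytes(stream)

def unpack_objects(stream, max_objects=None):
    """Read segments in order; stop early to decode only the most important objects."""
    out, offset = [], 0
    header_size = struct.calcsize(HEADER)
    while offset < len(stream) and (max_objects is None or len(out) < max_objects):
        rank, length = struct.unpack_from(HEADER, stream, offset)
        offset += header_size
        out.append((rank, stream[offset:offset + length]))
        offset += length
    return out
```

Because segments are ordered by rank, a receiver that only needs the top-ranked object for its task can stop reading after the first segment, which is one way such a bitstream could support flexible reconstruction.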
With the explosive increase in the volume of images intended for analysis by AI, image coding for machine has been proposed to transmit information in a machine-interpretable format, thereby enhancing image compression efficiency. However, such efficient coding schemes often lose image details and features and blur semantic information because of the high compression ratio, making them less suitable for human vision. Balancing visual quality and machine vision accuracy at a given compression ratio is therefore a critical problem. To address these issues, we introduce a human-machine collaborative image coding framework based on Implicit Neural Representations (INR), which effectively reduces the information transmitted for machine vision tasks at the decoding side while maintaining high-efficiency image compression for human vision relative to the INR compression framework. To enhance the model's perception of images for machine vision, we design a semantic embedding enhancement module to assist in understanding image semantics. Specifically, we employ the Swin Transformer model to initialize image features, ensuring that the embeddings of the compression model are effectively applicable to downstream visual tasks. Extensive experimental results demonstrate that our method significantly outperforms other image compression methods on classification tasks while preserving image compression efficiency.
Image compression aims to minimize the amount of data in an image representation while maintaining a certain visual quality for humans, and it is an essential technique for storage and transmission. Recently, with the development of computer vision, machines have become another primary receiver of images and require compressed images at a certain quality level, which may differ from that required by human vision. In many scenarios, compressed images should serve both human and machine vision tasks, but few compression methods are designed for both goals simultaneously. In this article, we propose a unified and scalable deep image compression (USDIC) framework that jointly optimizes image quality for human and machine vision in an end-to-end manner. For the encoder, we propose an information splitting mechanism (ISM) that separates images into semantic and visual features, which mainly serve machine analysis and human viewing tasks, respectively. For the decoder, we design a scalable decoding architecture: the encoded semantic feature is first decoded for machine analysis tasks, and the image is then decoded and reconstructed by leveraging the decoded semantic features. To further remove the redundancy between the semantic and visual features of images, we propose a scalable entropy model (SEM) with a joint optimization strategy to reconstruct the image from the two kinds of decoded features. Extensive experimental results show that the proposed USDIC achieves much better performance on the image analysis task while maintaining competitive performance on the traditional image reconstruction task compared with popular image compression methods.
ISBN (print): 9798350344868; 9798350344851
Image coding for machine (ICM) aims to compress an image so that the reconstruction meets the requirements of both human vision and machine vision. Existing methods apply constraints from the downstream models to improve machine analytics performance, at the cost of visual quality. This paper proposes a novel adversarially augmented adaptation route that achieves a better trade-off between human and machine utility by making slight changes to the image manifold. In detail, a targeted adversarial attack is employed to generate subtle image perturbations that are nearly imperceptible to humans but significantly improve machine analytics performance. These perturbed images are subsequently used as ground truth to guide the training or fine-tuning of an end-to-end image compression network. Note that our method is a plug-and-play framework that does not rely on any change to the existing architecture or loss functions. Extensive experimental results demonstrate the superiority of the proposed scheme over conventional ICM frameworks and the effectiveness of our design.
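The abstract does not state which attack is used. As an illustration only, a targeted perturbation that lowers the task loss toward the ground-truth label $y$ is often written as an iterative signed-gradient update with an $\ell_\infty$ budget. The symbols $f$ (downstream model), $\mathcal{L}_{\text{task}}$, $\alpha$ (step size), $\epsilon$ (budget), and $\Pi$ (projection) are assumptions introduced here, not notation from the paper.

```latex
x^{(0)} = x, \qquad
x^{(t+1)} = \Pi_{\|x' - x\|_\infty \le \epsilon}
  \Big( x^{(t)} - \alpha \,\operatorname{sign}\!\big( \nabla_{x}\,
  \mathcal{L}_{\text{task}}\big(f(x^{(t)}),\, y\big) \big) \Big),
\qquad x_{\text{adv}} = x^{(T)}
```

Under this reading, the compression network would be trained with its usual distortion term measured against $x_{\text{adv}}$ rather than $x$, which is consistent with the claim that neither the architecture nor the loss function needs to change.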
ISBN (print): 9781728185514
Recent advances in sensor technology and the wide deployment of visual sensors have led to a new application in which images are compressed not mainly for pixel recovery for human consumption, but for communication to cloud-side machine vision tasks such as classification, identification, detection, and tracking. This opens up new research directions for learning-based compression that directly optimizes the loss function of the vision task, and therefore achieves better compression performance than recovering the pixels first and then performing the vision task. In this work, we develop a learning-based compression scheme that learns a compact feature representation and appropriate bitstreams for the task of visual object detection. A Variational Auto-Encoder (VAE) framework is adopted for learning the compact representation, while a bridge network is trained to drive the detection loss function. Simulation results demonstrate that this approach achieves a new state of the art in task-driven compression efficiency compared with pixel recovery approaches, including both learning-based and handcrafted solutions.
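One way to read the task-driven objective described above is as a rate term plus a detection loss, instead of the usual rate plus pixel distortion. The following is a hedged sketch, not the paper's stated formulation: $g_a$ (analysis transform), $Q$ (quantizer), $p_{\hat z}$ (learned entropy model), $b$ (bridge network), $h$ (detector head), and $\lambda$ (trade-off weight) are all assumed symbols.

```latex
\hat z = Q\big(g_a(x)\big), \qquad
\mathcal{L} =
  \underbrace{\mathbb{E}\big[-\log_2 p_{\hat z}(\hat z)\big]}_{\text{rate}}
  \;+\; \lambda\,
  \underbrace{\mathcal{L}_{\text{det}}\big(h(b(\hat z)),\, y\big)}_{\text{detection loss}}
```

In this form the bridge network $b$ maps the compact latent directly into the detector's feature space, so no pixel reconstruction is needed on the path that drives the detection loss.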