ISBN (print): 9798350349405; 9798350349399
Image coding for machines (ICM) aims to compress images for machine analysis using recognition models rather than human vision. Hence, in ICM, it is important for the encoder to recognize and compress the information necessary for the machine recognition task. There are two main approaches in learned ICM: optimization of the compression model based on task loss, and Region of Interest (ROI)-based bit allocation. These approaches provide the encoder with recognition capability. However, optimization with task loss becomes difficult when the recognition model is deep, and ROI-based methods often involve extra overhead during evaluation. In this study, we propose a novel training method for learned ICM models that applies an auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance. Our method achieves Bjontegaard Delta rate improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, respectively, compared to the conventional training method.
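As a rough illustration of this kind of training objective, the sketch below combines a rate-distortion term with an auxiliary recognition loss applied to the encoder's latent. This is a minimal PyTorch sketch under assumptions of my own: the toy encoder/decoder, the rate proxy, the auxiliary classification head, and the weights `lmbda` and `alpha` are illustrative, not the paper's design.

```python
# Minimal sketch: rate-distortion loss plus an auxiliary recognition loss
# on the encoder's latent. All modules and weights are illustrative.
import torch
import torch.nn as nn

class ToyICM(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, c, 5, 2, 2), nn.ReLU(),
                                     nn.Conv2d(c, c, 5, 2, 2))
        self.decoder = nn.Sequential(nn.ConvTranspose2d(c, c, 5, 2, 2, 1), nn.ReLU(),
                                     nn.ConvTranspose2d(c, 3, 5, 2, 2, 1))
        # Auxiliary head: predicts task labels directly from the latent,
        # giving the encoder an explicit recognition signal.
        self.aux_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(c, 10))

    def forward(self, x):
        y = self.encoder(x)
        return y, self.decoder(y), self.aux_head(y)

def training_loss(model, x, labels, lmbda=0.01, alpha=0.1):
    y, x_hat, logits = model(x)
    rate_proxy = y.abs().mean()                       # stand-in for an entropy-model rate term
    distortion = nn.functional.mse_loss(x_hat, x)
    aux = nn.functional.cross_entropy(logits, labels)  # auxiliary recognition loss
    return distortion + lmbda * rate_proxy + alpha * aux

x = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, 10, (2,))
print(training_loss(ToyICM(), x, labels).item())
```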
ISBN (print): 9798350387261; 9798350387254
As image recognition models become more prevalent, scalable coding methods for machines and humans gain more importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, scalable coding proves effective because the tasks require occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent. However, these compression methods are effective solely for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model that provides additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. The additional-information compression model is adjusted to reduce the number of parameters by allowing features of different sizes to be combined in the feature fusion network. Our approach confirms that the feature fusion network efficiently combines image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating the image compression performance in terms of decoded image quality and bitrate. Code is available at https://***/final-0/ICM-v1.
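To make the feature-fusion idea concrete, the sketch below shows one way a machine-oriented latent and a smaller additional-information latent could be combined before human-oriented decoding. This is a minimal PyTorch sketch; the channel counts and the upsample-concatenate-convolve design are assumptions for illustration, not the paper's network.

```python
# Minimal sketch of a feature fusion network that merges a machine-codec
# latent with a smaller "additional information" latent. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    def __init__(self, c_machine=192, c_extra=64, c_out=192):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(c_machine + c_extra, c_out, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1))

    def forward(self, f_machine, f_extra):
        # Resize the smaller additional-information feature so that
        # features of different sizes can be combined.
        f_extra = F.interpolate(f_extra, size=f_machine.shape[-2:],
                                mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([f_machine, f_extra], dim=1))

f_machine = torch.randn(1, 192, 32, 32)   # latent from the machine codec
f_extra = torch.randn(1, 64, 16, 16)      # smaller additional-info latent
print(FeatureFusion()(f_machine, f_extra).shape)  # torch.Size([1, 192, 32, 32])
```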
With the development of machine learning, advanced photography, and image transmission systems, images are increasingly processed by machines, and image coding for machines (ICM) has emerged in response. After the image codec compresses and transmits the image, the image is handed over to machine vision task networks. These vision tasks include image classification, semantic segmentation, and so on. We propose a side information-driven image coding for hybrid machine-human vision (SICMH) framework, not only for machine vision tasks but also for human vision-oriented image reconstruction. The proposed SICMH framework can perform image classification, semantic segmentation, and coarse image reconstruction using purely the side information. Moreover, SICMH can perform fine image reconstruction using the residue information. In particular, we propose a multi-scale feature fusion block to enhance the usage of side information, and a novel semantic segmentation network named modified TrSeg to generate better semantic segmentation maps. The experimental results demonstrate the effectiveness of our proposed framework. SICMH achieves the same image classification and semantic segmentation accuracy as existing traditional or learning-based multi-task ICM frameworks at the lowest bitrate. For the image reconstruction task, SICMH achieves the same PSNR as existing learning-based multi-task hybrid ICM frameworks and the traditional image codec BPG, again at the lowest bitrate.
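The sketch below illustrates the general shape of a multi-scale feature fusion block: side-information features at several resolutions are projected to a common width, aligned to one resolution, and merged. This is a minimal PyTorch sketch; the number of branches, channel widths, and fusion order are assumptions, not the SICMH specification.

```python
# Minimal sketch of a multi-scale feature fusion block. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusionBlock(nn.Module):
    def __init__(self, channels=(64, 128, 256), c_out=128):
        super().__init__()
        # 1x1 convolutions project each scale to a common channel width.
        self.proj = nn.ModuleList(nn.Conv2d(c, c_out, 1) for c in channels)
        self.merge = nn.Conv2d(c_out * len(channels), c_out, 3, padding=1)

    def forward(self, feats):
        target = feats[0].shape[-2:]  # fuse at the finest resolution
        aligned = [F.interpolate(p(f), size=target, mode="nearest")
                   for p, f in zip(self.proj, feats)]
        return self.merge(torch.cat(aligned, dim=1))

feats = [torch.randn(1, 64, 64, 64),
         torch.randn(1, 128, 32, 32),
         torch.randn(1, 256, 16, 16)]
print(MultiScaleFusionBlock()(feats).shape)  # torch.Size([1, 128, 64, 64])
```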
ISBN (print): 9798331529543; 9798331529550
To achieve efficient compression for both human vision and machine perception, scalable coding methods have been proposed in recent years. However, existing methods do not fully eliminate the redundancy between features corresponding to different tasks, resulting in suboptimal coding performance. In this paper, we propose a frequency-aware hierarchical image compression framework designed for humans and machines. Specifically, we investigate task relationships from a frequency perspective, utilizing only high-frequency (HF) information for machine vision tasks and leveraging both HF and low-frequency (LF) features for image reconstruction. Besides, a residual block embedded octave convolution module is designed to enhance the information interaction between HF features and LF features. Additionally, a dual-frequency channel-wise entropy model is applied to reasonably exploit the correlation between different tasks, thereby improving multi-task performance. The experiments show that the proposed method offers -69.3% to -75.3% coding gains on machine vision tasks compared to the relevant benchmarks, and -19.1% gains over the state-of-the-art scalable image codec in terms of image reconstruction quality.
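For intuition on how an octave-style module lets HF and LF branches exchange information, the sketch below keeps a full-resolution HF path and a half-resolution LF path with cross connections and residual wiring. This is a minimal PyTorch sketch; the channel split and the exact wiring are assumptions, not the paper's residual block embedded octave convolution.

```python
# Minimal sketch of an octave-style residual block with HF/LF cross paths.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveResBlock(nn.Module):
    def __init__(self, c_hf=48, c_lf=16):
        super().__init__()
        self.hf_to_hf = nn.Conv2d(c_hf, c_hf, 3, padding=1)
        self.hf_to_lf = nn.Conv2d(c_hf, c_lf, 3, padding=1)
        self.lf_to_lf = nn.Conv2d(c_lf, c_lf, 3, padding=1)
        self.lf_to_hf = nn.Conv2d(c_lf, c_hf, 3, padding=1)

    def forward(self, x_hf, x_lf):
        # Cross paths: HF contributes to LF after downsampling, LF
        # contributes to HF after upsampling; residuals keep each branch.
        hf = self.hf_to_hf(x_hf) + F.interpolate(self.lf_to_hf(x_lf),
                                                 scale_factor=2, mode="nearest")
        lf = self.lf_to_lf(x_lf) + self.hf_to_lf(F.avg_pool2d(x_hf, 2))
        return F.relu(x_hf + hf), F.relu(x_lf + lf)

x_hf, x_lf = torch.randn(1, 48, 64, 64), torch.randn(1, 16, 32, 32)
y_hf, y_lf = OctaveResBlock()(x_hf, x_lf)
print(y_hf.shape, y_lf.shape)  # (1, 48, 64, 64) (1, 16, 32, 32)
```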
ISBN (print): 9798350358483; 9798350358490
Semantic feature compression aims to compress image features for downstream machine vision tasks without reconstructing image pixels. Such a task is very challenging since it needs to learn features which are not only useful for machine vision tasks, but also easy to compress. While existing learnable feature coding models utilize downstream task networks as teacher networks to guide the learning and compression of semantic features, they use simple entropy models and do not effectively reduce information redundancy. In this work, we propose a transformer-based spatial-channel auto-regressive feature context model (SC-AR FCM) to assist the entropy coding of learnable features. Through extensive experimentation on object detection and segmentation tasks, we demonstrate that the rate-accuracy performance of our proposed method surpasses traditional image compression techniques and state-of-the-art learning-based feature compression techniques.
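The sketch below shows the general pattern behind a channel-wise autoregressive context model for entropy coding: the latent is split into channel groups, and the entropy parameters of each group are predicted from already-decoded groups plus a hyperprior. A plain convolutional predictor stands in for the transformer used in the paper, and the group sizes and channel widths are illustrative assumptions.

```python
# Minimal sketch of a channel-wise autoregressive entropy context model.
# A conv predictor stands in for the paper's transformer. Illustrative only.
import torch
import torch.nn as nn

class ChannelARContext(nn.Module):
    def __init__(self, c_latent=192, c_hyper=64, groups=4):
        super().__init__()
        self.gsize = c_latent // groups
        self.predictors = nn.ModuleList(
            nn.Conv2d(c_hyper + i * self.gsize, 2 * self.gsize, 3, padding=1)
            for i in range(groups))

    def forward(self, y, hyper):
        means, scales = [], []
        for i, pred in enumerate(self.predictors):
            # Context = hyperprior + channel groups decoded so far.
            ctx = torch.cat([hyper, y[:, : i * self.gsize]], dim=1)
            m, s = pred(ctx).chunk(2, dim=1)
            means.append(m)
            scales.append(nn.functional.softplus(s))
        return torch.cat(means, dim=1), torch.cat(scales, dim=1)

y = torch.randn(1, 192, 16, 16)      # quantized latent
hyper = torch.randn(1, 64, 16, 16)   # hyperprior features
mu, sigma = ChannelARContext()(y, hyper)
print(mu.shape, sigma.shape)  # both torch.Size([1, 192, 16, 16])
```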
ISBN (print): 9798350349405; 9798350349399
Image coding for machines (ICM) is developed to compress images with a focus on machine vision tasks rather than human perception. For ICM, it is very important to develop a universal codec adaptable to different machine tasks. In this paper, we propose novel parallel task-prompts that can be easily adapted to various machine vision tasks without requiring new networks or training from scratch. Moreover, our parallel prompts are compatible with mainstream backbones such as transformers and convolutional neural networks, making them widely applicable across different model architectures. To fine-tune our task-prompts, we leverage a machine task network as the teacher network, guiding our student ICM network to efficiently compress feature maps for downstream machine tasks. Through extensive experimentation on object detection and segmentation, we demonstrate that our proposed method surpasses traditional image compression techniques and state-of-the-art learning-based feature compression techniques in terms of rate-accuracy performance.
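To illustrate the prompt-tuning pattern for a transformer backbone, the sketch below prepends a small set of learnable per-task tokens to the patch tokens and trains only the prompts while the shared backbone stays frozen. This is a minimal PyTorch sketch; the token counts, dimensions, and the toy encoder are assumptions, not the paper's architecture.

```python
# Minimal sketch of per-task prompt tokens on a frozen transformer backbone.
import torch
import torch.nn as nn

class PromptedBackbone(nn.Module):
    def __init__(self, dim=256, n_prompts=8, tasks=("detection", "segmentation")):
        super().__init__()
        self.prompts = nn.ParameterDict(
            {t: nn.Parameter(torch.zeros(1, n_prompts, dim)) for t in tasks})
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():   # freeze the shared backbone
            p.requires_grad_(False)

    def forward(self, tokens, task):
        n = self.prompts[task].shape[1]
        prompts = self.prompts[task].expand(tokens.shape[0], -1, -1)
        out = self.backbone(torch.cat([prompts, tokens], dim=1))
        return out[:, n:]   # drop prompt positions, keep patch tokens

tokens = torch.randn(2, 196, 256)   # e.g. 14x14 patch tokens
feats = PromptedBackbone()(tokens, "detection")
print(feats.shape)  # torch.Size([2, 196, 256])
```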
ISBN (print): 9789464593617; 9798331519773
In goal-oriented communications, the objective of the receiver is often to apply a Deep-Learning model rather than reconstruct the original data. In this context, direct learning over compressed data, without any prior decoding, holds promise for enhancing the time-efficient execution of inference models at the receiver. However, conventional entropic-coding methods such as Huffman and Arithmetic coding break the data structure, rendering them unsuitable for learning without decoding. In this paper, we propose an alternative approach in which entropic coding is realized with Low-Density Parity Check (LDPC) codes. We hypothesize that Deep Learning models can more effectively exploit the internal structure of LDPC codes. At the receiver, we leverage a Gated Recurrent Unit (GRU), a class of Recurrent Neural Network (RNN), trained for image classification. Our numerical results indicate that classification based on LDPC-coded bit-planes surpasses Huffman and Arithmetic coding while necessitating a significantly smaller learning model. This demonstrates the efficiency of classification directly from LDPC-coded data, eliminating the need for any form of decompression, even partial, prior to applying the learning model.
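The sketch below shows the receiver-side pattern of classifying directly from coded bit-planes with a GRU, without any decoding step. This is a minimal PyTorch sketch; the LDPC encoding itself is abstracted away as a precomputed binary sequence, and the chunking, hidden size, and class count are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch: GRU classification over coded bit-planes, no decoding.
import torch
import torch.nn as nn

class BitPlaneGRUClassifier(nn.Module):
    def __init__(self, chunk=64, hidden=128, n_classes=10):
        super().__init__()
        # Each time step consumes one chunk of coded bits.
        self.gru = nn.GRU(input_size=chunk, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, coded_bits):
        # coded_bits: (batch, steps, chunk) tensor of {0, 1} values
        _, h_n = self.gru(coded_bits.float())
        return self.head(h_n[-1])

# Stand-in for LDPC-coded bit-planes of a batch of images (no decompression).
coded_bits = torch.randint(0, 2, (4, 32, 64))
logits = BitPlaneGRUClassifier()(coded_bits)
print(logits.shape)  # torch.Size([4, 10])
```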