ISBN (e-book): 9781510653320
ISBN (print): 9781510653320; 9781510653313
Recent intelligent object detection systems require high-definition images for reliable detection accuracy, which raises problems of high network bandwidth occupation and large archival storage requirements. In this paper, we propose an objectness-measure-based compression method for thermal images intended for machine vision. Based on the objectness of each region, the bounding box of a high-objectness area is adjusted so that subsequent object detection performance is not affected, and the image is compressed such that high-objectness areas receive a lower compression ratio than the remaining areas. Experiments indicate that the proposed scheme achieves superior object detection accuracy at comparable BPP compared with a state-of-the-art video compression method.
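A minimal sketch of the objectness-guided idea described above: assign a per-block quality from an objectness map and enlarge high-objectness boxes so the quality boundary does not clip object borders. The block layout, threshold, margin, and quality values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def quality_map(objectness: np.ndarray,
                q_high: int = 90, q_low: int = 40,
                threshold: float = 0.5) -> np.ndarray:
    """Per-block quality: high-objectness blocks are compressed less aggressively."""
    return np.where(objectness >= threshold, q_high, q_low)

def dilate_boxes(boxes: np.ndarray, margin: int, h: int, w: int) -> np.ndarray:
    """Enlarge high-objectness boxes [x0, y0, x1, y1] by a margin, clipped to the image."""
    out = boxes.copy()
    out[:, 0:2] = np.maximum(out[:, 0:2] - margin, 0)
    out[:, 2] = np.minimum(out[:, 2] + margin, w)
    out[:, 3] = np.minimum(out[:, 3] + margin, h)
    return out

if __name__ == "__main__":
    obj = np.random.rand(8, 8)            # hypothetical 8x8 block objectness scores
    print(quality_map(obj))
    boxes = np.array([[16, 16, 64, 48]])  # one hypothetical detection box
    print(dilate_boxes(boxes, margin=8, h=128, w=128))
```

In an actual encoder, the resulting quality map would drive per-region QP or quality settings of the underlying codec.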
The past decades have witnessed the rapid development of image and video coding techniques in the era of big data. However, the signal-fidelity-driven design of existing coding pipelines limits their ability to serve both machine and human vision. In this paper, we propose a novel face image coding framework that leverages both compressive and generative models to jointly support machine vision and human perception tasks. Given an input image, feature analysis is first applied, and a generative model is then employed to reconstruct the image from compact structure and color features: sparse edges are extracted to connect both kinds of vision, and a key reference pixel selection method is proposed to determine the priorities of the reference color pixels for scalable coding. The compact edge map serves as the base layer for machine vision tasks, and the reference pixels act as an enhancement layer that guarantees signal fidelity for human vision. Building on advanced generative models, we train a decoding network to reconstruct images from the compact structure and color representations; the network accepts inputs in a scalable way and can trade off the imagery effect of the outputs between signal fidelity and visual realism. Experimental results and comprehensive performance analysis on a face image dataset demonstrate the superiority of our framework in both human vision and machine vision tasks, providing useful evidence for the emerging MPEG VCM (video coding for machines) standardization effort.
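A minimal sketch of ranking candidate reference color pixels for the enhancement layer. The priority score used here (local color variation, down-weighted on edge pixels) is an illustrative stand-in for the paper's selection criterion, and all names are assumptions.

```python
import numpy as np

def select_reference_pixels(img: np.ndarray, edges: np.ndarray,
                            budget: int) -> np.ndarray:
    """Return (row, col) coordinates of the `budget` highest-priority reference pixels.

    img   : HxWx3 float image in [0, 1]
    edges : HxW binary edge map (the base layer)
    """
    h, w, _ = img.shape
    # Priority proxy: colour variation in a 3x3 neighbourhood, damped on edge
    # pixels so references land in the smooth regions the decoder must fill in.
    pad = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    var = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            var += ((pad[dy:dy + h, dx:dx + w] - img) ** 2).sum(axis=-1)
    priority = var * (1.0 - edges)
    idx = np.argsort(priority.ravel())[::-1][:budget]
    return np.stack(np.unravel_index(idx, (h, w)), axis=1)

if __name__ == "__main__":
    img = np.random.rand(64, 64, 3)
    edges = (np.random.rand(64, 64) > 0.95).astype(float)
    refs = select_reference_pixels(img, edges, budget=128)
    print(refs.shape)  # (128, 2)
```

The selected coordinates and their colors would then form the enhancement-layer payload, with the budget acting as the rate knob of the scalable stream.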
ISBN (print): 9781728185514
Learned image compression (LIC) has demonstrated strong performance on reconstruction-quality-driven tasks (e.g., PSNR, MS-SSIM) as well as machine vision tasks such as image understanding. However, most LIC frameworks operate in the pixel domain, which requires a full decoding pass. In this paper, we develop a learned compressed-domain framework for machine vision tasks. 1) By feeding the compressed latent representation directly to the task network, the decoding computation is eliminated, reducing complexity. 2) By sorting the latent channels by entropy, only selected channels are transmitted to the task network, which reduces the bitrate. As a result, compared with traditional pixel-domain methods, we reduce multiply-add operations (MACs) by about 1/3 and inference time by about 1/5 while maintaining the same accuracy. Moreover, the proposed channel selection contributes up to 6.8% bitrate savings.
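A minimal sketch of entropy-based channel selection in the compressed domain. In a real LIC pipeline, the per-channel bit costs would come from the entropy model; here they are random placeholders, and the keep ratio is an assumption.

```python
import numpy as np

def select_channels(latent: np.ndarray, channel_bits: np.ndarray,
                    keep_ratio: float = 0.75):
    """Keep the highest-entropy channels of a (C, H, W) latent.

    Channels with larger estimated bit cost carry more information and are
    transmitted to the task network; the rest are dropped to save bitrate.
    """
    c = latent.shape[0]
    keep = max(1, int(round(c * keep_ratio)))
    order = np.argsort(channel_bits)[::-1][:keep]  # descending by bit cost
    return latent[order], order

if __name__ == "__main__":
    y = np.random.randn(192, 16, 16)    # hypothetical quantised latent
    bits = np.random.rand(192) * 1000   # placeholder per-channel bit costs
    y_sel, idx = select_channels(y, bits, keep_ratio=0.8)
    print(y_sel.shape, idx[:5])
```

The retained channel indices must be signaled to the receiver so the task network sees a consistent channel ordering.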
ISBN (print): 9783030880071; 9783030880064
Image compression aims to compress image data without compromising human visual quality. However, the information lost in the compression process may degrade downstream machine vision tasks such as object detection and semantic segmentation. How to jointly account for human vision and machine vision when compressing images remains an open problem. In this paper, we present a multi-task framework for image compression and semantic segmentation. Specifically, an end-to-end mutual-enhancement network is designed to efficiently compress a given image while simultaneously segmenting its semantic content. First, a uniform feature learning strategy is adopted in the encoder to jointly learn features for image compression and semantic segmentation, and a multi-scale aggregation module in the encoder further enhances the semantic features. Then, by transmitting the quantized features, both the decompressed image features and the learned semantic features can be reconstructed. Finally, this information is decoded for the image compression task and the semantic segmentation task. On the one hand, the decompressed semantic features are used to perform semantic segmentation in the decoder; on the other hand, the quality of the decompressed image can be further improved using the obtained semantic segmentation map. Experimental results show that our framework effectively supports image compression and semantic segmentation simultaneously, in both subjective and objective evaluations.
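A minimal sketch of the shared-encoder, two-head structure implied by the uniform feature learning strategy above: one analysis transform feeds both a reconstruction head and a segmentation head, trained with a joint loss. Layer sizes, the loss weighting, and the omission of quantization/entropy coding and the multi-scale aggregation module are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCodec(nn.Module):
    def __init__(self, channels: int = 64, num_classes: int = 21):
        super().__init__()
        # Shared analysis transform (encoder) learning one feature space for both tasks.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(),
        )
        # Reconstruction head (synthesis transform).
        self.rec_head = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 4, stride=2, padding=1),
        )
        # Segmentation head predicting per-pixel class logits.
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        y = self.encoder(x)  # quantisation / entropy coding of y omitted in this sketch
        return self.rec_head(y), self.seg_head(y)

if __name__ == "__main__":
    model = JointCodec()
    img = torch.rand(2, 3, 128, 128)
    labels = torch.randint(0, 21, (2, 128, 128))
    recon, logits = model(img)
    loss = F.mse_loss(recon, img) + 0.1 * F.cross_entropy(logits, labels)
    print(recon.shape, logits.shape, float(loss))
```

In the full framework, a rate term on the quantized features and the cross-task refinement (segmentation map guiding reconstruction) would be added on top of this skeleton.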
ISBN (print): 9781728113319
The past decades have witnessed the rapid development of image and video coding techniques in the era of big data. However, the signal-fidelity-driven design of existing coding pipelines limits their ability to serve both machine and human vision. In this paper, we propose a novel image coding framework that leverages both compressive and generative models to jointly support machine vision and human perception tasks. Given an input image, feature analysis is first applied, and a generative model is then employed to reconstruct the image from the extracted features together with additional reference pixels; compact edge maps are extracted in this work to connect both kinds of vision in a scalable way. The compact edge map serves as the base layer for machine vision tasks, and the reference pixels act as an enhancement layer that guarantees signal fidelity for human vision. Building on advanced generative models, we train a flexible network to reconstruct images from the compact feature representations and the reference pixels. Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection, providing useful evidence for the emerging MPEG VCM (video coding for machines) standardization effort.
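A minimal sketch of the scalable layering this framework implies: the bitstream carries a compact edge map as the base layer and reference pixels as an enhancement layer; a machine-vision consumer decodes only the base layer, while a human viewer requests both. The container, field names, and the omission of the generative synthesis step are assumptions.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class ScalableBitstream:
    edge_map: np.ndarray                 # base layer: HxW binary edges
    ref_pixels: Optional[np.ndarray]     # enhancement layer: (N, 5) rows of (y, x, r, g, b)

def decode(stream: ScalableBitstream, for_machine: bool) -> dict:
    """Return only what the consumer needs; the generative model that maps
    (edges, reference pixels) back to an image is omitted in this sketch."""
    if for_machine or stream.ref_pixels is None:
        return {"layer": "base", "edges": stream.edge_map}
    return {"layer": "base+enhancement",
            "edges": stream.edge_map,
            "references": stream.ref_pixels}

if __name__ == "__main__":
    bs = ScalableBitstream(edge_map=np.zeros((64, 64), dtype=np.uint8),
                           ref_pixels=np.zeros((128, 5)))
    print(decode(bs, for_machine=True)["layer"])    # base
    print(decode(bs, for_machine=False)["layer"])   # base+enhancement
```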
ISBN (print): 9781728113319
In this paper, we study a new problem arising from the emerging MPEG standardization effort on video coding for machines (VCM), which aims to bridge the gap between visual feature compression and classical video coding. VCM is committed to addressing the requirement of compact signal representation for both machine and human vision in a scalable way. To this end, we leverage the strengths of predictive and generative models to support advanced compression techniques for machine and human vision tasks simultaneously, with visual features serving as a bridge that connects signal-level and task-level compact representations in a scalable manner. Specifically, we employ a conditional deep generative network to reconstruct video frames guided by a learned motion pattern. By learning to extract a sparse motion pattern via a predictive model, the network leverages this feature representation to generate the appearance of to-be-coded frames via a generative model, relying on the appearance of the coded key frames. Meanwhile, the sparse motion pattern is compact and highly effective for high-level vision tasks such as action recognition. Experimental results demonstrate that our method yields much better reconstruction quality than traditional video codecs (a 0.0063 gain in SSIM), as well as state-of-the-art action recognition performance on highly compressed videos (a 9.4% gain in recognition accuracy), showcasing a promising paradigm of coding signals for both human and machine vision.
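A minimal sketch of the "sparse motion as a bridge" idea: per frame, only a handful of keypoints is transmitted besides the coded key frame; a generator conditions on the key frame and those keypoints to synthesize the frame, and the same keypoint trajectories can feed an action classifier. The networks below are untrained placeholders with illustrative sizes, not the paper's architecture.

```python
import torch
import torch.nn as nn

class KeypointDetector(nn.Module):
    """Predicts K (x, y) keypoints per frame: the compact motion pattern."""
    def __init__(self, k: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2 * k),
        )
        self.k = k

    def forward(self, frame):
        return self.net(frame).view(-1, self.k, 2)

class FrameGenerator(nn.Module):
    """Conditions on the key frame and target keypoints to synthesise a frame."""
    def __init__(self, k: int = 10):
        super().__init__()
        self.fuse = nn.Conv2d(3 + 2 * k, 3, 3, padding=1)

    def forward(self, key_frame, keypoints):
        b, _, h, w = key_frame.shape
        # Broadcast the keypoint vector spatially and fuse it with the key frame.
        kp_map = keypoints.view(b, -1, 1, 1).expand(b, keypoints.numel() // b, h, w)
        return torch.sigmoid(self.fuse(torch.cat([key_frame, kp_map], dim=1)))

if __name__ == "__main__":
    detector, generator = KeypointDetector(), FrameGenerator()
    key_frame = torch.rand(1, 3, 64, 64)
    target = torch.rand(1, 3, 64, 64)
    kp = detector(target)             # 10 keypoints: the only per-frame motion payload
    recon = generator(key_frame, kp)  # appearance from key frame + sparse motion
    print(kp.shape, recon.shape)      # (1, 10, 2) (1, 3, 64, 64)
```

The bitrate advantage comes from coding only the key frames with a conventional codec and sending the keypoint vectors for every other frame.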