ISBN (e-book): 9781510653320
ISBN (print): 9781510653320; 9781510653313
Recent intelligent object detection systems require high-definition images for reliable detection accuracy, which raises problems of high network bandwidth occupation and large archival storage requirements. In this paper, we propose an objectness-measure-based compression method for thermal images intended for machine vision. Based on the objectness of each region, the bounding box of a high-objectness area is adjusted so that subsequent object detection performance is not affected, and the image is compressed such that high-objectness areas receive a lower compression ratio than the remaining areas. Experiments indicate that the proposed scheme achieves superior object detection accuracy at comparable BPP compared with a state-of-the-art video compression method.
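A minimal sketch of the objectness-guided idea described above: assign a per-block quality from an objectness map and enlarge high-objectness boxes so the quality boundary does not clip object borders. The block layout, threshold, margin, and quality values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def quality_map(objectness: np.ndarray,
                q_high: int = 90, q_low: int = 40,
                threshold: float = 0.5) -> np.ndarray:
    """Per-block quality: high-objectness blocks are compressed less aggressively."""
    return np.where(objectness >= threshold, q_high, q_low)

def dilate_boxes(boxes: np.ndarray, margin: int, h: int, w: int) -> np.ndarray:
    """Enlarge high-objectness boxes [x0, y0, x1, y1] by a margin, clipped to the image."""
    out = boxes.copy()
    out[:, 0:2] = np.maximum(out[:, 0:2] - margin, 0)
    out[:, 2] = np.minimum(out[:, 2] + margin, w)
    out[:, 3] = np.minimum(out[:, 3] + margin, h)
    return out

if __name__ == "__main__":
    obj = np.random.rand(8, 8)            # hypothetical 8x8 block objectness scores
    print(quality_map(obj))
    boxes = np.array([[16, 16, 64, 48]])  # one hypothetical detection box
    print(dilate_boxes(boxes, margin=8, h=128, w=128))
```

In an actual encoder, the resulting quality map would drive per-region QP or quality settings of the underlying codec.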
The past decades have witnessed the rapid development of image and video coding techniques in the era of big data. However, the signal-fidelity-driven design of existing coding pipelines limits their ability to serve both machine and human vision. In this paper, we propose a novel face image coding framework that leverages both compressive and generative models to jointly support machine vision and human perception tasks. Given an input image, feature analysis is first applied, and a generative model is then employed to reconstruct the image from compact structure and color features: sparse edges are extracted to connect both kinds of vision, and a key reference pixel selection method is proposed to determine the priorities of the reference color pixels for scalable coding. The compact edge map serves as the base layer for machine vision tasks, and the reference pixels act as an enhancement layer that guarantees signal fidelity for human vision. Building on advanced generative models, we train a decoding network to reconstruct images from the compact structure and color representations; the network accepts inputs in a scalable way and can trade off the imagery effect of the outputs between signal fidelity and visual realism. Experimental results and comprehensive performance analysis on a face image dataset demonstrate the superiority of our framework in both human vision and machine vision tasks, providing useful evidence for the emerging MPEG VCM (video coding for machines) standardization effort.
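A minimal sketch of ranking candidate reference color pixels for the enhancement layer. The priority score used here (local color variation, down-weighted on edge pixels) is an illustrative stand-in for the paper's selection criterion, and all names are assumptions.

```python
import numpy as np

def select_reference_pixels(img: np.ndarray, edges: np.ndarray,
                            budget: int) -> np.ndarray:
    """Return (row, col) coordinates of the `budget` highest-priority reference pixels.

    img   : HxWx3 float image in [0, 1]
    edges : HxW binary edge map (the base layer)
    """
    h, w, _ = img.shape
    # Priority proxy: colour variation in a 3x3 neighbourhood, damped on edge
    # pixels so references land in the smooth regions the decoder must fill in.
    pad = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    var = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            var += ((pad[dy:dy + h, dx:dx + w] - img) ** 2).sum(axis=-1)
    priority = var * (1.0 - edges)
    idx = np.argsort(priority.ravel())[::-1][:budget]
    return np.stack(np.unravel_index(idx, (h, w)), axis=1)

if __name__ == "__main__":
    img = np.random.rand(64, 64, 3)
    edges = (np.random.rand(64, 64) > 0.95).astype(float)
    refs = select_reference_pixels(img, edges, budget=128)
    print(refs.shape)  # (128, 2)
```

The selected coordinates and their colors would then form the enhancement-layer payload, with the budget acting as the rate knob of the scalable stream.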
ISBN (print): 9781728185514
Learned image compression (LIC) has demonstrated strong performance on reconstruction-quality-driven tasks (e.g., PSNR, MS-SSIM) as well as machine vision tasks such as image understanding. However, most LIC frameworks operate in the pixel domain, which requires a full decoding pass. In this paper, we develop a learned compressed-domain framework for machine vision tasks. 1) By feeding the compressed latent representation directly to the task network, the decoding computation is eliminated, reducing complexity. 2) By sorting the latent channels by entropy, only selected channels are transmitted to the task network, which reduces the bitrate. As a result, compared with traditional pixel-domain methods, we reduce multiply-add operations (MACs) by about 1/3 and inference time by about 1/5 while maintaining the same accuracy. Moreover, the proposed channel selection contributes up to 6.8% bitrate savings.
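A minimal sketch of entropy-based channel selection in the compressed domain. In a real LIC pipeline, the per-channel bit costs would come from the entropy model; here they are random placeholders, and the keep ratio is an assumption.

```python
import numpy as np

def select_channels(latent: np.ndarray, channel_bits: np.ndarray,
                    keep_ratio: float = 0.75):
    """Keep the highest-entropy channels of a (C, H, W) latent.

    Channels with larger estimated bit cost carry more information and are
    transmitted to the task network; the rest are dropped to save bitrate.
    """
    c = latent.shape[0]
    keep = max(1, int(round(c * keep_ratio)))
    order = np.argsort(channel_bits)[::-1][:keep]  # descending by bit cost
    return latent[order], order

if __name__ == "__main__":
    y = np.random.randn(192, 16, 16)    # hypothetical quantised latent
    bits = np.random.rand(192) * 1000   # placeholder per-channel bit costs
    y_sel, idx = select_channels(y, bits, keep_ratio=0.8)
    print(y_sel.shape, idx[:5])
```

The retained channel indices must be signaled to the receiver so the task network sees a consistent channel ordering.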
ISBN (print): 9783030880071; 9783030880064
Image compression aims to compress image data without compromising human visual quality. However, the information lost in the compression process may degrade downstream machine vision tasks such as object detection and semantic segmentation. How to jointly account for human vision and machine vision when compressing images remains an open problem. In this paper, we present a multi-task framework for image compression and semantic segmentation. Specifically, an end-to-end mutual-enhancement network is designed to efficiently compress a given image while simultaneously segmenting its semantic content. First, a uniform feature learning strategy is adopted in the encoder to jointly learn features for image compression and semantic segmentation, and a multi-scale aggregation module in the encoder further enhances the semantic features. Then, by transmitting the quantized features, both the decompressed image features and the learned semantic features can be reconstructed. Finally, this information is decoded for the image compression task and the semantic segmentation task. On the one hand, the decompressed semantic features are used to perform semantic segmentation in the decoder; on the other hand, the quality of the decompressed image can be further improved using the obtained semantic segmentation map. Experimental results show that our framework effectively supports image compression and semantic segmentation simultaneously, in both subjective and objective evaluations.
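A minimal sketch of the shared-encoder, two-head structure implied by the uniform feature learning strategy above: one analysis transform feeds both a reconstruction head and a segmentation head, trained with a joint loss. Layer sizes, the loss weighting, and the omission of quantization/entropy coding and the multi-scale aggregation module are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCodec(nn.Module):
    def __init__(self, channels: int = 64, num_classes: int = 21):
        super().__init__()
        # Shared analysis transform (encoder) learning one feature space for both tasks.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(),
        )
        # Reconstruction head (synthesis transform).
        self.rec_head = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 4, stride=2, padding=1),
        )
        # Segmentation head predicting per-pixel class logits.
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        y = self.encoder(x)  # quantisation / entropy coding of y omitted in this sketch
        return self.rec_head(y), self.seg_head(y)

if __name__ == "__main__":
    model = JointCodec()
    img = torch.rand(2, 3, 128, 128)
    labels = torch.randint(0, 21, (2, 128, 128))
    recon, logits = model(img)
    loss = F.mse_loss(recon, img) + 0.1 * F.cross_entropy(logits, labels)
    print(recon.shape, logits.shape, float(loss))
```

In the full framework, a rate term on the quantized features and the cross-task refinement (segmentation map guiding reconstruction) would be added on top of this skeleton.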
ISBN (print): 9781728113319
The past decades have witnessed the rapid development of image and video coding techniques in the era of big data. However, the signal-fidelity-driven design of existing coding pipelines limits their ability to serve both machine and human vision. In this paper, we propose a novel image coding framework that leverages both compressive and generative models to jointly support machine vision and human perception tasks. Given an input image, feature analysis is first applied, and a generative model is then employed to reconstruct the image from the extracted features together with additional reference pixels; compact edge maps are extracted in this work to connect both kinds of vision in a scalable way. The compact edge map serves as the base layer for machine vision tasks, and the reference pixels act as an enhancement layer that guarantees signal fidelity for human vision. Building on advanced generative models, we train a flexible network to reconstruct images from the compact feature representations and the reference pixels. Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection, providing useful evidence for the emerging MPEG VCM (video coding for machines) standardization effort.
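A minimal sketch of the scalable layering this framework implies: the bitstream carries a compact edge map as the base layer and reference pixels as an enhancement layer; a machine-vision consumer decodes only the base layer, while a human viewer requests both. The container, field names, and the omission of the generative synthesis step are assumptions.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class ScalableBitstream:
    edge_map: np.ndarray                 # base layer: HxW binary edges
    ref_pixels: Optional[np.ndarray]     # enhancement layer: (N, 5) rows of (y, x, r, g, b)

def decode(stream: ScalableBitstream, for_machine: bool) -> dict:
    """Return only what the consumer needs; the generative model that maps
    (edges, reference pixels) back to an image is omitted in this sketch."""
    if for_machine or stream.ref_pixels is None:
        return {"layer": "base", "edges": stream.edge_map}
    return {"layer": "base+enhancement",
            "edges": stream.edge_map,
            "references": stream.ref_pixels}

if __name__ == "__main__":
    bs = ScalableBitstream(edge_map=np.zeros((64, 64), dtype=np.uint8),
                           ref_pixels=np.zeros((128, 5)))
    print(decode(bs, for_machine=True)["layer"])    # base
    print(decode(bs, for_machine=False)["layer"])   # base+enhancement
```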
ISBN (print): 9781728113319
In this paper, we study a new problem arising from the emerging MPEG standardization effort on video coding for machines (VCM), which aims to bridge the gap between visual feature compression and classical video coding. VCM is committed to addressing the requirement of compact signal representation for both machine and human vision in a scalable way. To this end, we leverage the strengths of predictive and generative models to support advanced compression techniques for machine and human vision tasks simultaneously, with visual features serving as a bridge that connects signal-level and task-level compact representations in a scalable manner. Specifically, we employ a conditional deep generative network to reconstruct video frames guided by a learned motion pattern. By learning to extract a sparse motion pattern via a predictive model, the network leverages this feature representation to generate the appearance of to-be-coded frames via a generative model, relying on the appearance of the coded key frames. Meanwhile, the sparse motion pattern is compact and highly effective for high-level vision tasks such as action recognition. Experimental results demonstrate that our method yields much better reconstruction quality than traditional video codecs (a 0.0063 gain in SSIM), as well as state-of-the-art action recognition performance on highly compressed videos (a 9.4% gain in recognition accuracy), showcasing a promising paradigm of coding signals for both human and machine vision.
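A minimal sketch of the "sparse motion as a bridge" idea: per frame, only a handful of keypoints is transmitted besides the coded key frame; a generator conditions on the key frame and those keypoints to synthesize the frame, and the same keypoint trajectories can feed an action classifier. The networks below are untrained placeholders with illustrative sizes, not the paper's architecture.

```python
import torch
import torch.nn as nn

class KeypointDetector(nn.Module):
    """Predicts K (x, y) keypoints per frame: the compact motion pattern."""
    def __init__(self, k: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2 * k),
        )
        self.k = k

    def forward(self, frame):
        return self.net(frame).view(-1, self.k, 2)

class FrameGenerator(nn.Module):
    """Conditions on the key frame and target keypoints to synthesise a frame."""
    def __init__(self, k: int = 10):
        super().__init__()
        self.fuse = nn.Conv2d(3 + 2 * k, 3, 3, padding=1)

    def forward(self, key_frame, keypoints):
        b, _, h, w = key_frame.shape
        # Broadcast the keypoint vector spatially and fuse it with the key frame.
        kp_map = keypoints.view(b, -1, 1, 1).expand(b, keypoints.numel() // b, h, w)
        return torch.sigmoid(self.fuse(torch.cat([key_frame, kp_map], dim=1)))

if __name__ == "__main__":
    detector, generator = KeypointDetector(), FrameGenerator()
    key_frame = torch.rand(1, 3, 64, 64)
    target = torch.rand(1, 3, 64, 64)
    kp = detector(target)             # 10 keypoints: the only per-frame motion payload
    recon = generator(key_frame, kp)  # appearance from key frame + sparse motion
    print(kp.shape, recon.shape)      # (1, 10, 2) (1, 3, 64, 64)
```

The bitrate advantage comes from coding only the key frames with a conventional codec and sending the keypoint vectors for every other frame.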