Search Results - Inner Mongolia University Library

2024 International Conference on Image Processing

Authors: Shen, Xuelin; Ou, Haoqiao; Yang, Wenhan. Affiliations: Guangdong Lab Artificial Intelligence & Digital E Shenzhen Guangdong Peoples R China; PengCheng Lab Shenzhen Guangdong Peoples R China; Shenzhen Univ Coll Comp Sci & Software Engn Shenzhen Peoples R China

ISBN: (print) 9798350349405; 9798350349399

Among the technical approaches to machine vision coding, image coding for machine (ICM) stands out for its ability to serve human perception and machine vision simultaneously. However, it is often criticized for its poor rate-analytics efficiency. In this paper, we propose an Appearance Redundancy Reduction (ARR) module, designed as a plug-in for existing ICM frameworks, that further improves rate-analytics coding efficiency without any changes to the ICM itself. Specifically, our work exploits the intrinsic correlation between low-level image structure and high-level vision analytics, and proposes a novel colour quantization mechanism to squeeze out redundant appearance information that does not contribute to analytics. Moreover, a differentiable softened quantization operation is derived to enable end-to-end training within the ICM framework. Extensive experimental results show that integrating the proposed ARR module yields substantial improvements in rate-analytics performance, even surpassing the feature coding paradigm, while maintaining generalizability across different tasks and an acceptable perceptual representation.

Keywords: image coding for machine; colour distillation; machine vision; image compression
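The abstract above mentions a differentiable softened quantization used alongside colour quantization. The sketch below is a generic illustration of that idea only, not the ARR module itself: it assumes a fixed palette and a softmax over negative squared distances, and the names `soft_quantize`, `hard_quantize`, and `temperature` are hypothetical.

```python
import math

def soft_quantize(pixel, palette, temperature=0.1):
    """Softly assign one RGB pixel to a palette (illustrative assumption).

    Returns the weighted-average colour and the per-colour weights.
    """
    # Squared Euclidean distance from the pixel to each palette colour.
    d2 = [sum((p - c) ** 2 for p, c in zip(pixel, colour)) for colour in palette]
    # Softmax over negative distances; shifting by the minimum keeps exp() stable.
    d_min = min(d2)
    exps = [math.exp(-(d - d_min) / temperature) for d in d2]
    total = sum(exps)
    weights = [e / total for e in exps]
    soft = tuple(
        sum(w * colour[ch] for w, colour in zip(weights, palette))
        for ch in range(3)
    )
    return soft, weights

def hard_quantize(pixel, palette):
    """Nearest-colour quantization; its argmin has no useful gradient."""
    return min(
        palette,
        key=lambda colour: sum((p - c) ** 2 for p, c in zip(pixel, colour)),
    )

palette = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
soft, weights = soft_quantize((0.1, 0.1, 0.1), palette, temperature=1e-3)
hard = hard_quantize((0.1, 0.1, 0.1), palette)
```

As the temperature shrinks, the soft assignment approaches the hard nearest-colour choice, while at larger temperatures the weights stay smooth enough for gradient-based training.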

Tell Codec What Worth Compressing: Semantically Disentangled image coding for machine with LMMs

2024 Conference on Visual Communications and Image Processing

Authors: Liu, Jinming; Wei, Yuntao; Lin, Junyan; Zhao, Shengyang; Sun, Heming; Chen, Zhibo; Zeng, Wenjun; Jin, Xin. Affiliations: Shanghai Jiao Tong Univ Shanghai Peoples R China; Ningbo Inst Digital Twin Eastern Inst Technol Ningbo Peoples R China; Univ Sci & Technol China Hefei Peoples R China; Yokohama Natl Univ Yokohama Kanagawa Japan

ISBN: (print) 9798331529543; 9798331529550

We present a new image compression paradigm that achieves "intelligently coding for machine" by leveraging the common sense of Large Multimodal Models (LMMs). We are motivated by the evidence that large language/multimodal models are powerful general-purpose semantic predictors for understanding the real world. Unlike traditional image compression, which is typically optimized for human eyes, the image coding for machines (ICM) framework we focus on requires the compressed bitstream to comply more closely with different downstream intelligent analysis tasks. To this end, we employ an LMM to tell the codec what to compress: 1) we first use the semantic understanding capability of LMMs with respect to object grounding, identification, and importance ranking, elicited via prompts, to disentangle image content before compression; 2) then, based on these semantic priors, we encode and transmit the objects of the image in order within a structured bitstream. In this way, diverse vision benchmarks such as image classification, object detection, and instance segmentation can be well supported by such a semantically structured bitstream. We dub our method "SDComp", for "Semantically Disentangled Compression", and compare it with state-of-the-art codecs on a wide variety of vision tasks. The SDComp codec leads to more flexible reconstruction results, promised decoded visual quality, and a more generic and satisfactory ability to support intelligent tasks.

Keywords: image coding for machine; Large Multimodal Model; Semantically Structured Bitstream
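The abstract above describes transmitting image objects in importance order within a structured bitstream. As a rough sketch of what such an ordering could look like, the code below uses a length-prefixed layout that is purely an assumption for illustration. It is not the SDComp bitstream format, and `pack_objects` and `unpack_objects` are hypothetical helpers.

```python
import struct

def pack_objects(objects):
    """Serialize (importance, payload) pairs, most important first.

    Assumed layout: a 2-byte object count, then for each object a
    4-byte big-endian length followed by its payload bytes.
    """
    ordered = sorted(objects, key=lambda item: item[0], reverse=True)
    stream = struct.pack(">H", len(ordered))
    for _, payload in ordered:
        stream += struct.pack(">I", len(payload)) + payload
    return stream

def unpack_objects(stream, max_objects=None):
    """Decode the first max_objects payloads (all of them if None)."""
    (count,) = struct.unpack_from(">H", stream, 0)
    offset = 2
    limit = count if max_objects is None else min(count, max_objects)
    payloads = []
    for _ in range(limit):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        payloads.append(stream[offset:offset + length])
        offset += length
    return payloads

# Payloads stand in for per-object compressed data.
objects = [(0.2, b"background"), (0.9, b"person"), (0.5, b"dog")]
stream = pack_objects(objects)
top_one = unpack_objects(stream, max_objects=1)
everything = unpack_objects(stream)
```

Because the most important objects come first, a decoder that only needs them can stop reading early, which is one way an ordered bitstream could serve different downstream tasks.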

Human-Machine Collaborative Image Compression Method Based on Implicit Neural Representations

IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2024, vol. 14, no. 2, pp. 198-208

Authors: Li, Huanyang; Zhang, Xinfeng. Affiliation: Univ Chinese Acad Sci Sch Comp Sci & Technol Beijing 100049 Peoples R China

With the explosive increase in the volume of images intended for analysis by AI, image coding for machine has been proposed to transmit information in a machine-interpretable format, thereby enhancing image compression efficiency. However, such efficient coding schemes often lose image details and features and blur semantic information because of their high compression ratios, making them less suitable for human vision. Balancing visual quality against machine vision accuracy at a given compression ratio is therefore a critical problem. To address it, we introduce a human-machine collaborative image coding framework based on Implicit Neural Representations (INR), which effectively reduces the information transmitted for machine vision tasks at the decoding side while maintaining high-efficiency image compression for human vision compared with INR compression frameworks. To enhance the model's perception of images for machine vision, we design a semantic embedding enhancement module to assist in understanding image semantics. Specifically, we employ the Swin Transformer model to initialize image features, ensuring that the embeddings of the compression model are effectively applicable to downstream visual tasks. Extensive experimental results demonstrate that our method significantly outperforms other image compression methods in classification tasks while preserving image compression efficiency.

Keywords: image compression; image coding for machine; implicit neural representation

Unified and Scalable Deep Image Compression Framework for Human and Machine

ACM Transactions on Multimedia Computing, Communications, and Applications, 2024, vol. 20, no. 10, pp. 1-22

Authors: Zhang, Gai; Zhang, Xinfeng; Tang, Lv. Affiliation: Univ Chinese Acad Sci Beijing Peoples R China

Image compression aims to minimize the amount of data in an image representation while maintaining a certain visual quality for humans, and it is an essential technique for storage and transmission. Recently, with the development of computer vision, machines have become another primary receiver of images, and they require compressed images at a certain quality level, which may differ from that required by human vision. In many scenarios, compressed images should serve both human and machine vision tasks, but few compression methods are designed for both goals simultaneously. In this article, we propose a unified and scalable deep image compression (USDIC) framework that jointly optimizes image quality for human and machine vision in an end-to-end manner. For the encoder, we propose an information splitting mechanism (ISM) that separates images into semantic and visual features, which mainly serve machine analysis and human viewing tasks, respectively. For the decoder, we design a scalable decoding architecture: the encoded semantic feature is first decoded for machine analysis tasks, and the image is then decoded and reconstructed by leveraging the decoded semantic features. To further remove the redundancy between the semantic and visual features of images, we propose a scalable entropy model (SEM) with a joint optimization strategy to reconstruct the image using the two kinds of decoded features. Extensive experimental results show that the proposed USDIC achieves much better performance on the image analysis task while maintaining competitive performance on the traditional image reconstruction task compared with popular image compression methods.

Keywords: Deep-learning based image compression; image coding for machine; scalable coding

IMAGE CODING FOR ANALYTICS VIA ADVERSARIALLY AUGMENTED ADAPTATION

49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Shen, Xuelin; Yin, Kangsheng; Wang, Xu; He, Yulin; Wang, Shiqi; Yang, Wenhan. Affiliations: Guangdong Lab Artificial Intelligence & Digital E Shenzhen Guangdong Peoples R China; Shenzhen Univ Shenzhen Peoples R China; City Univ Hong Kong Hong Kong Peoples R China; Peng Cheng Lab Shenzhen Peoples R China

ISBN: (print) 9798350344868; 9798350344851

Image coding for machine (ICM) aims to compress an image so that the reconstructed one meets the requirements of both human vision and machine vision. Existing methods apply constraints from the downstream models to improve machine analytics performance, at the cost of visual quality. This paper proposes a novel adversarially augmented adaptation route that achieves a better trade-off between utility for humans and utility for machines by making slight changes to the image manifold. In detail, a targeted adversarial attack is employed to generate subtle image perturbations that are nearly imperceptible to humans but significantly improve machine analytics performance. These perturbed images are subsequently employed as ground truth to guide the training or fine-tuning of an end-to-end image compression network. Note that our method is a plug-and-play framework that does not rely on any change to existing architectures or loss functions. Extensive experimental results demonstrate the superiority of the proposed scheme over conventional ICM frameworks and the effectiveness of our design.

Keywords: image coding for machine; machine vision; Targeted adversarial attack; machine vision coding
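The abstract above relies on a targeted adversarial attack that nudges an image toward a desired prediction with a small, bounded perturbation. The sketch below illustrates that general idea with one signed-gradient step on a toy logistic classifier. The classifier, the step rule, and the names `predict`, `targeted_step`, and `eps` are all assumptions for illustration, not the attack or the model used in the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Probability of class 1 under a toy logistic model (assumption)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def targeted_step(x, w, b, target, eps=0.02):
    """One signed-gradient step that moves x toward the target label.

    For a logistic model, the gradient of the target log-likelihood
    with respect to x is (target - p) * w.
    """
    p = predict(x, w, b)
    grad = [(target - p) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    # Each pixel moves by at most eps and stays inside [0, 1].
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

x = [0.5, 0.5, 0.5, 0.5]      # toy "image" of four pixels
w = [1.0, -2.0, 0.5, 0.0]     # toy classifier weights
b = 0.1
x_adv = targeted_step(x, w, b, target=1)
p_before, p_after = predict(x, w, b), predict(x_adv, w, b)
```

In the scheme the abstract describes, images perturbed in this spirit would then serve as the reconstruction targets when training or fine-tuning the compression network; that training stage is not shown here.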

Learn A Compression for Objection Detection - VAE with a Bridge

IEEE International Conference on Visual Communications and Image Processing (VCIP) - Visual Communications in the Era of AI and Limited Resources

Authors: Mei, Yixin; Li, Fan; Li, Li; Li, Zhu. Affiliations: Xi An Jiao Tong Univ Sch Informat & Commun Engn Xian Peoples R China; Univ Sci & Technol China Dept Elect Engn & Informat Sci Hefei Peoples R China; Univ Missouri Dept Comp Sci & Elect Engn Kansas City MO 64110 USA

ISBN: (print) 9781728185514

Recent advances in sensor technology and the wide deployment of visual sensors have led to a new application in which images are compressed not mainly for pixel recovery for human consumption, but for communication to cloud-side machine vision tasks such as classification, identification, detection, and tracking. This opens up new research directions for learning-based compression that directly optimizes the loss function of the vision task and therefore achieves better compression performance than recovering pixels first and then performing the vision task. In this work, we develop a learning-based compression scheme that learns a compact feature representation and appropriate bitstreams for the task of visual object detection. A Variational Auto-Encoder (VAE) framework is adopted for learning the compact representation, while a bridge network is trained to drive the detection loss function. Simulation results demonstrate that this approach achieves a new state of the art in task-driven compression efficiency compared with pixel recovery approaches, including both learning-based and handcrafted solutions.

Keywords: image coding for machine; object detection; learning-based image compression
