As an emerging research practice leveraging recent advanced AI techniques, e.g., deep-model-based prediction and generation, video coding for machines (VCM) is committed to bridging the largely separate research tracks of video/image compression and feature compression, and attempts to optimize compactness and efficiency jointly from a unified perspective of high-accuracy machine vision and full-fidelity human vision. With the rapid advances of deep feature representation and visual data compression in mind, in this paper we summarize the methodology and philosophy of VCM based on existing academic and industrial efforts. The development of VCM follows a general rate-distortion optimization, and a categorization of key modules or techniques is established, including feature-assisted coding, scalable coding, intermediate feature compression/optimization, and machine-vision-targeted codecs, from the broader perspectives of vision tasks, analytics resources, etc. Previous works demonstrate that, although existing methods attempt to reveal the nature of scalable representation in bits when dealing with machine and human vision tasks, the generality of low-bit-rate representations, and accordingly how to support a variety of visual analytics tasks, remains rarely studied. Therefore, we investigate a novel visual information compression approach for the analytics taxonomy problem to strengthen the capability of compact visual representations extracted from multiple tasks for visual analytics. A new perspective of task relationships versus compression is revisited. Keeping in mind the transferability among different machine vision tasks (e.g., high-level semantic and mid-level geometry-related tasks), we aim to support multiple tasks jointly at low bit rates. In particular, to narrow the dimensionality gap between neural-network-generated features extracted from pixels and a variety of machine vision features/labels (e.g., scene class, segmentation labels), a codebook hyperprior is ...
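The general rate-distortion view mentioned in this abstract can be made concrete with a schematic objective; the notation below (the weights and the two distortion terms) is an illustrative assumption for exposition, not the exact formulation surveyed in the paper.

```latex
% Schematic joint rate-distortion objective for VCM (illustrative notation):
%   R        : bit rate of the compact representation f_theta(x)
%   D_m      : distortion measured on machine-vision task accuracy
%   D_h      : distortion measured on human-vision fidelity
%   lambda_m, lambda_h : trade-off weights between the two objectives
\min_{\theta}\; R\big(f_\theta(x)\big)
  + \lambda_m\, D_m\big(x,\hat{x}_\theta\big)
  + \lambda_h\, D_h\big(x,\hat{x}_\theta\big)
```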
Video coding for machines (VCM) aims to compress visual signals for machine analysis. However, existing methods only consider a few machines, neglecting the majority. Moreover, the machine's perceptual characteristics are not leveraged effectively, resulting in suboptimal compression efficiency. To overcome these limitations, this paper introduces the Satisfied Machine Ratio (SMR), a metric that statistically evaluates the perceptual quality of compressed images and videos for machines by aggregating satisfaction scores from them. Each score is derived from machine perceptual differences between original and compressed images. Targeting image classification and object detection tasks, we build two representative machine libraries for SMR annotation and create a large-scale SMR dataset to facilitate SMR studies. We then propose an SMR prediction model based on the correlation between deep feature differences and SMR. Furthermore, we introduce an auxiliary task to increase the prediction accuracy by predicting the SMR difference between two images of different quality. Extensive experiments demonstrate that SMR models significantly improve compression performance for machines and exhibit robust generalizability on unseen machines, codecs, datasets, and frame types.
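A minimal sketch of the SMR idea follows; the binary satisfaction criterion, the tolerance parameter, and the scoring callables are illustrative assumptions, not the paper's exact definitions.

```python
# Minimal sketch of the Satisfied Machine Ratio (SMR) idea.
# The satisfaction rule and tolerance below are assumed for illustration.
import numpy as np

def satisfaction(machine, original, compressed, tol=0.0):
    """A machine is 'satisfied' if its score on the compressed image does not
    drop more than `tol` relative to the original (score = any quality proxy,
    e.g. top-1 confidence for classification)."""
    score_orig = machine(original)
    score_comp = machine(compressed)
    return 1.0 if (score_orig - score_comp) <= tol else 0.0

def smr(machine_library, original, compressed, tol=0.0):
    """Aggregate satisfaction over a library of machines: the fraction of
    machines whose perceptual difference stays within tolerance."""
    scores = [satisfaction(m, original, compressed, tol) for m in machine_library]
    return float(np.mean(scores))
```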
Nowadays, the use of video data for machine tasks has become increasingly prevalent, with deep learning and computer vision requiring large volumes of video data for object detection, object tracking, and other tasks. However, the features required for machine tasks are different from those used by humans, and a new approach is needed to encode and compress video data for machine consumption. Video coding for machines (VCM) has received considerable attention, with many approaches focusing on compressing features rather than the video itself. However, a key challenge in this process is repacking the features in an efficient and effective manner. This paper proposes a distance-based patch tiling and intra-block quilting method to repack feature sequences in a manner that is better suited for existing video codecs, based on a statistical analysis of feature characteristics in the channel dimension. Experimental results demonstrate that our method achieves a 65.54% BD-rate gain compared to benchmark methods. This research has significant implications for improving the efficiency of video coding for machine applications, and future work could explore the use of feature dimensionality reduction and the combination of neural network (NN) codecs to optimize the repacking of features for compression.
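A rough sketch of the channel-to-frame repacking idea is given below; the greedy distance-based channel ordering and the square mosaic layout are assumptions standing in for the paper's tiling and quilting strategy.

```python
# Illustrative sketch: repack a (C, H, W) feature tensor into one 2D frame,
# ordering channels greedily by L2 distance so adjacent tiles are similar
# (and thus friendlier to a block-based video codec).
import numpy as np

def repack_features(feat):
    """feat: (C, H, W) float array -> single 2D frame of tiled channels."""
    C, H, W = feat.shape
    order = [0]
    remaining = set(range(1, C))
    while remaining:
        last = feat[order[-1]]
        # pick the remaining channel closest (L2) to the last placed one
        nxt = min(remaining, key=lambda c: np.linalg.norm(feat[c] - last))
        order.append(nxt)
        remaining.remove(nxt)
    cols = int(np.ceil(np.sqrt(C)))
    rows = int(np.ceil(C / cols))
    frame = np.zeros((rows * H, cols * W), dtype=feat.dtype)
    for i, c in enumerate(order):
        r, k = divmod(i, cols)
        frame[r * H:(r + 1) * H, k * W:(k + 1) * W] = feat[c]
    return frame
```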
As the performance of machine vision continues to improve, it is being used in various industrial fields to analyze and generate massive amounts of video data. Although the demand for and consumption of video data by machines has increased significantly, video coding for machines needs to be improved. It is therefore necessary to consider a new codec that differs from conventional codecs based on the human visual system (HVS). Spatial down-sampling plays a critical role in video coding for machines because it reduces the volume of the video data to be processed while maintaining the shape of the data's features that are important for the machine to reference when processing the video. Effective methods of determining the intensity of spatial down-sampling as an efficient coding tool for machines are still in the early stages. Here, we propose a method of determining an optimal scale factor for spatial down-sampling by collecting and analyzing information on the number of objects and the ratio of the area occupied by objects within a picture. We compare the data reduction ratio to the machine accuracy error ratio (DRAER) to evaluate the performance of the proposed method. By applying the proposed method, the DRAER was found to be a maximum of 21.40 dB and a minimum of 11.94 dB. This shows that a video coding gain for machines could be achieved through the proposed method while maintaining the accuracy of machine vision tasks.
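The sketch below illustrates the general shape of such a decision rule and a DRAER-style score; the thresholds, the QP-independent scale choices, and the exact DRAER formula are assumptions for illustration only, not the paper's method.

```python
# Toy sketch: choose a down-sampling scale from simple object statistics and
# score the result with a data-reduction vs. accuracy-error ratio in dB.
# Thresholds and the DRAER form below are illustrative assumptions.
import math

def choose_scale_factor(num_objects, object_area_ratio):
    """Small or numerous objects -> gentler down-sampling; large, few objects
    tolerate stronger down-sampling (thresholds are placeholders)."""
    if object_area_ratio < 0.05 or num_objects > 20:
        return 1.0          # keep full resolution
    if object_area_ratio < 0.15:
        return 0.75
    return 0.5              # aggressive down-sampling

def draer_db(data_reduction_ratio, accuracy_error_ratio, eps=1e-9):
    """Assumed form: 10*log10(data reduction / accuracy error), in dB."""
    return 10.0 * math.log10(data_reduction_ratio / max(accuracy_error_ratio, eps))
```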
A conventional codec aims to increase the compression efficiency for transmission and storage while maintaining video quality. However, as the number of platforms using machine vision rapidly increases, a codec that increases the compression efficiency while maintaining the accuracy of machine vision tasks must be devised. Hence, the Moving Picture Experts Group created a standardization process for video coding for machines (VCM) to reduce bitrates while maintaining the accuracy of machine vision tasks. In particular, in-loop filters have been developed for improving the subjective quality and machine vision task accuracy. However, the high computational complexity of in-loop filters limits the development of a high-performance VCM architecture. We analyze the effect of in-loop filters on the VCM performance and propose a suboptimal VCM method based on the selective activation of in-loop filters. The proposed method reduces the computation time for video coding by approximately 5% when using the enhanced compression model and 2% when employing the Versatile Video Coding test model, while maintaining the machine vision accuracy and compression efficiency of the VCM architecture.
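To illustrate the flavor of selective in-loop filter activation (the filter set, the gain/cost estimates, and the greedy budgeted selection are assumptions, not the paper's decision rule), a minimal sketch:

```python
# Toy sketch: enable in-loop filters only when their estimated machine-task
# benefit fits within a complexity budget. All numbers are placeholders.
FILTERS = ("deblocking", "sao", "alf")

def select_filters(estimated_gain, complexity_cost, budget):
    """estimated_gain/complexity_cost: dicts filter -> float.
    Greedily enable filters with the best gain/cost ratio within the budget."""
    enabled, spent = [], 0.0
    ranked = sorted(FILTERS,
                    key=lambda f: estimated_gain[f] / complexity_cost[f],
                    reverse=True)
    for f in ranked:
        if spent + complexity_cost[f] <= budget and estimated_gain[f] > 0:
            enabled.append(f)
            spent += complexity_cost[f]
    return enabled

# Example: under a tight budget, only the filter with the best ratio survives.
flags = select_filters(
    estimated_gain={"deblocking": 0.01, "sao": 0.02, "alf": 0.15},
    complexity_cost={"deblocking": 1.0, "sao": 1.5, "alf": 3.0},
    budget=3.0,
)   # -> ["alf"]
```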
This paper presents a method to effectively compress the intermediate-layer feature maps of a convolutional neural network for the potential structures of video coding for machines, which is an emerging technology for future machine consumption applications. Notably, most extant studies compress a single feature map and hence cannot entirely consider both the global and local information within the feature map. This limits performance maintenance during machine consumption tasks that analyze objects of various sizes in images/videos. To address this problem, a multiscale feature map compression method is proposed that consists of two major processes: receptive-block-based principal component analysis (RPCA) and uniform integer quantization. The RPCA derives the complete basis kernels of a feature map by selecting a set of major basis kernels that can represent a sufficient percentage of the global or local information according to the variable-size receptive blocks of each feature map. After transforming each feature map using the set of major basis kernels, a uniform integer quantizer converts the 32-bit floating-point values of the set of major basis kernels, the corresponding RPCA coefficients, and a mean vector into five-bit integer representation values. Experimental results reveal that the proposed method reduces the feature map data by 99.30% with a loss of 8.30% in average precision (AP) on the OpenImageV6 dataset and 0.77% in AP(M) and 0.47% in AP(L) on the MS COCO 2017 validation set, while outperforming previous PCA-based feature map compression methods even at higher compression rates.
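A simplified sketch of the PCA-plus-quantization pipeline is given below; a plain per-feature-map PCA stands in for the receptive-block RPCA, and the energy threshold is an assumed parameter, while the five-bit uniform quantizer follows the idea described in the abstract.

```python
# Simplified sketch: PCA over a (C, H, W) feature map + 5-bit uniform
# integer quantization of the basis, coefficients, and mean.
import numpy as np

def pca_compress(feat, energy=0.95):
    """Treat each spatial position as a C-dim sample and keep the principal
    directions covering `energy` of the variance (assumed criterion)."""
    C, H, W = feat.shape
    X = feat.reshape(C, -1).T                   # (H*W, C) samples
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), energy) + 1)
    basis = Vt[:k]                              # (k, C) "major basis kernels"
    coeffs = Xc @ basis.T                       # (H*W, k) coefficients
    return basis, coeffs, mean

def quantize_uniform(x, bits=5):
    """Uniform integer quantization of float values to `bits` bits."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / step).astype(np.uint8)
    return q, lo, step

def dequantize(q, lo, step):
    return q.astype(np.float32) * step + lo
```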
Machine vision-based intelligent applications that analyze video data collected by machines are rapidly increasing. Therefore, it is essential to efficiently compress a large volume of video data for machine consumption. Accordingly, the Moving Picture Experts Group (MPEG) has been developing a new video coding standard called video coding for machines (VCM), aimed at video consumed by machines rather than humans. Recently, studies have demonstrated that multi-scale feature compression (MSFC)-based feature compression methods significantly improve the performance of MPEG-VCM. This paper proposes an efficient MSFC (eMSFC) method with quantization parameter (QP)-adaptive feature channel truncation. The proposed eMSFC incorporates an MSFC network with a selective learning strategy (SLS) and Versatile Video Coding (VVC)-based compression. The SLS extracts a single-scale feature from the input image, arranged in order of channel-wise importance. The size of the single-scale feature is adaptively adjusted by truncating the feature channels according to the QP. The truncated feature is efficiently compressed using VVC. The experimental results reveal that, compared to the VCM feature anchor, the proposed method provides 98.72%, 98.34%, and 98.04% Bjontegaard delta rate gains for the machine vision tasks of instance segmentation, object detection, and object tracking, respectively. The proposed method performed best among the "Call for Evidence" response technologies in MPEG-VCM.
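A minimal sketch of QP-adaptive channel truncation follows; the channels are assumed to be pre-ordered by importance (as the SLS arranges them), and the linear QP-to-channel-count mapping and its constants are illustrative assumptions rather than the eMSFC design.

```python
# Minimal sketch: keep fewer (importance-ordered) feature channels at high QP.
# The QP range, minimum channel count, and linear mapping are placeholders.

def channels_to_keep(total_channels, qp, qp_min=22, qp_max=47, min_keep=16):
    """Linearly reduce the number of retained channels as QP grows."""
    t = (qp - qp_min) / float(qp_max - qp_min)
    t = min(max(t, 0.0), 1.0)
    keep = int(round(total_channels - t * (total_channels - min_keep)))
    return max(keep, min_keep)

def truncate_feature(feat, qp):
    """feat: (C, H, W) single-scale feature ordered by channel importance;
    the truncated result would then be coded with a VVC codec."""
    k = channels_to_keep(feat.shape[0], qp)
    return feat[:k]
```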
ISBN (digital): 9781665471893
ISBN (print): 9781665471893
In the video coding for machines (VCM) context, where visual content is compressed before being transmitted to a vision task algorithm, an appropriate trade-off between the compression level and the vision task performance must be chosen. In this paper, the robustness of a deep neural network (DNN)-based semantic segmentation algorithm to compression artifacts is evaluated over a total of 1486 different coding configurations. The results indicate the importance of using an appropriate image resolution to overcome the block-partitioning limitations of existing compression algorithms, allowing 58.3%, 49.8%, 33.5%, and 24.3% bitrate savings at equivalent prediction accuracy for JPEG, JM, x265, and VVenC, respectively. Surprisingly, when compressed images are included at training time, JPEG can achieve a 73.41% bitrate reduction over the VVC Test Model (VTM) paired with a DNN trained on pristine data, which implies that the generalization ability of DNNs must not be overlooked.
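The evaluation loop implied by such a study can be sketched as below; `encode` and `evaluate_miou` are hypothetical stand-ins for the actual codecs and segmentation network, and the selection rule (cheapest configuration meeting a target accuracy) is an assumption for illustration.

```python
# Schematic rate-accuracy sweep: record (bits, mIoU) for each configuration
# and keep the cheapest one that still meets a target accuracy.
def cheapest_config(configs, images, encode, evaluate_miou, target_miou):
    best = None
    for cfg in configs:                       # e.g. (codec, resolution, QP) tuples
        total_bits, total_miou = 0, 0.0
        for img in images:
            bitstream, decoded = encode(img, cfg)   # hypothetical codec wrapper
            total_bits += len(bitstream) * 8
            total_miou += evaluate_miou(decoded, img)
        mean_miou = total_miou / len(images)
        if mean_miou >= target_miou and (best is None or total_bits < best[0]):
            best = (total_bits, cfg)
    return best
```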
ISBN (print): 9798350349405; 9798350349399
This paper presents a picture partitioning design of Neural Network-based Intra coding (NNIC) for video coding for machines (VCM). The proposed design introduces adaptive auto-encoder and probability model processing, and a new unit for the partitions of an NNIC picture. Mindful of the causality of the transmission order of the partitions, the adaptive auto-encoder processing exploits the correlations of pixel values around the partition boundaries more than a conventional design does. Therefore, it can bring coding gains while keeping the low-delay video transmission capability. The adaptive probability model processing allows both the encoder and decoder to start their entropy coding and decoding with the same delay as the conventional design. The new unit makes the picture partition signaling of NNIC compatible with that of Versatile Video Coding (VVC), which forms the inner video coding of VCM. Simulation results show that, compared to the conventional design, the proposal attains a bit rate reduction of 11% on average with respect to machine tasks.
To improve the performance of video compression for machine vision analysis tasks, a video coding for machines (VCM) standard working group was established to promote standardization activities. In this paper, recent advances in video coding for machines standards are presented, and comprehensive introductions to the use cases, requirements, evaluation frameworks, and corresponding metrics of the VCM standard are given. Then, the existing methods are presented, introducing the existing proposals by category and the research progress of the latest VCM standard. Finally, we give conclusions.