As an emerging research practice leveraging recent advanced AI techniques, e.g., deep-model-based prediction and generation, video coding for machines (VCM) is committed to bridging the largely separate research tracks of video/image compression and feature compression, and attempts to optimize compactness and efficiency jointly from a unified perspective of high-accuracy machine vision and full-fidelity human vision. With the rapid advances of deep feature representation and visual data compression in mind, in this paper we summarize the methodology and philosophy of VCM based on existing academic and industrial efforts. The development of VCM follows a general rate-distortion optimization, and a categorization of key modules or techniques is established, including feature-assisted coding, scalable coding, intermediate feature compression/optimization, and machine-vision-targeted codecs, from the broader perspectives of vision tasks, analytics resources, etc. Previous works demonstrate that, although existing methods attempt to reveal the nature of scalable representation in bits when dealing with machine and human vision tasks, the generality of low-bit-rate representations, and accordingly how to support a variety of visual analytics tasks, remains rarely studied. Therefore, we investigate a novel visual information compression approach for the analytics taxonomy problem to strengthen the capability of compact visual representations extracted from multiple tasks for visual analytics. A new perspective of task relationships versus compression is revisited. Keeping in mind the transferability among different machine vision tasks (e.g., high-level semantic and mid-level geometry-related tasks), we aim to support multiple tasks jointly at low bit rates. In particular, to narrow the dimensionality gap between neural-network-generated features extracted from pixels and a variety of machine vision features/labels (e.g., scene class, segmentation labels), a codebook hyperprior is ...
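The general rate-distortion view mentioned in this abstract can be made concrete with a schematic objective; the notation below (the weights and the two distortion terms) is an illustrative assumption for exposition, not the exact formulation surveyed in the paper.

```latex
% Schematic joint rate-distortion objective for VCM (illustrative notation):
%   R        : bit rate of the compact representation f_theta(x)
%   D_m      : distortion measured on machine-vision task accuracy
%   D_h      : distortion measured on human-vision fidelity
%   lambda_m, lambda_h : trade-off weights between the two objectives
\min_{\theta}\; R\big(f_\theta(x)\big)
  + \lambda_m\, D_m\big(x,\hat{x}_\theta\big)
  + \lambda_h\, D_h\big(x,\hat{x}_\theta\big)
```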
Video coding for machines (VCM) aims to compress visual signals for machine analysis. However, existing methods only consider a few machines, neglecting the majority. Moreover, the machine's perceptual characteristics are not leveraged effectively, resulting in suboptimal compression efficiency. To overcome these limitations, this paper introduces the Satisfied Machine Ratio (SMR), a metric that statistically evaluates the perceptual quality of compressed images and videos for machines by aggregating satisfaction scores from them. Each score is derived from machine perceptual differences between original and compressed images. Targeting image classification and object detection tasks, we build two representative machine libraries for SMR annotation and create a large-scale SMR dataset to facilitate SMR studies. We then propose an SMR prediction model based on the correlation between deep feature differences and SMR. Furthermore, we introduce an auxiliary task to increase the prediction accuracy by predicting the SMR difference between two images of different quality. Extensive experiments demonstrate that SMR models significantly improve compression performance for machines and exhibit robust generalizability on unseen machines, codecs, datasets, and frame types.
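A minimal sketch of the SMR idea follows; the binary satisfaction criterion, the tolerance parameter, and the scoring callables are illustrative assumptions, not the paper's exact definitions.

```python
# Minimal sketch of the Satisfied Machine Ratio (SMR) idea.
# The satisfaction rule and tolerance below are assumed for illustration.
import numpy as np

def satisfaction(machine, original, compressed, tol=0.0):
    """A machine is 'satisfied' if its score on the compressed image does not
    drop more than `tol` relative to the original (score = any quality proxy,
    e.g. top-1 confidence for classification)."""
    score_orig = machine(original)
    score_comp = machine(compressed)
    return 1.0 if (score_orig - score_comp) <= tol else 0.0

def smr(machine_library, original, compressed, tol=0.0):
    """Aggregate satisfaction over a library of machines: the fraction of
    machines whose perceptual difference stays within tolerance."""
    scores = [satisfaction(m, original, compressed, tol) for m in machine_library]
    return float(np.mean(scores))
```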
Nowadays, the use of video data for machine tasks has become increasingly prevalent, with deep learning and computer vision requiring large volumes of video data for object detection, object tracking, and other tasks. However, the features required for machine tasks are different from those used by humans, and a new approach is needed to encode and compress video data for machine consumption. Video coding for machines (VCM) has received considerable attention, with many approaches focusing on compressing features rather than the video itself. However, a key challenge in this process is repacking the features in an efficient and effective manner. This paper proposes a distance-based patch tiling and intra-block quilting method to repack feature sequences in a manner that is better suited for existing video codecs, based on a statistical analysis of feature characteristics in the channel dimension. Experimental results demonstrate that our method achieves a 65.54% BD-rate gain compared to benchmark methods. This research has significant implications for improving the efficiency of video coding for machine applications, and future work could explore the use of feature dimensionality reduction and the combination of neural network (NN) codecs to optimize the repacking of features for compression.
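A rough sketch of the channel-to-frame repacking idea is given below; the greedy distance-based channel ordering and the square mosaic layout are assumptions standing in for the paper's tiling and quilting strategy.

```python
# Illustrative sketch: repack a (C, H, W) feature tensor into one 2D frame,
# ordering channels greedily by L2 distance so adjacent tiles are similar
# (and thus friendlier to a block-based video codec).
import numpy as np

def repack_features(feat):
    """feat: (C, H, W) float array -> single 2D frame of tiled channels."""
    C, H, W = feat.shape
    order = [0]
    remaining = set(range(1, C))
    while remaining:
        last = feat[order[-1]]
        # pick the remaining channel closest (L2) to the last placed one
        nxt = min(remaining, key=lambda c: np.linalg.norm(feat[c] - last))
        order.append(nxt)
        remaining.remove(nxt)
    cols = int(np.ceil(np.sqrt(C)))
    rows = int(np.ceil(C / cols))
    frame = np.zeros((rows * H, cols * W), dtype=feat.dtype)
    for i, c in enumerate(order):
        r, k = divmod(i, cols)
        frame[r * H:(r + 1) * H, k * W:(k + 1) * W] = feat[c]
    return frame
```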
As the performance of machine vision continues to improve, it is being used in various industrial fields to analyze and generate massive amounts of video data. Although the demand for and consumption of video data by machines has increased significantly, video coding for machines needs to be improved. It is therefore necessary to consider a new codec that differs from conventional codecs based on the human visual system (HVS). Spatial down-sampling plays a critical role in video coding for machines because it reduces the volume of the video data to be processed while maintaining the shape of the data's features that are important for the machine to reference when processing the video. Effective methods of determining the intensity of spatial down-sampling as an efficient coding tool for machines are still in the early stages. Here, we propose a method of determining an optimal scale factor for spatial down-sampling by collecting and analyzing information on the number of objects and the ratio of the area occupied by objects within a picture. We compare the data reduction ratio to the machine accuracy error ratio (DRAER) to evaluate the performance of the proposed method. By applying the proposed method, the DRAER was found to be a maximum of 21.40 dB and a minimum of 11.94 dB. This shows that a video coding gain for machines could be achieved through the proposed method while maintaining the accuracy of machine vision tasks.
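The sketch below illustrates the general shape of such a decision rule and a DRAER-style score; the thresholds, the QP-independent scale choices, and the exact DRAER formula are assumptions for illustration only, not the paper's method.

```python
# Toy sketch: choose a down-sampling scale from simple object statistics and
# score the result with a data-reduction vs. accuracy-error ratio in dB.
# Thresholds and the DRAER form below are illustrative assumptions.
import math

def choose_scale_factor(num_objects, object_area_ratio):
    """Small or numerous objects -> gentler down-sampling; large, few objects
    tolerate stronger down-sampling (thresholds are placeholders)."""
    if object_area_ratio < 0.05 or num_objects > 20:
        return 1.0          # keep full resolution
    if object_area_ratio < 0.15:
        return 0.75
    return 0.5              # aggressive down-sampling

def draer_db(data_reduction_ratio, accuracy_error_ratio, eps=1e-9):
    """Assumed form: 10*log10(data reduction / accuracy error), in dB."""
    return 10.0 * math.log10(data_reduction_ratio / max(accuracy_error_ratio, eps))
```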
A conventional codec aims to increase the compression efficiency for transmission and storage while maintaining video quality. However, as the number of platforms using machine vision rapidly increases, a codec that increases the compression efficiency while maintaining the accuracy of machine vision tasks must be devised. Hence, the Moving Picture Experts Group created a standardization process for video coding for machines (VCM) to reduce bitrates while maintaining the accuracy of machine vision tasks. In particular, in-loop filters have been developed for improving the subjective quality and machine vision task accuracy. However, the high computational complexity of in-loop filters limits the development of a high-performance VCM architecture. We analyze the effect of in-loop filters on the VCM performance and propose a suboptimal VCM method based on the selective activation of in-loop filters. The proposed method reduces the computation time for video coding by approximately 5% when using the enhanced compression model and 2% when employing the Versatile Video Coding test model, while maintaining the machine vision accuracy and compression efficiency of the VCM architecture.
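To illustrate the flavor of selective in-loop filter activation (the filter set, the gain/cost estimates, and the greedy budgeted selection are assumptions, not the paper's decision rule), a minimal sketch:

```python
# Toy sketch: enable in-loop filters only when their estimated machine-task
# benefit fits within a complexity budget. All numbers are placeholders.
FILTERS = ("deblocking", "sao", "alf")

def select_filters(estimated_gain, complexity_cost, budget):
    """estimated_gain/complexity_cost: dicts filter -> float.
    Greedily enable filters with the best gain/cost ratio within the budget."""
    enabled, spent = [], 0.0
    ranked = sorted(FILTERS,
                    key=lambda f: estimated_gain[f] / complexity_cost[f],
                    reverse=True)
    for f in ranked:
        if spent + complexity_cost[f] <= budget and estimated_gain[f] > 0:
            enabled.append(f)
            spent += complexity_cost[f]
    return enabled

# Example: under a tight budget, only the filter with the best ratio survives.
flags = select_filters(
    estimated_gain={"deblocking": 0.01, "sao": 0.02, "alf": 0.15},
    complexity_cost={"deblocking": 1.0, "sao": 1.5, "alf": 3.0},
    budget=3.0,
)   # -> ["alf"]
```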
This paper presents a method to effectively compress the intermediate-layer feature maps of a convolutional neural network for the potential structures of video coding for machines, which is an emerging technology for future machine consumption applications. Notably, most extant studies compress a single feature map and hence cannot entirely consider both the global and local information within the feature map. This limits performance maintenance during machine consumption tasks that analyze objects of various sizes in images/videos. To address this problem, a multiscale feature map compression method is proposed that consists of two major processes: receptive-block-based principal component analysis (RPCA) and uniform integer quantization. The RPCA derives the complete basis kernels of a feature map by selecting a set of major basis kernels that can represent a sufficient percentage of the global or local information according to the variable-size receptive blocks of each feature map. After transforming each feature map using the set of major basis kernels, a uniform integer quantizer converts the 32-bit floating-point values of the set of major basis kernels, the corresponding RPCA coefficients, and a mean vector into five-bit integer representation values. Experimental results reveal that the proposed method reduces the feature map data by 99.30% with a loss of 8.30% in average precision (AP) on the OpenImageV6 dataset and 0.77% in AP(M) and 0.47% in AP(L) on the MS COCO 2017 validation set, while outperforming previous PCA-based feature map compression methods even at higher compression rates.
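A simplified sketch of the PCA-plus-quantization pipeline is given below; a plain per-feature-map PCA stands in for the receptive-block RPCA, and the energy threshold is an assumed parameter, while the five-bit uniform quantizer follows the idea described in the abstract.

```python
# Simplified sketch: PCA over a (C, H, W) feature map + 5-bit uniform
# integer quantization of the basis, coefficients, and mean.
import numpy as np

def pca_compress(feat, energy=0.95):
    """Treat each spatial position as a C-dim sample and keep the principal
    directions covering `energy` of the variance (assumed criterion)."""
    C, H, W = feat.shape
    X = feat.reshape(C, -1).T                   # (H*W, C) samples
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), energy) + 1)
    basis = Vt[:k]                              # (k, C) "major basis kernels"
    coeffs = Xc @ basis.T                       # (H*W, k) coefficients
    return basis, coeffs, mean

def quantize_uniform(x, bits=5):
    """Uniform integer quantization of float values to `bits` bits."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / step).astype(np.uint8)
    return q, lo, step

def dequantize(q, lo, step):
    return q.astype(np.float32) * step + lo
```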
Machine vision-based intelligent applications that analyze video data collected by machines are rapidly increasing. Therefore, it is essential to efficiently compress a large volume of video data for machine consumption. Accordingly, the Moving Picture Experts Group (MPEG) has been developing a new video coding standard called video coding for machines (VCM), aimed at video consumed by machines rather than humans. Recently, studies have demonstrated that multi-scale feature compression (MSFC)-based feature compression methods significantly improve the performance of MPEG-VCM. This paper proposes an efficient MSFC (eMSFC) method with quantization parameter (QP)-adaptive feature channel truncation. The proposed eMSFC incorporates an MSFC network with a selective learning strategy (SLS) and Versatile Video Coding (VVC)-based compression. The SLS extracts a single-scale feature from the input image, arranged in order of channel-wise importance. The size of the single-scale feature is adaptively adjusted by truncating the feature channels according to the QP. The truncated feature is efficiently compressed using VVC. The experimental results reveal that, compared to the VCM feature anchor, the proposed method provides 98.72%, 98.34%, and 98.04% Bjontegaard delta rate gains for the machine vision tasks of instance segmentation, object detection, and object tracking, respectively. The proposed method performed best among the "Call for Evidence" response technologies in MPEG-VCM.
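A minimal sketch of QP-adaptive channel truncation follows; the channels are assumed to be pre-ordered by importance (as the SLS arranges them), and the linear QP-to-channel-count mapping and its constants are illustrative assumptions rather than the eMSFC design.

```python
# Minimal sketch: keep fewer (importance-ordered) feature channels at high QP.
# The QP range, minimum channel count, and linear mapping are placeholders.

def channels_to_keep(total_channels, qp, qp_min=22, qp_max=47, min_keep=16):
    """Linearly reduce the number of retained channels as QP grows."""
    t = (qp - qp_min) / float(qp_max - qp_min)
    t = min(max(t, 0.0), 1.0)
    keep = int(round(total_channels - t * (total_channels - min_keep)))
    return max(keep, min_keep)

def truncate_feature(feat, qp):
    """feat: (C, H, W) single-scale feature ordered by channel importance;
    the truncated result would then be coded with a VVC codec."""
    k = channels_to_keep(feat.shape[0], qp)
    return feat[:k]
```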
ISBN (digital): 9781665471893
ISBN (print): 9781665471893
In the video coding for machines (VCM) context, where visual content is compressed before being transmitted to a vision task algorithm, an appropriate trade-off between the compression level and the vision task performance must be chosen. In this paper, the robustness of a deep neural network (DNN)-based semantic segmentation algorithm to compression artifacts is evaluated over a total of 1486 different coding configurations. The results indicate the importance of using an appropriate image resolution to overcome the block-partitioning limitations of existing compression algorithms, allowing 58.3%, 49.8%, 33.5%, and 24.3% bitrate savings at equivalent prediction accuracy for JPEG, JM, x265, and VVenC, respectively. Surprisingly, when compressed images are included at training time, JPEG can achieve a 73.41% bitrate reduction over the VVC Test Model (VTM) paired with a DNN trained on pristine data, which implies that the generalization ability of DNNs must not be overlooked.
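The evaluation loop implied by such a study can be sketched as below; `encode` and `evaluate_miou` are hypothetical stand-ins for the actual codecs and segmentation network, and the selection rule (cheapest configuration meeting a target accuracy) is an assumption for illustration.

```python
# Schematic rate-accuracy sweep: record (bits, mIoU) for each configuration
# and keep the cheapest one that still meets a target accuracy.
def cheapest_config(configs, images, encode, evaluate_miou, target_miou):
    best = None
    for cfg in configs:                       # e.g. (codec, resolution, QP) tuples
        total_bits, total_miou = 0, 0.0
        for img in images:
            bitstream, decoded = encode(img, cfg)   # hypothetical codec wrapper
            total_bits += len(bitstream) * 8
            total_miou += evaluate_miou(decoded, img)
        mean_miou = total_miou / len(images)
        if mean_miou >= target_miou and (best is None or total_bits < best[0]):
            best = (total_bits, cfg)
    return best
```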
ISBN (print): 9798350349405; 9798350349399
This paper presents a picture partitioning design of Neural Network-based Intra coding (NNIC) for video coding for machines (VCM). The proposed design introduces adaptive auto-encoder and probability model processing, and a new unit for the partitions of an NNIC picture. Mindful of the causality of the transmission order of the partitions, the adaptive auto-encoder processing exploits the correlations of pixel values around the partition boundaries more than a conventional design does. Therefore, it can bring coding gains while keeping the low-delay video transmission capability. The adaptive probability model processing allows both the encoder and decoder to start their entropy coding and decoding with the same delay as the conventional design. The new unit makes the picture partition signaling of NNIC compatible with that of Versatile Video Coding (VVC), which forms the inner video coding of VCM. Simulation results show that, compared to the conventional design, the proposal attains a bit rate reduction of 11% on average with respect to machine tasks.
To improve the performance of video compression for machine vision analysis tasks, a video coding for machines (VCM) standard working group was established to promote standardization activities. In this paper, recent advances in video coding for machines standards are presented, and comprehensive introductions to the use cases, requirements, evaluation frameworks, and corresponding metrics of the VCM standard are given. Then, the existing methods are presented, introducing the existing proposals by category and the research progress of the latest VCM standard. Finally, we give conclusions.