ISBN (print): 9781728185514
Learning-based visual data compression and analysis have attracted great interest from both academia and industry recently. More training and testing datasets, especially high-quality video datasets, are highly desirable for related research and standardization activities. A UHD video dataset, referred to as the Tencent Video Dataset (TVD), is established to serve various purposes such as training neural network-based coding tools and testing machine vision tasks, including object detection and segmentation. The dataset contains 86 video sequences covering a variety of content. Each video sequence consists of 65 frames at 4K (3840x2160) spatial resolution. In this paper, the details of the dataset, as well as its performance when compressed by the VVC and HEVC video codecs, are introduced.
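As a rough illustration of the storage such a dataset implies, the sketch below computes the raw size of one TVD-style sequence and reads a single frame. The planar YUV 4:2:0, 10-bit-in-16-bit file layout and the one-sequence-per-file naming are assumptions for illustration, not details stated in the abstract.

```python
# A minimal sketch of reading one raw 4K frame from a TVD-style YUV file.
# Assumptions (not specified in the abstract): planar YUV 4:2:0, 10-bit
# samples stored in 16-bit little-endian words, one sequence per .yuv file.
import numpy as np

W, H, FRAMES = 3840, 2160, 65
BYTES_PER_SAMPLE = 2                                 # 10-bit padded to 16 bits
FRAME_BYTES = (W * H * 3 // 2) * BYTES_PER_SAMPLE    # Y + U + V (4:2:0)

print(f"One frame:    {FRAME_BYTES / 2**20:.1f} MiB")
print(f"One sequence: {FRAMES * FRAME_BYTES / 2**30:.2f} GiB")

def read_luma(path, index):
    """Return the (H, W) luma plane of frame `index` as uint16."""
    with open(path, "rb") as f:
        f.seek(index * FRAME_BYTES)
        luma = np.frombuffer(f.read(W * H * BYTES_PER_SAMPLE), dtype="<u2")
    return luma.reshape(H, W)
```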
ISBN (print): 9781665484855
Machine intelligence systems are increasingly deployed in real-world settings, while conventional human-vision-oriented video coding schemes are inefficient when embedded in large-scale systems and cannot readily support a wide range of applications. There is an urgent demand for a new generation of compression framework that efficiently encodes visual data and jointly optimizes compression and analytics for machine vision and human perception. To this end, we propose a novel visual compression framework that provides visual content at different granularities for both human and machine vision tasks collaboratively. The proposed scalable compression framework maintains the critical semantic information in a base layer, so that it can support accurate machine vision analysis under a tight bit-rate constraint. It scales to provide visual representations of different granularities to support various kinds of tasks, including video reconstruction for human vision examination. Experimental results on human-centered videos demonstrate the promising functionality of scalable visual coding, with improved efficiency for high-performance machine analysis and human perception.
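The sketch below illustrates the layered idea in miniature: a base layer carries compact semantic features that a machine task could consume directly, and an enhancement layer supplies the extra information needed for full reconstruction. The module shapes, channel counts, and rounding-based quantization stand-in are illustrative assumptions, not the paper's architecture.

```python
# A toy two-layer scalable codec: machine tasks read the quantized base
# features alone; human viewing decodes base + enhancement into pixels.
import torch
import torch.nn as nn

class ScalableCodec(nn.Module):
    def __init__(self, base_ch=16, enh_ch=32):
        super().__init__()
        self.base_enc = nn.Sequential(            # semantic base layer
            nn.Conv2d(3, base_ch, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(base_ch, base_ch, 3, padding=1))
        self.enh_enc = nn.Conv2d(3, enh_ch, 5, stride=4, padding=2)
        self.dec = nn.Sequential(                 # full reconstruction
            nn.ConvTranspose2d(base_ch + enh_ch, 32, 4, stride=4), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, x):
        base = torch.round(self.base_enc(x))      # stand-in for quantization
        enh = torch.round(self.enh_enc(x))
        recon = self.dec(torch.cat([base, enh], dim=1))
        return base, recon                        # machine layer, human layer

codec = ScalableCodec()
base, recon = codec(torch.rand(1, 3, 256, 256))
print(base.shape, recon.shape)                    # [1,16,64,64], [1,3,256,256]
```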
ISBN (print): 9781665437349
As computer vision technologies have improved tremendously over the last decade, videos and images are often consumed by machines instead of humans, the main target of traditional video codecs. In many use cases, although machines are the main consumers, human involvement is also required, or even mandatory. In this paper, we propose a novel image coding technique targeted at machines while maintaining the capability for human consumption. Our proposed codec generates two bitstreams: one bitstream from a traditional codec, referred to as the human bitstream, optimized for human consumption; the other, referred to as the machine bitstream, generated from an end-to-end learned neural network-based codec and optimized for machine tasks. Instead of working in the image domain, the proposed machine bitstream is derived from feature residuals: the difference between the features extracted from the input image and the features extracted from the reconstructed image generated by the traditional codec. With the help of the machine bitstream, we can significantly improve machine task performance in the low-bitrate range. Our system beats the state-of-the-art traditional codec, Versatile Video Coding (VVC/H.266), achieving a -40.5% Bjontegaard delta bitrate on average for bitrates up to 0.07 BPP.
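To make the feature-residual construction concrete, the sketch below computes the residual between backbone features of an input image and of its lossy reconstruction. A JPEG round-trip stands in for the traditional (human) codec and a tiny random-weight CNN stands in for the task backbone; both substitutions are assumptions for illustration only.

```python
# Feature-residual sketch: the machine bitstream would encode the gap
# between features of the original and of the codec's reconstruction.
import io
import numpy as np
import torch
import torch.nn as nn
from PIL import Image

backbone = nn.Sequential(                    # stand-in feature extractor
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1))

def jpeg_roundtrip(img, quality=10):
    """Lossy round-trip standing in for the traditional codec."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def to_tensor(img):
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)

img = Image.fromarray((np.random.rand(128, 128, 3) * 255).astype("uint8"))
f_orig = backbone(to_tensor(img))
f_rec = backbone(to_tensor(jpeg_roundtrip(img)))
residual = f_orig - f_rec    # this tensor is what the learned codec encodes
print(residual.shape, residual.abs().mean().item())
```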
ISBN (print): 9781728176055; 9781728176062
Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard, Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art object segmentation network Mask R-CNN was applied to the decoded frame. Extensive simulations show that, compared to reference VVC at constant quality, up to 29% of the bitrate can be saved with the same detection accuracy at the decoder side by applying the proposed saliency-driven framework. In addition, we compare YOLO against other, more traditional saliency detection methods.
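A minimal sketch of how detections might be turned into a block-level quantization map is given below: coding tree units (CTUs) overlapped by a detected object keep the base QP, while all others are coded more coarsely. The box format, CTU size, and QP offset are illustrative assumptions rather than the paper's settings.

```python
# Map detected boxes onto a CTU grid of QP values: salient CTUs keep the
# base QP, non-salient CTUs get a coarser QP (base + offset).
import numpy as np

def qp_map(boxes, width, height, ctu=128, base_qp=32, offset=10):
    """boxes: list of (x0, y0, x1, y1) in pixels."""
    cols, rows = -(-width // ctu), -(-height // ctu)   # ceiling division
    qp = np.full((rows, cols), base_qp + offset, dtype=np.int32)
    for x0, y0, x1, y1 in boxes:
        c0, r0 = int(x0) // ctu, int(y0) // ctu
        c1 = min(int(x1) // ctu, cols - 1)
        r1 = min(int(y1) // ctu, rows - 1)
        qp[r0:r1 + 1, c0:c1 + 1] = base_qp             # salient region
    return qp

print(qp_map([(300, 200, 900, 700)], 1920, 1080))
```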
Deep neural object detection or segmentation networks are commonly trained with pristine, uncompressed data. However, in practical applications the input images are usually deteriorated by the compression applied to transmit the data efficiently. We therefore propose adding deteriorated images to the training process in order to increase the robustness of two state-of-the-art networks, Faster and Mask R-CNN. Throughout the paper, we investigate an autonomous driving scenario by evaluating the newly trained models on the Cityscapes dataset compressed with the upcoming video coding standard Versatile Video Coding (VVC). When employing models trained with the proposed method, the weighted average precision of the R-CNNs can be increased by up to 3.68 percentage points for compressed input images, which corresponds to bitrate savings of nearly 48%.
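The sketch below shows one way such compression augmentation could be wired into a training pipeline: a dataset wrapper that deteriorates a random fraction of images with a lossy round-trip. The paper compresses with VVC; JPEG at random quality is used here purely as a lightweight stand-in, and the wrapper interface is an assumption.

```python
# Dataset wrapper that exposes the detector to codec artifacts during
# training by lossily round-tripping a fraction of the input images.
import io
import random
from PIL import Image
from torch.utils.data import Dataset

class CompressedAugmentation(Dataset):
    def __init__(self, base_dataset, p=0.5, qualities=(5, 50)):
        self.base, self.p, self.qualities = base_dataset, p, qualities

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        img, target = self.base[i]     # img: PIL.Image, target: labels
        if random.random() < self.p:
            buf = io.BytesIO()
            img.save(buf, format="JPEG",
                     quality=random.randint(*self.qualities))
            buf.seek(0)
            img = Image.open(buf).convert("RGB")
        return img, target
```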
We present an object-labelled dataset called SFU-HW-Objects-v1, which contains object labels for a set of raw video sequences. The dataset is useful for cases where both object detection accuracy and video coding efficiency need to be evaluated on the same data. Object ground truths have been labelled for 18 of the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. The object categories used for the labeling are based on the Common Objects in Context (COCO) labels; a total of 21 of the 80 original COCO classes appear in the test sequences. Brief descriptions of the labeling process and the structure of the dataset are presented.
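For illustration, the sketch below parses one frame's labels under the assumption of a YOLO-style text format (one "class_id cx cy w h" line per object, coordinates normalized to [0, 1]); the authoritative file layout is the one described in the paper and the dataset's own documentation.

```python
# Read one frame's object labels and convert them to pixel-space boxes.
# The assumed per-line format "class_id cx cy w h" is one common
# annotation convention, not necessarily the dataset's exact layout.
from pathlib import Path

def load_labels(txt_path, width, height):
    """Return a list of (class_id, x0, y0, x1, y1) in pixel coordinates."""
    boxes = []
    for line in Path(txt_path).read_text().splitlines():
        cls, cx, cy, w, h = line.split()
        cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
        boxes.append((int(cls),
                      (cx - w / 2) * width,  (cy - h / 2) * height,
                      (cx + w / 2) * width,  (cy + h / 2) * height))
    return boxes
```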