ISBN (print): 9781728185514
Learning-based visual data compression and analysis have attracted great interest from both academia and industry recently. More training and testing datasets, especially high-quality video datasets, are highly desirable for related research and standardization activities. A UHD video dataset, referred to as the Tencent Video Dataset (TVD), is established to serve various purposes such as training neural network-based coding tools and testing machine vision tasks, including object detection and segmentation. The dataset contains 86 video sequences covering a variety of content. Each video sequence consists of 65 frames at 4K (3840x2160) spatial resolution. In this paper, the details of the dataset, as well as its performance when compressed by the VVC and HEVC video codecs, are introduced.
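As a rough illustration of the storage such a dataset implies, the sketch below computes the raw size of one TVD-style sequence and reads a single frame. The planar YUV 4:2:0, 10-bit-in-16-bit file layout and the one-sequence-per-file naming are assumptions for illustration, not details stated in the abstract.

```python
# A minimal sketch of reading one raw 4K frame from a TVD-style YUV file.
# Assumptions (not specified in the abstract): planar YUV 4:2:0, 10-bit
# samples stored in 16-bit little-endian words, one sequence per .yuv file.
import numpy as np

W, H, FRAMES = 3840, 2160, 65
BYTES_PER_SAMPLE = 2                                 # 10-bit padded to 16 bits
FRAME_BYTES = (W * H * 3 // 2) * BYTES_PER_SAMPLE    # Y + U + V (4:2:0)

print(f"One frame:    {FRAME_BYTES / 2**20:.1f} MiB")
print(f"One sequence: {FRAMES * FRAME_BYTES / 2**30:.2f} GiB")

def read_luma(path, index):
    """Return the (H, W) luma plane of frame `index` as uint16."""
    with open(path, "rb") as f:
        f.seek(index * FRAME_BYTES)
        luma = np.frombuffer(f.read(W * H * BYTES_PER_SAMPLE), dtype="<u2")
    return luma.reshape(H, W)
```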
ISBN (print): 9781665484855
Machine intelligence systems are increasingly deployed in real-world settings, while conventional human-vision-oriented video coding schemes are inefficient when embedded in large-scale systems and cannot readily support a wide range of applications. There is an urgent demand for a new generation of compression framework that efficiently encodes visual data and jointly optimizes compression and analytics for machine vision and human perception. To this end, we propose a novel visual compression framework that provides visual content at different granularities for both human and machine vision tasks collaboratively. The proposed scalable compression framework maintains the critical semantic information in a base layer, so that it can support accurate machine vision analysis under a tight bit-rate constraint. It scales to provide visual representations of different granularities to support various kinds of tasks, including video reconstruction for human vision examination. Experimental results on human-centered videos demonstrate the promising functionality of scalable visual coding, with improved efficiency for high-performance machine analysis and human perception.
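The sketch below illustrates the layered idea in miniature: a base layer carries compact semantic features that a machine task could consume directly, and an enhancement layer supplies the extra information needed for full reconstruction. The module shapes, channel counts, and rounding-based quantization stand-in are illustrative assumptions, not the paper's architecture.

```python
# A toy two-layer scalable codec: machine tasks read the quantized base
# features alone; human viewing decodes base + enhancement into pixels.
import torch
import torch.nn as nn

class ScalableCodec(nn.Module):
    def __init__(self, base_ch=16, enh_ch=32):
        super().__init__()
        self.base_enc = nn.Sequential(            # semantic base layer
            nn.Conv2d(3, base_ch, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(base_ch, base_ch, 3, padding=1))
        self.enh_enc = nn.Conv2d(3, enh_ch, 5, stride=4, padding=2)
        self.dec = nn.Sequential(                 # full reconstruction
            nn.ConvTranspose2d(base_ch + enh_ch, 32, 4, stride=4), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, x):
        base = torch.round(self.base_enc(x))      # stand-in for quantization
        enh = torch.round(self.enh_enc(x))
        recon = self.dec(torch.cat([base, enh], dim=1))
        return base, recon                        # machine layer, human layer

codec = ScalableCodec()
base, recon = codec(torch.rand(1, 3, 256, 256))
print(base.shape, recon.shape)                    # [1,16,64,64], [1,3,256,256]
```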
ISBN (print): 9781665437349
As computer vision technologies have improved tremendously over the last decade, videos and images are often consumed by machines instead of humans, the main target of traditional video codecs. In many use cases, although machines are the main consumers, human involvement is also required, or even mandatory. In this paper, we propose a novel image coding technique targeted at machines while maintaining the capability for human consumption. Our proposed codec generates two bitstreams: one bitstream from a traditional codec, referred to as the human bitstream, optimized for human consumption; the other, referred to as the machine bitstream, generated from an end-to-end learned neural network-based codec and optimized for machine tasks. Instead of working in the image domain, the proposed machine bitstream is derived from feature residuals: the difference between the features extracted from the input image and the features extracted from the reconstructed image generated by the traditional codec. With the help of the machine bitstream, we can significantly improve machine task performance in the low-bitrate range. Our system beats the state-of-the-art traditional codec, Versatile Video Coding (VVC/H.266), achieving a -40.5% Bjontegaard delta bitrate on average for bitrates up to 0.07 BPP.
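To make the feature-residual construction concrete, the sketch below computes the residual between backbone features of an input image and of its lossy reconstruction. A JPEG round-trip stands in for the traditional (human) codec and a tiny random-weight CNN stands in for the task backbone; both substitutions are assumptions for illustration only.

```python
# Feature-residual sketch: the machine bitstream would encode the gap
# between features of the original and of the codec's reconstruction.
import io
import numpy as np
import torch
import torch.nn as nn
from PIL import Image

backbone = nn.Sequential(                    # stand-in feature extractor
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1))

def jpeg_roundtrip(img, quality=10):
    """Lossy round-trip standing in for the traditional codec."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def to_tensor(img):
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)

img = Image.fromarray((np.random.rand(128, 128, 3) * 255).astype("uint8"))
f_orig = backbone(to_tensor(img))
f_rec = backbone(to_tensor(jpeg_roundtrip(img)))
residual = f_orig - f_rec    # this tensor is what the learned codec encodes
print(residual.shape, residual.abs().mean().item())
```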
ISBN (print): 9781728176055; 9781728176062
Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard, Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art object segmentation network Mask R-CNN was applied to the decoded frame. Extensive simulations show that, compared to reference VVC at constant quality, up to 29% of the bitrate can be saved with the same detection accuracy at the decoder side by applying the proposed saliency-driven framework. In addition, we compare YOLO against other, more traditional saliency detection methods.
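A minimal sketch of how detections might be turned into a block-level quantization map is given below: coding tree units (CTUs) overlapped by a detected object keep the base QP, while all others are coded more coarsely. The box format, CTU size, and QP offset are illustrative assumptions rather than the paper's settings.

```python
# Map detected boxes onto a CTU grid of QP values: salient CTUs keep the
# base QP, non-salient CTUs get a coarser QP (base + offset).
import numpy as np

def qp_map(boxes, width, height, ctu=128, base_qp=32, offset=10):
    """boxes: list of (x0, y0, x1, y1) in pixels."""
    cols, rows = -(-width // ctu), -(-height // ctu)   # ceiling division
    qp = np.full((rows, cols), base_qp + offset, dtype=np.int32)
    for x0, y0, x1, y1 in boxes:
        c0, r0 = int(x0) // ctu, int(y0) // ctu
        c1 = min(int(x1) // ctu, cols - 1)
        r1 = min(int(y1) // ctu, rows - 1)
        qp[r0:r1 + 1, c0:c1 + 1] = base_qp             # salient region
    return qp

print(qp_map([(300, 200, 900, 700)], 1920, 1080))
```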
Deep neural object detection or segmentation networks are commonly trained with pristine, uncompressed data. However, in practical applications the input images are usually deteriorated by the compression applied to transmit the data efficiently. We therefore propose adding deteriorated images to the training process in order to increase the robustness of two state-of-the-art networks, Faster and Mask R-CNN. Throughout the paper, we investigate an autonomous driving scenario by evaluating the newly trained models on the Cityscapes dataset compressed with the upcoming video coding standard Versatile Video Coding (VVC). When employing models trained with the proposed method, the weighted average precision of the R-CNNs can be increased by up to 3.68 percentage points for compressed input images, which corresponds to bitrate savings of nearly 48%.
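The sketch below shows one way such compression augmentation could be wired into a training pipeline: a dataset wrapper that deteriorates a random fraction of images with a lossy round-trip. The paper compresses with VVC; JPEG at random quality is used here purely as a lightweight stand-in, and the wrapper interface is an assumption.

```python
# Dataset wrapper that exposes the detector to codec artifacts during
# training by lossily round-tripping a fraction of the input images.
import io
import random
from PIL import Image
from torch.utils.data import Dataset

class CompressedAugmentation(Dataset):
    def __init__(self, base_dataset, p=0.5, qualities=(5, 50)):
        self.base, self.p, self.qualities = base_dataset, p, qualities

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        img, target = self.base[i]     # img: PIL.Image, target: labels
        if random.random() < self.p:
            buf = io.BytesIO()
            img.save(buf, format="JPEG",
                     quality=random.randint(*self.qualities))
            buf.seek(0)
            img = Image.open(buf).convert("RGB")
        return img, target
```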
We present an object-labelled dataset called SFU-HW-Objects-v1, which contains object labels for a set of raw video sequences. The dataset is useful for cases where both object detection accuracy and video coding efficiency need to be evaluated on the same data. Object ground truths have been labelled for 18 of the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. The object categories used for the labeling are based on the Common Objects in Context (COCO) labels; a total of 21 of the 80 original COCO classes appear in the test sequences. Brief descriptions of the labeling process and the structure of the dataset are presented.
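For illustration, the sketch below parses one frame's labels under the assumption of a YOLO-style text format (one "class_id cx cy w h" line per object, coordinates normalized to [0, 1]); the authoritative file layout is the one described in the paper and the dataset's own documentation.

```python
# Read one frame's object labels and convert them to pixel-space boxes.
# The assumed per-line format "class_id cx cy w h" is one common
# annotation convention, not necessarily the dataset's exact layout.
from pathlib import Path

def load_labels(txt_path, width, height):
    """Return a list of (class_id, x0, y0, x1, y1) in pixel coordinates."""
    boxes = []
    for line in Path(txt_path).read_text().splitlines():
        cls, cx, cy, w, h = line.split()
        cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
        boxes.append((int(cls),
                      (cx - w / 2) * width,  (cy - h / 2) * height,
                      (cx + w / 2) * width,  (cy + h / 2) * height))
    return boxes
```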