In recent years, deep learning has achieved promising success for multimedia quality assessment, especially for image quality assessment (IQA). However, since there exist more complex temporal characteristics in video...
详细信息
ISBN:
(数字)9781728180687
ISBN:
(纸本)9781728180694
In recent years, deep learning has achieved promising success for multimedia quality assessment, especially for image quality assessment (IQA). However, since there exist more complex temporal characteristics in videos, very little work has been done on video quality assessment (VQA) by exploiting powerful deep convolutional neural networks (DCNNs). In this paper, we propose an efficient VQA method named Deep SpatioTemporal video Quality assessor (DeepSTQ) to predict the perceptual quality of various distorted videos in a no-reference manner. In the proposed DeepSTQ, we first extract local and global spatiotemporal features by pre-trained deep learning models without fine-tuning or training from scratch. The composited features consider distorted video frames as well as frame difference maps from both global and local views. Then, the feature aggregation is conducted by the regression model to predict the perceptual video quality. Finally, experimental results demonstrate that our proposed DeepSTQ outperforms state-of-the-art quality assessment algorithms.
In the hybrid video coding framework, transform is adopted to exploit the dependency within the input signal. In this paper, we propose a deep learning-based nonlinear transform for intra coding. Specifically, we inco...
详细信息
ISBN:
(数字)9781728180687
ISBN:
(纸本)9781728180694
In the hybrid video coding framework, transform is adopted to exploit the dependency within the input signal. In this paper, we propose a deep learning-based nonlinear transform for intra coding. Specifically, we incorporate the directional information into the residual domain. Then, a convolutional neural network model is designed to achieve better decorrelation and energy compaction than the conventional discrete cosine transform. This work has two main contributions. First, we propose to use the intra prediction signal to reduce the directionality in the residual. Second, we present a novel loss function to characterize the efficiency of the transform during the training. To evaluate the compression performance of the proposed transform, we implement it into the High Efficiency Video Coding reference software. Experimental results demonstrate that the proposed method achieves up to 1.79% BD-rate reduction for natural videos.
We propose to improve neural network-based compression artifact reduction by transmitting side information for the neural network. The side information consists of artifact descriptors that are obtained by analyzing t...
详细信息
ISBN:
(数字)9781728180687
ISBN:
(纸本)9781728180694
We propose to improve neural network-based compression artifact reduction by transmitting side information for the neural network. The side information consists of artifact descriptors that are obtained by analyzing the original and compressed images in the encoder. In the decoder, the received descriptors are used as additional input to a well-designed conditional post-processing neural network. To reduce the transmission overhead, the entire model is optimized under the rate-distortion constraint via end-to-end learning. Experimental results show that introducing the side information greatly improves the ability of the post-processing neural network, and improves the rate-distortion performance.
In video-based point cloud compression (V-PCC), occupancy map video is utilized to indicate whether a 2-D pixel corresponds to a valid 3-D point or not. In the current design of V-PCC, the occupancy map video is direc...
详细信息
ISBN:
(数字)9781728180687
ISBN:
(纸本)9781728180694
In video-based point cloud compression (V-PCC), occupancy map video is utilized to indicate whether a 2-D pixel corresponds to a valid 3-D point or not. In the current design of V-PCC, the occupancy map video is directly compressed losslessly with High Efficiency Video Coding (HEVC). However, the coding tools in HEVC are specifically designed for natural images, thus unsuitable for the occupancy map. In this paper, we present a novel quadtree-based scheme for lossless occupancy map coding. In this scheme, the occupancy map is firstly divided into several coding tree units (CTUs). Then, the CTU is divided into coding units (CUs) recursively using a quadtree. The quadtree partition is terminated when one of the three conditions is satisfied. Firstly, all the pixels have the same value. Secondly, the pixels in the CU only have two kinds of values and they can be separated by a continuous edge whose endpoints lie on the side of the CU. The continuous edge is then coded using chain code. Thirdly, the CU reaches the minimum size. This scheme simplifies the design of block partitioning in HEVC and designs simpler yet more effective coding tools. Experimental results show significant reduction of bit-rate and complexity compared with the occupancy map coding scheme in V-PCC. In addition, this scheme is also very efficient to compress the semantic map.
Semantic segmentation is a fundamental task in indoor scene understanding. Most previous supervised approaches rely on densely annotated image data sets. Due to the limited amount of images with segmentation labels, t...
详细信息
作者:
Liu, SenZhao, ShuxinPang, YingxueChen, ZhiboCAS
Key Laboratory of Technology in Geo-spatial Information Processing and Application System University of Science and Technology of China Hefei China
There is plenty of human-machine joint decision-making scenarios in the real world applications, such as driving assistant, suspect identification, medical diagnosis, etc. Existing algorithms propose that machine shou...
详细信息
We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited in using the previous one frame as reference. Our method introduces the usage of the previous multiple ...
详细信息
ISBN:
(数字)9781728171685
ISBN:
(纸本)9781728171692
We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited in using the previous one frame as reference. Our method introduces the usage of the previous multiple frames as references. In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one. With multiple reference frames and associated multiple MV fields, our designed network can generate more accurate prediction of the current frame, yielding less residual. Multiple reference frames also help generate MV prediction, which reduces the coding cost of MV field. We use two deep auto-encoders to compress the residual and the MV, respectively. To compensate for the compression error of the auto-encoders, we further design a MV refinement network and a residual refinement network, taking use of the multiple reference frames as well. All the modules in our scheme are jointly optimized through a single rate-distortion loss function. We use a step-by-step training strategy to optimize the entire scheme. Experimental results show that the proposed method outperforms the existing learned video compression methods for low-latency mode. Our method also performs better than H.265 in both PSNR and MS-SSIM. Our code and models are publicly available.
Video stitching remains a challenging problem in computer vision. In this paper, we propose a novel edge-guided method to stitch multiple videos that have small overlapped regions. Our algorithm consists of three step...
详细信息
In this paper, we consider a novel image coding paradigm, termed semantically scalable coding. In the new paradigm, coded bitstream serves for multiple different semantic analysis tasks, and different tasks require di...
详细信息
ISBN:
(数字)9781728163956
ISBN:
(纸本)9781728163963
In this paper, we consider a novel image coding paradigm, termed semantically scalable coding. In the new paradigm, coded bitstream serves for multiple different semantic analysis tasks, and different tasks require different semantic granularities of the image. Thus, the bitstream is designed to be scalable in the sense that progressive decoding of the bitstream provides coarse-to-fine semantic granularities. As a concrete example, we consider the task of coarse-grained and fine-grained image classification. We present a method to compress the multiple deep feature maps that are intermediate representations of an image passing a trained deep network. The deep-layer feature maps can serve for coarse-grained image classification while the shallow-layer feature maps can serve for fine-grained image classification. Experimental results demonstrate the feasibility of the proposed method, as well as the advantage of the semantically scalable coding paradigm.
As vehicular communication and networking technologies continue to advance, infrastructure-based roadside perception emerges as a pivotal tool for connected automated vehicle (CAV) applications. Due to their elevated ...
详细信息
暂无评论