Light field, as a new data representation format in multimedia, has the ability to capture both intensity and direction of light rays. However, the additional angular information also brings a large volume of data. Cl...
We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited to using only the single previous frame as reference. Our method introduces the usage of the previous multiple ...
Objective quality assessment of stereoscopic panoramic images has become a challenging problem owing to the rapid growth of 360-degree content. Unlike traditional 2D image quality assessment (IQA), 3D omnidirectional IQA involves more complex aspects, especially an unlimited field of view (FoV) and extra depth perception, which make it difficult to evaluate the quality of experience (QoE) of 3D omnidirectional images. In this paper, we propose a multi-viewport based full-reference stereo 360 IQA model. Because viewports change freely when browsing in a head-mounted display, our approach processes the image inside the FoV rather than a projected one such as the equirectangular projection (ERP). In addition, since overall QoE depends on both image quality and depth perception, we use features estimated from the difference map between the left and right views, which can reflect disparity. The depth perception features, along with binocular image qualities, are then used to predict the overall QoE of 3D 360 images. Experimental results on our public Stereoscopic OmnidirectionaL Image quality assessment Database (SOLID) show that the proposed method achieves a significant improvement over some well-known IQA metrics and can accurately reflect the overall QoE of perceived images.
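As a rough illustration of the difference map mentioned in the abstract above, the sketch below computes the absolute per-pixel difference between a left and a right view. The abstract does not say which features are derived from this map, so the mean and variance used here are placeholders chosen for illustration, and the function names and the tiny 2x2 views are hypothetical.

```python
def difference_map(left, right):
    """Absolute per-pixel difference between two equally sized grayscale views."""
    return [[abs(l - r) for l, r in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


def mean_and_variance(values_2d):
    """Placeholder statistics; the paper's actual features are not specified."""
    flat = [v for row in values_2d for v in row]
    mean = sum(flat) / len(flat)
    variance = sum((v - mean) ** 2 for v in flat) / len(flat)
    return mean, variance


# Hypothetical 2x2 left and right views.
left = [[10, 20], [30, 40]]
right = [[12, 20], [26, 40]]

diff = difference_map(left, right)
mean, variance = mean_and_variance(diff)
```

A larger mean or variance only indicates that the two views differ more; how that maps to perceived depth would depend on the model described in the paper.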
ISBN (print): 9781538644591; 9781538644584
Inspired by the progress in image and video super-resolution (SR) achieved by convolutional neural networks (CNNs), we propose a CNN-based residue SR method for video coding. Unlike previous works that operate in the pixel domain, i.e. down- and up-sampling of the image or video frame, we propose to perform down- and up-sampling in the residue domain. Specifically, for each block, we perform motion estimation and compensation to obtain the residual signal at the original resolution, then down-sample the residue, compress it at low resolution, and perform residue SR using a trained CNN model. We design a new CNN for residue SR that is aided by the motion-compensated prediction signal. We integrate the residue SR method into the High Efficiency Video Coding (HEVC) scheme, providing mode decision at the level of the coding tree unit. Experimental results show that our method achieves on average 4.0% and 2.8% BD-rate reduction under the low-delay P and low-delay B configurations, respectively.
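The residue-domain pipeline described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's method: 2x2 averaging stands in for the down-sampling filter, nearest-neighbour up-sampling stands in for the trained CNN, and quantization and entropy coding of the low-resolution residue are omitted. All function names and sample values are hypothetical.

```python
def downsample2x(block):
    """Average each 2x2 neighbourhood (block height and width must be even)."""
    h, w = len(block), len(block[0])
    return [
        [(block[y][x] + block[y][x + 1]
          + block[y + 1][x] + block[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def upsample2x(block):
    """Nearest-neighbour up-sampling; an assumed stand-in for the CNN-based SR."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def reconstruct(original, prediction):
    """Full-resolution residue -> down-sample -> up-sample -> add prediction."""
    residue = [[o - p for o, p in zip(row_o, row_p)]
               for row_o, row_p in zip(original, prediction)]
    low_res = downsample2x(residue)   # the signal that would be compressed
    restored = upsample2x(low_res)    # residue SR step (CNN in the paper)
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prediction, restored)]


# Hypothetical 4x4 block and its motion-compensated prediction.
original = [[10, 12, 20, 22], [10, 12, 20, 22],
            [30, 30, 40, 40], [30, 30, 40, 40]]
prediction = [[9, 11, 20, 20], [9, 11, 20, 20],
              [30, 30, 38, 38], [30, 30, 38, 38]]

reconstructed = reconstruct(original, prediction)
```

Only the residue is resampled here; the prediction is added back at full resolution, which is the distinction from pixel-domain schemes that the abstract draws.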
Automatic annotation of images is of crucial importance in image retrieval and management systems. Most existing annotation methods rely on a content-based approach, whose effectiveness is restricted by the semantic gap between low-level features and semantic annotations, as well as by the irrelevance between annotations and image content. Recently, social media analysis has been investigated for image annotation. Inspired by the abundant social diffusion records of images in online social networks, we propose a novel image annotation approach based on social diffusion analysis. We present a common-interest model to interpret social diffusion: different images follow different social diffusion routes because of user preferences, and these preferences are represented as common interests of pairs of users rather than as personalized interests. We propose an image annotation framework that consists of learning common interests, extracting features from social diffusion records, and annotating automatically by learning to rank. Experimental results on a real-world dataset show that our proposed approach outperforms content-based and user-preference-based annotation methods.
ISBN (print): 9781479989591
With the increasing popularity of mobile devices, there are more and more screens with heterogeneous resolutions. To solve the mismatch that arises when images are displayed on different screens, various image retargeting techniques have been proposed. However, few effective objective quality assessment metrics for image retargeting have been proposed. In this paper, we propose an objective image retargeting quality assessment method based on a Hybrid Distortion Pooled Model (HDPM) that considers image local similarity, content information loss and image structural distortion. The proposed HDPM method measures the retargeted image's local similarity by matching similar blocks using Scale-Invariant Feature Transform (SIFT) features and computing the similarity of the corresponding blocks with structural similarity (SSIM). Furthermore, the content information loss in the retargeted image, which is regarded as the SIFT feature loss, is taken into account. We also consider the image's structural distortion, which is measured using the gray-level co-occurrence matrix (GLCM). To evaluate the effectiveness of the proposed method, extensive experiments have been conducted, and the results show improved consistency between the proposed HDPM method and the corresponding subjective evaluations.
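For readers unfamiliar with the GLCM used for the structural distortion term, the sketch below builds a normalised co-occurrence matrix for a single pixel offset and derives the contrast statistic from it. The abstract does not say which GLCM statistics HDPM uses or how they are pooled, so contrast is an assumed example; the function names and the 4x4 test image are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix of gray levels at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    total = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """GLCM contrast: sum over (i, j) of (i - j)^2 * p[i][j]."""
    n = len(p)
    return sum(((i - j) ** 2) * p[i][j] for i in range(n) for j in range(n))


# Hypothetical 4x4 image quantised to 4 gray levels.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]

p = glcm(image, levels=4)   # horizontal neighbour to the right
c = contrast(p)
```

One plausible use, stated here only as an assumption, is to compute such a statistic for both the source and the retargeted image and compare the two values.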
ISBN (print): 9781479934331
High Efficiency Video Coding (HEVC) with the transform bypass mode is simple but inefficient for lossless coding. For this reason, we propose a novel transform to further eliminate the redundancy between residues of different blocks in intra prediction. The proposed transform depends on the intra prediction mode, so it can adapt to the correlations of residues produced by different modes. To obtain the parameters of the transform matrix accurately, an approach similar to the Wiener filtering method is adopted. Experimental results show that, on top of the lossless coding mode in HEVC, our method achieves a 7.4% bit-rate reduction on average for the All Intra Main configuration. Compared with other representative algorithms, our proposal still improves the compression ratio, without substantially increasing computational complexity in the encoder or decoder.
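The Wiener-like parameter estimation mentioned above can be illustrated, in a heavily simplified form, by a single least-squares coefficient that predicts one block's residue from another's. This is an assumed one-tap toy, not the paper's transform matrix: the coefficient minimises the squared prediction error, and the prediction is rounded so that the second-stage residue stays integer and the step remains invertible for lossless coding. All names and sample values are hypothetical.

```python
def wiener_coefficient(reference, target):
    """Least-squares a minimising sum((t - a*r)^2): a = sum(r*t) / sum(r*r)."""
    num = sum(r * t for r, t in zip(reference, target))
    den = sum(r * r for r in reference)
    return num / den if den else 0.0


def forward(reference, target, a):
    """Subtract the rounded prediction; the output stays integer."""
    return [t - round(a * r) for r, t in zip(reference, target)]


def inverse(reference, second_stage, a):
    """Decoder side: add the same rounded prediction back."""
    return [s + round(a * r) for r, s in zip(reference, second_stage)]


# Hypothetical residues of a neighbouring block and the current block.
reference = [4, -2, 6, 0, -8]
target = [2, -1, 3, 0, -4]

a = wiener_coefficient(reference, target)
second_stage = forward(reference, target, a)
recovered = inverse(reference, second_stage, a)
```

Because encoder and decoder apply the identical rounded prediction, the round trip is exact regardless of how well the coefficient fits.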
ISBN (digital): 9781728180687
ISBN (print): 9781728180694
In recent years, deep learning has achieved promising success in multimedia quality assessment, especially image quality assessment (IQA). However, because videos have more complex temporal characteristics, very little work has been done on video quality assessment (VQA) that exploits powerful deep convolutional neural networks (DCNNs). In this paper, we propose an efficient VQA method named Deep SpatioTemporal video Quality assessor (DeepSTQ) to predict the perceptual quality of various distorted videos in a no-reference manner. In the proposed DeepSTQ, we first extract local and global spatiotemporal features using pre-trained deep learning models, without fine-tuning or training from scratch. The composite features cover distorted video frames as well as frame difference maps, from both global and local views. A regression model then aggregates the features to predict the perceptual video quality. Experimental results demonstrate that our proposed DeepSTQ outperforms state-of-the-art quality assessment algorithms.
Light field image (LFI) quality assessment is becoming more and more important, which helps to better guide the acquisition, processing and application of immersive media. However, due to the inherent high dimensional...
The directional intra prediction (DIP) modes in HEVC are capable of predicting local continuous image features. Recently, intra block copy (IBC) was proposed for screen content coding, aiming at predicting non-local recurrent image features. For natural video, we observe that recurrent features are often irregular and not aligned with blocks. Thus, we propose a combination of DIP and IBC with block partition for better intra prediction, where one block can be divided into several partitions, each of which may choose between DIP and IBC. We study an intra prediction scheme with the proposed combination, especially the rate-distortion optimization and entropy coding in the scheme. Preliminary experimental results show that the proposed combined intra prediction achieves up to 5.8% bit-rate saving compared to the HEVC anchor.
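The per-partition choice between DIP and IBC can be sketched with the usual Lagrangian rate-distortion cost J = D + lambda * R. The abstract does not detail the paper's rate-distortion optimization, so this is only an assumed minimal form: each partition independently picks the mode with the lowest cost. The function names, distortion values, bit counts and lambda are all made up for illustration.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_modes(partitions, lam):
    """partitions: list of dicts mapping mode name -> (distortion, rate_bits)."""
    decisions = []
    total_cost = 0.0
    for candidates in partitions:
        mode, (d, r) = min(
            candidates.items(),
            key=lambda kv: rd_cost(kv[1][0], kv[1][1], lam),
        )
        decisions.append(mode)
        total_cost += rd_cost(d, r, lam)
    return decisions, total_cost


# Hypothetical block split into two partitions, each with two candidate modes.
partitions = [
    {"DIP": (120.0, 10), "IBC": (80.0, 24)},
    {"DIP": (60.0, 8), "IBC": (90.0, 20)},
]

decisions, total_cost = choose_modes(partitions, lam=2.0)
```

A real encoder would also account for the bits needed to signal the partition shape and the per-partition mode, which this sketch folds into the given rate values.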