Compressed sensing (CS) has recently attracted much interest for its ability to recover a sparse signal from a very limited number of samples. In this paper, we adapt this idea and present a framework of high-resolutio...
ISBN: 9781479999897 (print)
Content-aware image retargeting has attracted substantial interest in the research community. However, no existing method preserves important image content and structure well without introducing deformation. To address this problem, we propose a Saliency & Structure Preserving Multi-operator (SSPM) method. SSPM classifies images into three categories using SIFT density to improve saliency preservation, which helps mitigate the center bias of most existing saliency detection models. SSPM also applies several criteria to improve structure preservation, including the Earth Mover's Distance (EMD) and the Gray-Level Co-occurrence Matrix (GLCM), to obtain optimal operator sequences for content-aware image retargeting. SSPM not only preserves salient content and structure well but also greatly improves deformation resilience. Experimental results demonstrate that our method outperforms state-of-the-art image retargeting methods.
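As a rough illustration of one of the structure measures named above, the following sketch computes a gray-level co-occurrence matrix and its contrast statistic for a tiny image. It is not the SSPM pipeline: the function names, the 4-level toy image, the horizontal offset, and the choice of contrast as the summary statistic are all assumptions made for this example.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                counts[image[y][x]][image[ny][nx]] += 1
    return counts


def normalize(counts):
    """Turn raw co-occurrence counts into joint probabilities."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """GLCM contrast: large when neighbouring pixels differ strongly."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


# Hypothetical 4x4 image quantized to 4 gray levels.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(image, levels=4)   # horizontal neighbour, one pixel to the right
p = normalize(counts)
print(counts)
print(contrast(p))
```

Comparing such texture statistics before and after a retargeting operator is one plausible way to score structure change; how SSPM combines GLCM with EMD is not specified in the abstract.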
ISBN: 9781728173221 (print)
Learning-based image deraining methods have achieved remarkable success in the past few decades. Currently, most deraining architectures are designed by human experts, which is a laborious and error-prone process. In this paper, we present a study on employing neural architecture search (NAS) to automatically design deraining architectures, dubbed AutoDerain. Specifically, we first propose a U-shaped deraining architecture, which mainly consists of residual squeeze-and-excitation blocks (RSEBs). Then, we define a search space in which we search for the convolution types and whether to use the squeeze-and-excitation block. Because differentiable architecture search is memory-intensive, we propose a memory-efficient differentiable architecture search scheme (MDARTS). Inspired by the success of training binary neural networks, MDARTS optimizes architecture parameters through the proximal gradient, which consumes only as much GPU memory as training a single deraining model. Experimental results demonstrate that the architecture designed by MDARTS is superior to manually designed derainers.
Image quality assessment (IQA) aims to estimate the visual quality of an image as perceived by humans. Although existing deep neural networks (DNNs) have proven effective for the IQA problem, DNN-based quality assessment models can still be improved by exploiting efficient multi-scale features. In this paper, motivated by the way the human visual system (HVS) combines multi-scale features for perception, we propose to use pyramid feature learning to build a DNN with hierarchical multi-scale features for distorted image quality prediction. Our model takes both residual maps and distorted images in the luminance domain as input, and the proposed network combines spatial pyramid pooling with a feature pyramid built from the network structure. The proposed network is optimized end to end with deep supervision. To validate the effectiveness of the proposed method, extensive experiments are conducted on four widely used image quality assessment databases, demonstrating the superiority of our algorithm.
End-to-end image coding methods based on wavelet-like transforms have made great progress in recent years. The most advanced one is iWave++, which adopts multi-level lifting schemes based on convolutional neural networks. However, iWave++ still has several unresolved problems. First, entropy coding each component independently prevents the codec from fully exploiting the correlation between components. Second, the additive wavelet transform limits the nonlinear capacity of the learnable wavelet transform. Third, the offline training strategy prevents iWave++ from adapting to the content. In this paper, we propose an improved framework for iWave++ called iWave-Pro. iWave-Pro is designed with several techniques to overcome the problems mentioned above: joint multi-component Gaussian mixture entropy coding, an affine wavelet-like transform, and online training. Experimental results show that our method saves 10.73% bit rate compared with iWave++ at the same quality.
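To make the contrast between additive and affine lifting concrete, here is a minimal one-dimensional sketch. The abstract does not give the form of the affine wavelet-like transform, so the scaled variant below (in the style of an affine coupling layer) is an assumption, as are the hand-written `predict`, `update`, and `log_scale` functions, which stand in for the learned networks.

```python
import math


def predict(even):
    # Stand-in for a learned predictor: guess each odd sample from its even neighbour.
    return list(even)


def update(detail):
    # Stand-in for a learned update: with predict() above this yields Haar averages.
    return [0.5 * d for d in detail]


def log_scale(even):
    # Hypothetical bounded log-scale used only by the affine variant.
    return [0.1 * math.tanh(e) for e in even]


def forward(x, affine=False):
    """One lifting step on an even-length signal: split, predict, update."""
    even, odd = x[0::2], x[1::2]
    detail = [o - p for o, p in zip(odd, predict(even))]
    if affine:  # assumption: rescale the residual, as in an affine coupling layer
        detail = [d * math.exp(-s) for d, s in zip(detail, log_scale(even))]
    approx = [e + u for e, u in zip(even, update(detail))]
    return approx, detail


def inverse(approx, detail, affine=False):
    """Undo forward() by running the same steps backwards with opposite signs."""
    even = [a - u for a, u in zip(approx, update(detail))]
    if affine:
        detail = [d * math.exp(s) for d, s in zip(detail, log_scale(even))]
    odd = [d + p for d, p in zip(detail, predict(even))]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out


signal = [4.0, 6.0, 10.0, 12.0]
approx, detail = forward(signal)               # additive lifting
approx_a, detail_a = forward(signal, affine=True)
restored_a = inverse(approx_a, detail_a, affine=True)
print(approx, detail, restored_a)
```

Both variants are invertible by construction, because the inverse recomputes the predictor, the update, and the scale from quantities it has already recovered. That property is what lets the predictor and update steps be replaced by arbitrary networks.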
ISBN: 9781728173221 (print)
Recently, pre-processed video transcoding has attracted wide attention and has been increasingly used in practical applications to improve the perceptual experience and save transmission resources. However, very little work has been done to evaluate the performance of pre-processing methods. In this paper, we select source (SRC) videos and various pre-processing approaches to construct the first Pre-processed and Transcoded Video Database (PTVD). Then, we conduct a subjective experiment, which shows that, compared with sending the video to the codec directly at the same bitrate, appropriate pre-processing methods do improve perceptual quality. Finally, existing image/video quality metrics are evaluated on our database. The results indicate that the performance of existing image/video quality assessment (IQA/VQA) approaches still needs improvement. We will make our database publicly available soon.
ISBN: 9781479973408 (print)
Making dynamic recommendations under volatile user interest drift is a problem of great interest in modern recommender systems, where the challenges lie in the accurate and efficient measurement, modeling, and prediction of that drift. This paper studies a category-based approach to the problem, with the key idea that items are aggregated into categories and recommendations are made for each category. In our approach, we use the category-wise rating matrix to measure the changing preferences of users; we design a dynamic adaptive model (DAM) to describe the patterns of interest drift; and we use linear regression to predict the future interests of users in a category-based manner. We have built a category-based dynamic recommender system and tested it on two well-known datasets. Experimental results show that our proposed approach achieves superior performance on category-based rating prediction compared with state-of-the-art dynamic recommendation algorithms.
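The measurement and prediction steps described above might be sketched as follows: ratings are averaged per category and time period, and a least-squares line is extrapolated to the next period. This is only an illustration under stated assumptions. The DAM itself is not described in the abstract and is not modeled here, and the data layout, item names, and function names are hypothetical.

```python
from collections import defaultdict


def category_means(ratings, item_category):
    """Average one user's ratings per category and period.

    ratings: iterable of (period, item, rating) tuples.
    Returns {category: {period: mean rating}}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for period, item, rating in ratings:
        key = (item_category[item], period)
        sums[key][0] += rating
        sums[key][1] += 1
    series = defaultdict(dict)
    for (category, period), (total, count) in sums.items():
        series[category][period] = total / count
    return series


def predict_at(points, t_next):
    """Fit y = a + b*t by ordinary least squares and evaluate it at t_next."""
    n = len(points)
    t_mean = sum(t for t, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    spread = sum((t - t_mean) ** 2 for t, _ in points)
    if spread == 0:          # a single period: no trend can be estimated
        return y_mean
    slope = sum((t - t_mean) * (y - y_mean) for t, y in points) / spread
    return y_mean + slope * (t_next - t_mean)


# Hypothetical data: one user's ratings over four periods.
item_category = {"i1": "drama", "i2": "drama", "i3": "comedy"}
ratings = [
    (0, "i1", 3.0), (0, "i2", 3.0), (0, "i3", 2.0),
    (1, "i1", 3.5),
    (2, "i2", 4.0),
    (3, "i1", 4.0), (3, "i2", 5.0),
]
series = category_means(ratings, item_category)
drama_points = sorted(series["drama"].items())
drama_forecast = predict_at(drama_points, t_next=4)
print(drama_points, drama_forecast)
```

Working at the category level makes each series denser than per-item ratings, which is presumably part of the appeal. A real system would also need to clip forecasts to the valid rating range.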
ISBN: 9781665475938 (print)
In Versatile Video Coding (VVC), local affine motion compensation (LAMC) is adopted to handle complex motions such as rotation and zooming. However, LAMC handles global motion inefficiently, for two reasons. First, LAMC may incur extra bit cost for the affine motion model parameters. Second, the precision of LAMC is restricted by the motion vector (MV) precision of the control points. Therefore, in this paper, we propose a global homography motion compensation (GHMC) framework to better characterize global motion. For each coding block, an extra mode is added that performs motion compensation based on an 8-parameter global homography motion model. In addition, an extrapolation scheme is designed to derive the parameters from reference frames, which saves the bit cost of signaling them. The proposed framework is implemented in the VVC reference software VTM-6.0. Experimental results show that, on average, BD-rate reductions of 0.69% and 0.66% are achieved under the Low Delay P and Low Delay B configurations, respectively, for sequences with rich, complex global motion.
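For readers unfamiliar with the 8-parameter model, the sketch below maps a pixel through a planar homography and derives the implied motion vector. The parameter ordering, the floating-point arithmetic, and the per-pixel evaluation are assumptions made for illustration; the abstract does not describe how GHMC represents or applies the model inside VTM.

```python
def warp(h, x, y):
    """Map (x, y) through an 8-parameter homography h = (h0, ..., h7).

    x' = (h0*x + h1*y + h2) / (h6*x + h7*y + 1)
    y' = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
    """
    denom = h[6] * x + h[7] * y + 1.0
    if denom == 0:
        raise ValueError("point maps to infinity under this homography")
    return ((h[0] * x + h[1] * y + h[2]) / denom,
            (h[3] * x + h[4] * y + h[5]) / denom)


def motion_vector(h, x, y):
    """Displacement from (x, y) to its warped position in the reference frame."""
    xw, yw = warp(h, x, y)
    return xw - x, yw - y


# Pure translation: every pixel gets the same motion vector.
translation = (1.0, 0.0, 2.0, 0.0, 1.0, -1.0, 0.0, 0.0)
mv_a = motion_vector(translation, 10.0, 20.0)
mv_b = motion_vector(translation, 300.0, 5.0)

# A small perspective term makes the motion vary across the frame,
# which a single affine (6-parameter) model cannot reproduce exactly.
perspective = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.001, 0.0)
mv_near = motion_vector(perspective, 0.0, 50.0)
mv_far = motion_vector(perspective, 100.0, 50.0)
print(mv_a, mv_b, mv_near, mv_far)
```

Because one set of eight parameters describes the motion of every block in the frame, the per-block cost reduces to a mode flag, which is consistent with the bit-saving argument in the abstract.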
ISBN: 9781728133201 (digital)
ISBN: 9781728133218 (print)
Rapidly growing intelligent applications require optimized bit allocation in image/video coding to support task-driven scenarios such as detection, classification, and segmentation. Some learning-based frameworks have been proposed for this purpose because of their inherent end-to-end optimization mechanisms. However, it is still quite challenging to integrate these task-driven metrics seamlessly into the traditional hybrid coding framework. To the best of our knowledge, this paper is the first work to address this challenge with a reinforcement learning (RL) approach. Specifically, we formulate the bit allocation problem as a Markov decision process (MDP) and train RL agents to automatically decide the quantization parameter (QP) of each coding tree unit (CTU) for HEVC intra coding, according to task-driven semantic distortion metrics. This bit allocation scheme can maximize the semantic-level fidelity of the task, such as classification accuracy, while minimizing the bit rate. We also employ gradient-weighted class activation mapping (Grad-CAM) and Mask R-CNN to extract task-related importance maps that help the agents make decisions. Extensive experimental results demonstrate the superior performance of our approach, which achieves 43.1% to 73.2% bit-rate savings over the HEVC anchor under equivalent task-related distortion.
Semantic information is important in video encryption. However, existing image quality assessment (IQA) methods, such as the peak signal-to-noise ratio (PSNR), are still widely used to measure encryption security. In general, these traditional IQA methods evaluate image quality from the perspective of the visual signal rather than semantic information. In this paper, we propose a novel semantic-level full-reference image quality assessment (FR-IQA) method named Semantic Distortion Measurement (SDM) to measure the degree of semantic distortion in video encryption. Then, based on a semantic saliency dataset, we verify that the proposed SDM method outperforms state-of-the-art algorithms. Furthermore, we construct a Region Of Semantic Saliency (ROSS) video encryption system to demonstrate the effectiveness of the proposed SDM method in a practical application.
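The limitation of signal-level metrics mentioned above can be seen in a small PSNR example: two distortions with the same mean squared error receive the same score regardless of which pixels they affect. The toy images and the assumption of 8-bit samples (peak value 255) are illustrative; this sketch does not implement SDM.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    squared_errors = [
        (r - d) ** 2
        for ref_row, dist_row in zip(reference, distorted)
        for r, d in zip(ref_row, dist_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf      # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [
    [100, 100],
    [100, 100],
]
# Same error magnitudes placed in different pixels (e.g. "face" vs. "background").
distorted_top = [
    [110, 90],
    [100, 100],
]
distorted_bottom = [
    [100, 100],
    [90, 110],
]
score_top = psnr(reference, distorted_top)
score_bottom = psnr(reference, distorted_bottom)
print(score_top, score_bottom)
```

PSNR depends only on the error energy, so it cannot tell whether an encryption scheme has scrambled the semantically important region or an irrelevant one. That gap is what a semantic-level measure is meant to close.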