ISBN: 9781479902880 (print)
Image retargeting techniques aim to produce retargeted images with different sizes or aspect ratios for various display screens. Various content-aware image retargeting algorithms have been proposed recently. However, there is still no accurate objective metric for assessing the visual quality of retargeted images. In this paper, we propose a novel objective metric for assessing the visual quality of retargeted images based on perceptual geometric distortion and information loss. The proposed metric measures the geometric distortion of retargeted images by SIFT flow variation. Furthermore, a visual saliency map is derived to characterize human perception of the geometric distortion. In addition, the information loss in a retargeted image, which is calculated based on the saliency map, is integrated into the proposed metric. A user study is conducted to evaluate the performance of the proposed metric. Experimental results show that the objective assessments from the proposed metric are consistent with subjective assessments.
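The abstract does not give the formula for the saliency-based information loss, so the following is only a minimal sketch of one plausible reading: the loss is the fraction of total saliency that is discarded by retargeting. The function name `saliency_information_loss`, the column-removal model, and the toy saliency values are all assumptions made for illustration, not the authors' definition.

```python
def saliency_information_loss(saliency, kept_columns):
    """Fraction of total saliency discarded when only `kept_columns`
    of the source image survive retargeting (hypothetical definition)."""
    kept = set(kept_columns)
    total = sum(sum(row) for row in saliency)
    if total == 0:
        return 0.0  # nothing salient, so nothing salient can be lost
    retained = sum(
        value
        for row in saliency
        for x, value in enumerate(row)
        if x in kept
    )
    return 1.0 - retained / total


# Toy 2x4 saliency map; most of the saliency sits in column 2.
saliency = [
    [0.0, 0.1, 0.8, 0.1],
    [0.0, 0.2, 0.7, 0.1],
]

# Keeping the two middle columns discards roughly 10% of the saliency.
loss = saliency_information_loss(saliency, kept_columns=[1, 2])
print(round(loss, 3))
```

Under this reading, a retargeting result that removes only low-saliency columns scores a small loss, while one that cuts through the salient object scores a large one.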
ISBN: 9781665450928 (print)
One of the most important problems faced by broadcasters is the unauthorized use of their images by third parties or organizations in a large-scale database, which contains hundreds of thousands of images. For this reason, it is important to perform efficient and effective image retrieval, whose objective is to find the images most similar to a given test image. In addition, test images often contain text, and the presence of text alongside the visual content complicates the search process. In this paper, we present an image retrieval framework based on a bag of visual words, an approach that has been shown to be effective in the literature. A convolutional neural network model is used to parse the text in the images. Experiments demonstrate the efficacy of this model in a large database.
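To make the bag-of-visual-words idea concrete, here is a minimal sketch of the general technique, not the framework from the paper. It assumes local descriptors and a visual vocabulary are already available; the two-dimensional toy descriptors, the two-word vocabulary, and all function names are hypothetical.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to `descriptor` (squared L2 distance)."""
    best_index, best_dist = 0, float("inf")
    for i, word in enumerate(vocabulary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bovw_histogram(descriptors, vocabulary):
    """L2-normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocabulary)] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


def cosine_similarity(h1, h2):
    """Cosine similarity of two histograms that are already L2-normalized."""
    return sum(a * b for a, b in zip(h1, h2))


vocabulary = [[0.0, 0.0], [10.0, 10.0]]  # toy two-word vocabulary
query = bovw_histogram([[1, 0], [9, 10], [0, 1]], vocabulary)
image_a = bovw_histogram([[0, 0], [1, 1], [10, 9]], vocabulary)
image_b = bovw_histogram([[10, 10], [9, 9]], vocabulary)

sim_a = cosine_similarity(query, image_a)
sim_b = cosine_similarity(query, image_b)
print(sim_a > sim_b)  # image_a shares the query's word distribution
```

In a real system the vocabulary would typically come from clustering descriptors over the whole database, and an inverted index would replace the pairwise comparison.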
ISBN: 9781728180687 (print)
The amount of volumetric brain image data is increasing rapidly and requires vast resources for storage and transmission, so an efficient volumetric compression method is urgently needed. Recent years have witnessed the progress of deep learning-based approaches for two-dimensional (2D) natural image compression, but learned volumetric image compression remains unexplored. In this paper, we propose the first end-to-end learning framework for volumetric image compression by extending the advanced techniques of 2D image compression to volumetric images. Specifically, a convolutional autoencoder is used to compress 3D image cubes, and non-local attention models are embedded in the convolutional autoencoder to jointly capture local and global correlations. Both hyperprior and autoregressive models are used to perform the conditional probability estimation in entropy coding. To reduce model complexity, we introduce a convolutional long short-term memory network for the autoregressive model based on channel-wise prediction. Experimental results on volumetric mouse brain images show that the proposed method outperforms JPEG2000-3D, HEVC and state-of-the-art 2D methods.
ISBN: 0780331923 (print)
According to the collage theorem, the encoding distortion for fractal image compression is directly related to the metric used in the encoding process. In this paper, we introduce a perceptually meaningful distortion measure based on the human visual system's nonlinear response to luminance and the visual masking effects. Blackwell's psychophysical raw data on contrast threshold are first interpolated as a function of background luminance and visual angle, and are then used as an error upper bound for perceptually lossless image compression. For a variety of images, experimental results show that the algorithm produces a compression ratio of 8:1 to 10:1 without introducing visual artifacts.
ISBN: 9781479902880 (print)
The key task in image set compression is to efficiently remove redundancy both among images and within a single image. In this paper, we propose the first multi-model prediction (MoP) method for image set compression to significantly reduce inter-image redundancy. Unlike previous prediction methods, our MoP enhances the correlation between images using feature-based geometric multi-model fitting. Based on the estimated geometric models, multiple deformed prediction images are generated to reduce geometric distortions in different image regions. Block-based adaptive motion compensation is then adopted to further eliminate local variances. Experimental results demonstrate the advantage of our approach, especially for images with complicated scenes and geometric relationships.
ISBN: 9781728180687 (print)
As more and more personal photos are shared and tagged on social media, security and privacy protection are becoming an unprecedented focus of attention. Avoiding privacy risks, such as unintended verification, becomes increasingly challenging. To enable people to enjoy uploading photos without having to consider these privacy concerns, it is crucial to study techniques that allow individuals to limit the identity information leaked in visual data. In this paper, we propose a novel hybrid model consisting of two stages to generate visually pleasing deidentified face images according to a single input. Meanwhile, we successfully preserve visual similarity with the original face to retain data usability. Our approach combines the latest advances in GAN-based face generation with well-designed adjustable randomness. In our experiments, we show the visually pleasing deidentified output of our method, which preserves a high similarity to the original image content. Moreover, our method adapts well to verifiers of unknown structure, which further improves its practical value in real-world use.
ISBN: 9781728180687 (print)
Recently, plenoptic images have attracted great attention because of their applications in various scenarios. However, their high resolution and special pixel distribution structure bring huge challenges to storage and transmission. To adapt compression to the structural characteristics of plenoptic images, in this paper we propose a Data Structure Adaptive 3D-convolutional (DSA-3D) autoencoder. The DSA-3D autoencoder enables up-sampling and down-sampling of the sub-aperture sequence along the angular or spatial resolution, thereby avoiding the artifacts caused by directly compressing the plenoptic image and achieving better compression efficiency. In addition, we propose a special and efficient Square rearrangement to generate the sub-aperture sequence. We compare the Square and Zigzag sub-aperture sequence rearrangements, and analyze the compression efficiency of block image compression and whole image compression. Compared with the traditional hybrid encoders HEVC, JPEG2000 and JPEG PLENO (WaSP), the proposed DSA-3D (Square) autoencoder achieves superior performance in terms of PSNR.
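Since the comparison above is reported in PSNR, a short sketch of the standard definition may help: PSNR = 10 log10(peak^2 / MSE). The code assumes 8-bit samples (peak value 255) and images given as nested lists; the function name and the toy pixel values are illustrative and say nothing about how the paper averaged PSNR over sub-aperture views.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (r - d) ** 2
        for row_ref, row_rec in zip(reference, reconstructed)
        for r, d in zip(row_ref, row_rec)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [[100, 100], [100, 100]]
reconstructed = [[110, 90], [110, 90]]  # every pixel is off by 10, so MSE = 100

value = psnr(reference, reconstructed)
print(round(value, 2))
```

Higher values mean the reconstruction is closer to the reference; identical images give infinite PSNR.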
ISBN: 9781479961399 (print)
Since human vision has much greater resolution at the center of the visual field than elsewhere, different quality assessment criteria should be applied to image areas with different visual resolutions. This paper proposes a foveation-based image quality assessment method that adopts different window sizes when assessing the quality of a single image. Visual saliency models, which estimate visual attention regions, are used to determine the foveation center, and foveation resolution models are used to guide the selection of window sizes for areas across the spatial extent of the image. Finally, the quality scores obtained from different window sizes are pooled together to obtain a single value for the image. The proposed method has been applied to the IQA metrics SSIM, PSNR, and UQI. The results show that both Spearman and Kendall correlation coefficients can be improved significantly by our foveation-based method.
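The Spearman and Kendall coefficients mentioned above measure how well the ranking of objective scores agrees with the ranking of subjective scores. The sketch below shows the textbook definitions; it uses the tau-a variant of Kendall's coefficient, which ignores tie corrections, and the score lists are invented for illustration rather than taken from the paper.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


objective = [1.0, 2.0, 3.0, 4.0]   # hypothetical metric scores
subjective = [1.0, 3.0, 2.0, 4.0]  # hypothetical opinion scores, one swap

rho = spearman(objective, subjective)
tau = kendall_tau_a(objective, subjective)
print(round(rho, 3), round(tau, 3))
```

Both coefficients equal 1 when the metric orders the images exactly as human observers do, so an increase in either indicates closer agreement with subjective judgments.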
ISBN: 9781479902880 (print)
Depth image upsampling is an important issue in three-dimensional (3D) applications. However, edge blurring remains a challenging problem in depth image upsampling; it causes jagged artifacts in synthesized views, which are visually unpleasant. In this paper, an edge-preserving single depth image interpolation (ESDI) method is proposed. Specifically, a local planar hypothesis (LPH), which assumes that depths in natural scenes cluster into local planes, is first explored. Then, finite candidates generation (FCG) is proposed to generate a limited set of discrete values that satisfy the LPH for the interpolated pixels. Finally, selecting the optimal combination of candidates is formulated as an energy minimization problem with a constraint in the gradient domain, which is solved by the iterated conditional modes (ICM) algorithm. Experiments demonstrate that ESDI produces high-resolution (HR) depth images with clear and sharp edges, and yields synthesized views of desirable quality.
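The following is a minimal one-dimensional sketch of how ICM can pick one value per pixel from a finite candidate set. It is not the ESDI energy: the abstract does not specify the energy terms, so the sketch assumes a simple smoothness energy (sum of absolute differences between neighbors), and it assumes the candidates for each unknown pixel are just the two nearest known depths. All names and values are hypothetical.

```python
def total_energy(labels):
    """Assumed smoothness energy: sum of absolute neighbor differences."""
    return sum(abs(a - b) for a, b in zip(labels, labels[1:]))


def icm(candidates, initial, max_sweeps=10):
    """Iterated conditional modes over a 1D chain.

    candidates[i] lists the values pixel i may take; initial[i] must be
    one of them. Each pixel is updated in turn to the candidate that
    minimizes its local energy while its neighbors are held fixed.
    """
    labels = list(initial)
    for _ in range(max_sweeps):
        changed = False
        for i, options in enumerate(candidates):

            def local_cost(value):
                cost = 0.0
                if i > 0:
                    cost += abs(value - labels[i - 1])
                if i + 1 < len(labels):
                    cost += abs(value - labels[i + 1])
                return cost

            best = min(options, key=local_cost)
            if local_cost(best) < local_cost(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break  # converged to a local minimum
    return labels


# Known depths 10 and 50 at the ends; two interpolated pixels in between
# may only take one of those two values, so no blurred value can appear.
candidates = [[10], [10, 50], [10, 50], [50]]
initial = [10, 50, 10, 50]

result = icm(candidates, initial)
print(result, total_energy(initial), total_energy(result))
```

Because every update strictly lowers the local cost, the total energy never increases, and restricting each pixel to a finite candidate set keeps the depth edge sharp where linear interpolation would produce intermediate values.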
ISBN: 9798331529543; 9798331529550 (print)
With the rapid development of video-on-demand (VOD) and real-time streaming video technologies, the accurate objective assessment of streaming video Quality of Experience (QoE) has become a focal point for optimizing streaming-related technologies. However, evaluating streaming video QoE presents numerous challenges because of the inherent transmission distortions caused by poor Quality of Service (QoS) conditions, such as intermittent stalling, rebuffering, and drastic changes in video sharpness due to bitrate fluctuations. This paper introduces a large and diverse in-the-wild streaming video QoE evaluation dataset, the SJLIVE-1k dataset. This work addresses the limitations of existing datasets, which lack in-the-wild video sequences under real network conditions and contain an insufficient amount of video content. Furthermore, we propose an end-to-end objective QoE evaluation strategy that extracts video content and QoS features from the video itself without using any extra information. By implementing self-supervised contrastive learning as the "reminder" to bridge the gap between the different types of features, our approach achieves state-of-the-art results across three datasets. Our proposed dataset will be released to facilitate further research.