ISBN: (print) 9781728185514
Probability distribution modeling is the basis for most competitive methods for lossless coding of screen content. One such state-of-the-art method is known as soft context formation (SCF). For each pixel to be encoded, a probability distribution is estimated based on the neighboring pattern and the occurrence of that pattern in the already encoded image. Using an arithmetic coder, the pixel color can thus be encoded very efficiently, provided that the current color has been observed before in association with a similar pattern. If this is not the case, the color is instead encoded using a color palette or, if it is still unknown, via residual coding. Both palette-based coding and residual coding have significantly worse compression efficiency than coding based on soft context formation. In this paper, the residual coding stage is improved by adaptively trimming the probability distributions for the residual error. Furthermore, an enhanced probability modeling for indicating a new color depending on the occurrence of new colors in the neighborhood is proposed. These modifications result in a bitrate reduction of up to 2.9 % on average. Compared to HEVC (HM-16.21 + SCM-8.8) and FLIF, the improved SCF method saves on average about 11 % and 18 % rate, respectively.
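The idea of estimating a color distribution from how often a neighboring pattern has occurred can be illustrated with a toy sketch. This is not the SCF algorithm: the context here is an exact match on only the left and top neighbors, whereas SCF matches patterns softly, and the names `build_context_counts`, `estimate_distribution`, and `ideal_bits` are hypothetical. The sketch also shows why an unseen context forces a fallback stage (palette or residual coding in the abstract): the estimated distribution is empty.

```python
import math
from collections import Counter, defaultdict


def build_context_counts(image):
    """Count which color followed each (left, top) neighbor pattern.

    `image` is a list of rows of color values, standing in for the
    already encoded part of the picture (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for y in range(1, len(image)):
        for x in range(1, len(image[y])):
            context = (image[y][x - 1], image[y - 1][x])
            counts[context][image[y][x]] += 1
    return counts


def estimate_distribution(counts, context):
    """Return {color: probability} for a context, or {} if it is unseen."""
    observed = counts.get(context)
    if not observed:
        return {}
    total = sum(observed.values())
    return {color: n / total for color, n in observed.items()}


def ideal_bits(probability):
    """Ideal code length in bits that an arithmetic coder approaches."""
    return -math.log2(probability)


encoded = [
    [0, 0, 0, 0],
    [0, 1, 1, 2],
    [0, 1, 1, 1],
]
counts = build_context_counts(encoded)
distribution = estimate_distribution(counts, (1, 0))
```

A color with estimated probability 0.5 costs about one bit, while a color that is certain under its context costs close to nothing, which is why coding succeeds best when the current color was seen before with a similar pattern.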
Speech recognition methods that fuse visual and auditory information have recently been studied. This paper investigates which mouth shape images are suitable for such fusion. Mouth shape features extracted from gray-level images and from binary images are adopted, and speech recognition is performed using a linear combination method. Based on the recognition results, we examine which mouth shape features are effective for the fusion of visual and auditory information. The effectiveness of using the two kinds of mouth shape features together is also confirmed.
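A linear combination of audio and visual evidence might look like the sketch below. The abstract does not say which quantities are combined or how the weight is chosen, so the per-word scores, the convex weighting, the default weight of 0.7, and the function names `fuse_scores` and `recognize` are all assumptions made for illustration.

```python
def fuse_scores(audio_scores, visual_scores, weight):
    """Combine per-word scores as weight * audio + (1 - weight) * visual."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return {
        word: weight * audio_scores[word] + (1.0 - weight) * visual_scores[word]
        for word in audio_scores
    }


def recognize(audio_scores, visual_scores, weight=0.7):
    """Pick the word with the highest fused score."""
    fused = fuse_scores(audio_scores, visual_scores, weight)
    return max(fused, key=fused.get)


# Hypothetical scores for two candidate syllables.
audio = {"ba": 0.6, "da": 0.4}
visual = {"ba": 0.1, "da": 0.9}
best = recognize(audio, visual, weight=0.7)
```

With these made-up scores the audio channel alone prefers "ba", but the strong visual evidence for "da" changes the fused decision, which is the kind of effect audio-visual fusion aims for.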
ISBN: (print) 9781467358293; 9781467358309
Query suggestion is an effective human-computer interaction (HCI) approach in information retrieval. In line with the human visual system, image retrieval using visual query suggestion (VQS) can provide a friendly query interface that resolves query ambiguity. In this paper, a content-based image retrieval (CBIR) system is first implemented. VQS is then used to capture the user's intention: the user submits a query keyword, and VQS provides a list of suggestions, each containing a keyword and a collection of representative images. If the user selects one of the image suggestions, that image is treated as the key image. The CBIR system then searches the image sets and returns similar images based on the similarity of image features. During relevance feedback, the user scores each returned image with a slider to optimize the retrieval results. A friendly query interface is designed to carry out HCI in our system, and the experimental results show that the proposed method effectively improves average recall and precision.
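The retrieval and evaluation steps can be sketched as follows. The abstract does not name the image features or the similarity measure, so cosine similarity over plain feature vectors is an assumption here, and `retrieve` and `precision_recall` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(key_features, database, top_k):
    """Return the names of the top_k images most similar to the key image."""
    ranked = sorted(
        database,
        key=lambda name: cosine_similarity(key_features, database[name]),
        reverse=True,
    )
    return ranked[:top_k]


def precision_recall(returned, relevant):
    """Precision and recall of a returned list against the relevant set."""
    hits = len(set(returned) & set(relevant))
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical two-dimensional features for three database images.
database = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
results = retrieve([1.0, 0.0], database, top_k=2)
```

Relevance feedback would then adjust the ranking using the user's slider scores; that step is omitted because the abstract does not describe how the scores are used.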
ISBN: (print) 9781479902880
Image retargeting techniques aim to obtain retargeted images with different sizes or aspect ratios for various display screens. Various content-aware image retargeting algorithms have been proposed recently. However, there is still no accurate objective metric for visual quality assessment of retargeted images. In this paper, we propose a novel objective metric for assessing the visual quality of retargeted images based on perceptual geometric distortion and information loss. The proposed metric measures the geometric distortion of retargeted images by SIFT flow variation. Furthermore, a visual saliency map is derived to characterize human perception of the geometric distortion. In addition, the information loss in a retargeted image, which is calculated based on the saliency map, is integrated into the proposed metric. A user study is conducted to evaluate the performance of the proposed metric. Experimental results show that the objective assessments from the proposed metric are consistent with subjective assessments.
ISBN: (print) 9781665450928
One of the most important problems faced by broadcasters is the unauthorized use of their images by third parties or organizations in a large-scale database, which contains hundreds of thousands of images. For this reason, it is important to perform an efficient and effective image retrieval, whose objective is to find the most similar images to a given test image. In addition, test images often contain text, and the presence of the text together with the visual part complicates the search process. In this paper, we present an image retrieval framework based on a bag of visual words, which has been shown to be effective in the literature. A convolutional neural network model is used to parse the text in the images. Experiments demonstrate the efficacy of this model in a large database.
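The bag-of-visual-words representation mentioned above can be sketched in a few lines. This is a generic illustration, not the paper's framework: the vocabulary is assumed to be given (in practice it is usually learned by clustering local descriptors), and `nearest_word` and `bovw_histogram` are hypothetical names.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to a descriptor (squared Euclidean)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        range(len(vocabulary)),
        key=lambda i: squared_distance(descriptor, vocabulary[i]),
    )


def bovw_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [count / total for count in histogram]


# Hypothetical two-word vocabulary and four local descriptors.
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0], [9.0, 11.0]]
histogram = bovw_histogram(descriptors, vocabulary)
```

Two images can then be compared by the distance between their histograms, which keeps the search cost independent of how many local descriptors each image produced.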
ISBN: (print) 9781728180687
The amount of volumetric brain image data is increasing rapidly and requires vast resources for storage and transmission, so an efficient volumetric compression method is urgently needed. Recent years have witnessed the progress of deep learning-based approaches for two-dimensional (2D) natural image compression, but learned volumetric image compression remains unexplored. In this paper, we propose the first end-to-end learning framework for volumetric image compression by extending the advanced techniques of 2D image compression to volumetric images. Specifically, a convolutional autoencoder is used to compress 3D image cubes, and non-local attention models are embedded in the convolutional autoencoder to jointly capture local and global correlations. Both hyperprior and autoregressive models are used to perform the conditional probability estimation in entropy coding. To reduce model complexity, we introduce a convolutional long short-term memory network for the autoregressive model based on channel-wise prediction. Experimental results on volumetric mouse brain images show that the proposed method outperforms JPEG2000-3D, HEVC, and state-of-the-art 2D methods.
ISBN: (print) 9781479902880
The key task in image set compression is to efficiently remove set redundancy among images and within a single image. In this paper, we propose the first multi-model prediction (MoP) method for image set compression to significantly reduce inter-image redundancy. Unlike previous prediction methods, our MoP enhances the correlation between images using feature-based geometric multi-model fitting. Based on the estimated geometric models, multiple deformed prediction images are generated to reduce geometric distortions in different image regions. Block-based adaptive motion compensation is then adopted to further eliminate local variances. Experimental results demonstrate the advantage of our approach, especially for images with complicated scenes and geometric relationships.
ISBN: (print) 9781728180687
As more and more personal photos are shared and tagged on social media, security and privacy protection are receiving unprecedented attention. Avoiding privacy risks such as unintended verification becomes increasingly challenging. To enable people to enjoy uploading photos without having to consider these privacy concerns, it is crucial to study techniques that allow individuals to limit the identity information leaked in visual data. In this paper, we propose a novel hybrid model consisting of two stages that generates visually pleasing de-identified face images from a single input. Meanwhile, we successfully preserve visual similarity with the original face to retain data usability. Our approach combines the latest advances in GAN-based face generation with well-designed adjustable randomness. In our experiments, we show that our method produces visually pleasing de-identified output while preserving high similarity to the original image content. Moreover, our method adapts well to verifiers of unknown structure, which further improves its practical value.
ISBN: (print) 9781728180687
Recently, plenoptic images have attracted great attention because of their applications in various scenarios. However, their high resolution and special pixel distribution structure pose huge challenges for storage and transmission. To adapt compression to the structural characteristics of plenoptic images, we propose a Data Structure Adaptive 3D-convolutional (DSA-3D) autoencoder. The DSA-3D autoencoder enables up-sampling and down-sampling of the sub-aperture sequence along the angular or spatial resolution, thereby avoiding the artifacts caused by directly compressing the plenoptic image and achieving better compression efficiency. In addition, we propose a special and efficient Square rearrangement to generate the sub-aperture sequence. We compare the Square and Zigzag sub-aperture sequence rearrangements and analyze the compression efficiency of block image compression and whole image compression. Compared with the traditional hybrid encoders HEVC, JPEG2000, and JPEG PLENO (WaSP), the proposed DSA-3D (Square) autoencoder achieves superior performance in terms of PSNR.
ISBN: (print) 9781479961399
Since human vision has much greater resolution at the center of the visual field than elsewhere, different quality assessment criteria should be applied to image areas with different visual resolutions. This paper proposes a foveation-based image quality assessment method that adopts windows of different sizes when assessing a single image. Visual salience models, which estimate visual attention regions, are used to determine the foveation center, and foveation resolution models are used to guide the selection of window sizes across the spatial extent of the image. Finally, the quality scores obtained from the different window sizes are pooled to obtain a single value for the image. The proposed method has been applied to the IQA metrics SSIM, PSNR, and UQI. The results show that both the Spearman and Kendall correlation coefficients can be improved significantly by our foveation-based method.
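The Spearman coefficient used to compare objective scores with subjective ratings is the Pearson correlation of the two rank sequences. The sketch below is a plain standard-library version with tied values given their average rank; the function names are hypothetical, and it assumes neither input is constant (otherwise the denominator is zero).

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(objective_scores, subjective_scores):
    """Spearman rank correlation between metric scores and human ratings."""
    return pearson(average_ranks(objective_scores), average_ranks(subjective_scores))
```

Because only the ranks matter, a metric that orders images the same way as human viewers reaches a coefficient of 1 even if its raw values are on a different scale.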