ISBN: 9781728180687 (print)
In this work, an efficient and robust learning-based JPEG2000 architecture is proposed. It uses machine learning techniques to predict and encode the decision bit in the embedded block coding with optimized truncation (EBCOT) process. First, we apply non-locally weighted ridge regression to predict the quantized wavelet coefficients in the LL subband. Then, during the EBCOT process, we perform inter/intra subband prediction and inter/intra bit plane symbol prediction to estimate the activity of the decision bit using a deep learning architecture. Finally, the binary prediction result is treated as an additional context, and the decision bit is coded using an advanced context-based adaptive binary arithmetic coder. Simulations show that the proposed framework provides the same visual quality as conventional codecs with as much as 30% bitrate savings.
ISBN: 9781665475921 (print)
Self-attention based encoder-decoder models achieve dominant performance in image captioning. However, most existing image captioning models (ICMs) focus only on modeling the relation between spatial tokens and neglect channel-wise attention when building the visual representation. Since different channels of the visual representation usually denote different visual objects, this may lead to poor performance on object and attribute words in the captions generated by the ICMs. In this paper, we propose a novel dual-stream self-attention module (DSM) to alleviate this issue. Specifically, we propose a parallel self-attention based module that simultaneously encodes visual information along the spatial and channel dimensions. In addition, to obtain channel-wise visual features effectively and efficiently, we introduce a group self-attention block with linear computational complexity. To validate the effectiveness of our model, we conduct extensive experiments on standard image captioning benchmarks, including MSCOCO and Flickr30k. Without bells and whistles, the proposed model achieves new state-of-the-art results, with a CIDEr score of 135.4 on MSCOCO and 70.8 on Flickr30k.
ISBN: 9781665475921 (print)
For scientific exploration and research on Mars, transmitting high-quality Martian images from distant Mars to Earth is an indispensable step. Image compression is the key technique given the extremely limited Mars-Earth bandwidth. Recently, deep learning has demonstrated remarkable performance in natural image compression, which opens a possibility for efficient Martian image compression. However, deep learning usually requires large amounts of training data. In this paper, we establish the first large-scale high-resolution Martian image compression (MIC) dataset. By analyzing this dataset, we observe an important non-local self-similarity prior for Martian images. Building on this prior, we propose a deep Martian image compression network with a non-local block that exploits both local and non-local dependencies among Martian image patches. Experimental results verify the effectiveness of the proposed network in Martian image compression: it outperforms both deep learning based compression methods and the HEVC codec.
ISBN: 9781538607008 (print)
Automatic image cropping techniques have been developed recently to address the mismatch between the native display and image characteristics, such as resolution, aspect ratio, etc. These techniques usually rely on determining the importance of various regions in the image, or the aesthetic appeal of the final cropped image. In this work, we present a cropping method that combines bottom-up visual saliency and top-down semantic analysis to create a cropped image that best preserves important image content. Experimental results illustrate that the new method outperforms popular saliency-based cropping, which only relies on bottom-up analysis.
ISBN: 9798331529543; 9798331529550 (print)
Plenoptic cameras are light field capturing devices able to acquire large amounts of angular and spatial information. The lenslet video produced by such cameras presents on each frame a distinctive hexagonal pattern of micro-images. Due to the particular structure of lenslet images, traditional video codecs perform poorly on lenslet video. Previous works have proposed a preprocessing scheme that cuts and realigns the micro-images on each lenslet frame. While effective, this method introduces high frequency components into the processed image. In this paper, we propose an additional step to the aforementioned scheme by applying an invertible smoothing transform. We evaluate the enhanced scheme on lenslet video sequences captured with single-focused and multi-focused plenoptic cameras. On average, the enhanced scheme achieves 9.85% bitrate reduction compared to the existing scheme.
ISBN: 9781728185514 (print)
In many image processing tasks, pixels or blocks of pixels are missing or lost in only some channels. For example, during defective transmission of RGB images, one or more blocks in one color channel may be lost. Nearly all modern applications in image processing and transmission use at least three color channels, and some employ even more bands, for example in the infrared and ultraviolet regions of the light spectrum. Typically, only some pixels and blocks in a subset of color channels are distorted. Thus, the other channels can be used to reconstruct the missing pixels, which is called spatio-spectral reconstruction. Current state-of-the-art methods rely purely on the local neighborhood, which works well for homogeneous regions. However, in high-frequency regions such as edges or textures, these methods fail to properly model the relationship between color bands. Hence, this paper introduces non-local filtering for building a linear regression model that describes the inter-band relationship and is used to reconstruct the missing pixels. Our novel method increases the PSNR by 2 dB on average and yields visually much more appealing images in high-frequency regions.
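The inter-band regression idea in the abstract above can be illustrated with a minimal sketch. It assumes a simple weighted least-squares line between one intact reference band and the damaged band, with per-sample weights standing in for non-local similarity. The function names, the weighting scheme, and the single-reference-band setup are illustrative assumptions, not the authors' method.

```python
def fit_weighted_line(ref, target, weights):
    """Weighted least squares fit of target ~ a * ref + b.

    ref, target: values of the reference band and the damaged band at
    pixels where both are known. weights: hypothetical similarity weights
    (e.g. from patch matching); larger means more trusted.
    """
    w_sum = sum(weights)
    mean_r = sum(w * r for w, r in zip(weights, ref)) / w_sum
    mean_t = sum(w * t for w, t in zip(weights, target)) / w_sum
    var_r = sum(w * (r - mean_r) ** 2 for w, r in zip(weights, ref))
    cov = sum(w * (r - mean_r) * (t - mean_t)
              for w, r, t in zip(weights, ref, target))
    # Fall back to a constant model if the reference band is flat.
    a = cov / var_r if var_r > 0 else 0.0
    b = mean_t - a * mean_r
    return a, b


def reconstruct(ref_value, a, b):
    """Predict a missing pixel from the co-located reference-band pixel."""
    return a * ref_value + b


# Known pixel pairs (reference band, damaged band) and assumed weights.
ref_known = [10.0, 20.0, 30.0, 40.0]
target_known = [25.0, 45.0, 65.0, 85.0]
similarity = [1.0, 0.5, 0.25, 1.0]

a, b = fit_weighted_line(ref_known, target_known, similarity)
estimate = reconstruct(50.0, a, b)
```

In this toy case the two bands are exactly linearly related, so the fit recovers the slope and intercept regardless of the weights; on real data the weights decide which candidate pixels dominate the model.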
ISBN: 9781728180687 (print)
Stereo image quality assessment (SIQA) methods are often not well grounded in the biological theory of human vision, so many of them fail to achieve good consistency with subjective perception. Research on the visual system tends to focus on the dorsal and ventral pathways and ignores the information asymmetry in the early visual pathways. Notably, the ON and OFF receptive fields in retinal ganglion cells (RGCs) respond asymmetrically to the statistical features of images. Inspired by this, we propose an SIQA method based on monocular and binocular visual features, which takes into account the asymmetry of local-contrast bright and dark features in the early visual pathways. First, the response maps of ON and OFF cells in RGCs are extracted for the left and right views respectively. Then, the different information fusion modes of the visual cortex are used to fuse the response maps of the left and right views. Finally, monocular and binocular features are extracted and fed to support vector regression (SVR) for quality regression. Experimental results show that the proposed method is superior to several mainstream SIQA metrics on two publicly available databases.
ISBN: 9781728185514 (print)
Existing cross-component video coding technologies have shown great potential for improving coding efficiency. The fundamental insight of cross-component coding is to exploit the statistical correlations among different color components. In this paper, a Cross-Component Sample Offset (CCSO) approach for image and video coding is proposed, inspired by the observation that the luma component tends to contain more texture, while the chroma components are relatively smoother. The key component of CCSO is a nonlinear offset mapping mechanism implemented as a look-up table (LUT). The input of the mapping is the co-located reconstructed samples of the luma component, and the output is offset values applied to the chroma component. The proposed method has been implemented on top of a recent version of libaom. Experimental results show that the proposed approach brings a 1.16% Random Access (RA) BD-rate saving on top of AV1 with a marginal increase in encoding/decoding time.
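The LUT-based offset mapping described above can be sketched roughly as follows. The abstract only states that co-located reconstructed luma samples index a LUT whose output is an offset added to chroma. The choice of two horizontal neighbors, the three-level quantization, the threshold, the 4:4:4 sampling (luma and chroma at the same resolution), and the LUT values are all assumptions made for illustration, not the design in the paper or in libaom.

```python
def quantize(delta, threshold):
    """Map a luma difference to one of three classes: -1, 0, or 1."""
    if delta > threshold:
        return 1
    if delta < -threshold:
        return -1
    return 0


def ccso_filter(luma, chroma, lut, threshold=4, max_val=255):
    """Add a LUT-selected offset to each chroma sample.

    luma, chroma: 2-D lists of equal size (4:4:4 assumed).
    lut: dict mapping (left_class, right_class) to an integer offset;
         missing keys mean "no offset". In a codec the table would be
         signaled by the encoder; here it is a hypothetical input.
    """
    height, width = len(luma), len(luma[0])
    out = [row[:] for row in chroma]
    for y in range(height):
        for x in range(width):
            center = luma[y][x]
            # Clamp neighbor positions at the picture boundary.
            left = luma[y][max(x - 1, 0)]
            right = luma[y][min(x + 1, width - 1)]
            key = (quantize(left - center, threshold),
                   quantize(right - center, threshold))
            offset = lut.get(key, 0)
            # Clip the result to the valid sample range.
            out[y][x] = min(max(chroma[y][x] + offset, 0), max_val)
    return out


luma = [[100, 100, 120, 120]]
chroma = [[50, 50, 50, 50]]
lut = {(0, 1): 2, (-1, 0): -3}  # hypothetical offsets around a luma edge
filtered = ccso_filter(luma, chroma, lut)
```

Only the two chroma samples next to the luma edge are changed, which reflects the intuition in the abstract: texture in luma indicates where the smoother chroma plane is likely to need correction.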
ISBN: 9781728185514 (print)
In video coding, compressing high-frequency components has always been an intractable problem: noise and visually imperceptible content consume a large amount of bandwidth while providing limited quality improvement. Directly applying denoising methods degrades coding performance and is therefore not suitable for video coding scenarios. In this work, we propose a video pre-processing approach that leverages an edge-preserving filter specifically designed for video coding, whose parameters are optimized for rate-distortion (R-D) performance. The proposed pre-processing method removes components with low R-D cost-effectiveness before encoding while keeping important structural components, leading to higher coding efficiency and better subjective quality. Compared with conventional denoising filters, the proposed pre-processing method using the R-D optimized edge-preserving filter achieves up to -5.2% BD-rate with low computational complexity.
ISBN: 9781728185514 (print)
As an emerging media format, virtual reality (VR) has attracted the attention of researchers. 6-DoF VR can reconstruct the surrounding environment with the help of the depth information of the scene, providing users with an immersive experience. However, because panoramic images lack depth information, converting a panorama to 6-DoF VR remains a challenge. In this paper, we propose SPCNet, a new depth estimation method based on spherical convolution, to recover the depth information of panoramic images. In particular, spherical convolution is introduced to improve depth estimation accuracy by reducing the distortion caused by Equi-Rectangular Projection (ERP). The experimental results show that SPCNet is better than other advanced networks on many indicators. For example, its RMSE is 0.419 lower than that of UResNet. Moreover, the threshold accuracy of depth estimation is also improved.
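The abstract above reports RMSE and threshold accuracy without defining them. The sketch below uses the definitions commonly found in depth estimation work, where threshold accuracy is the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25. Whether SPCNet was evaluated with exactly these definitions and this threshold is an assumption. Depth values are assumed to be positive, and the sample numbers are made up.

```python
import math


def rmse(pred, gt):
    """Root mean squared error between predicted and true depths."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def threshold_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) below the threshold.

    Assumes all depths are strictly positive. The default of 1.25 is the
    commonly used value, not one stated in the abstract.
    """
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


# Flattened toy depth maps in meters (illustrative values only).
predicted = [1.0, 2.0, 4.0, 3.0]
ground_truth = [1.0, 2.0, 2.0, 3.0]

error = rmse(predicted, ground_truth)
accuracy = threshold_accuracy(predicted, ground_truth)
```

Lower RMSE and higher threshold accuracy both indicate better depth estimates, which matches the direction of the improvements claimed for SPCNet.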