ISBN (print): 9781538607008
Automatic image cropping techniques have been developed recently to address mismatches between image characteristics and those of the native display, such as resolution and aspect ratio. These techniques usually rely on determining the importance of various regions in the image, or on the aesthetic appeal of the final cropped image. In this work, we present a cropping method that combines bottom-up visual saliency and top-down semantic analysis to create a cropped image that best preserves important image content. Experimental results show that the new method outperforms popular saliency-based cropping, which relies only on bottom-up analysis.
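To make "combining bottom-up and top-down importance" concrete, here is a minimal sketch, not the paper's method. It assumes both maps are same-sized 2-D lists, blends them with a hypothetical weight `alpha`, and exhaustively searches for the fixed-size crop window with the largest total importance using a summed-area table. The function name `best_crop` and the linear blend are illustrative assumptions.

```python
def best_crop(saliency, semantic, crop_h, crop_w, alpha=0.5):
    """Return (top, left) of the crop_h x crop_w window with the largest
    total importance, where importance = alpha*saliency + (1-alpha)*semantic.
    The linear blend and alpha are assumptions made for this sketch."""
    h, w = len(saliency), len(saliency[0])
    if not (0 < crop_h <= h and 0 < crop_w <= w):
        raise ValueError("crop window must fit inside the image")
    # Blend the bottom-up (saliency) and top-down (semantic) maps per pixel.
    imp = [[alpha * saliency[y][x] + (1 - alpha) * semantic[y][x]
            for x in range(w)] for y in range(h)]
    # Summed-area table padded with a zero row and column.
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = imp[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x]
    best, best_pos = float("-inf"), (0, 0)
    for top in range(h - crop_h + 1):
        for left in range(w - crop_w + 1):
            b, r = top + crop_h, left + crop_w
            # Window sum in O(1) from four table lookups.
            total = sat[b][r] - sat[top][r] - sat[b][left] + sat[top][left]
            if total > best:  # strict '>' keeps the first maximum in scan order
                best, best_pos = total, (top, left)
    return best_pos
```

A semantic mask covering an object can outweigh a single salient pixel elsewhere, which is the kind of trade-off a purely bottom-up cropper cannot make.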
ISBN (print): 9781665475921
Tire pattern image classification is an important computer vision problem in public security, as it can help police investigate criminal cases. It remains challenging because of the small diversity between different classes. Generally, a tire pattern image classification system requires two characteristics: high accuracy and low computational cost. In this paper, we first hypothesize that capturing rich feature representations benefits tire classification and that learning with a lightweight network improves computational efficiency. We then propose a simple yet efficient two-stage training mechanism: 1) we learn a feature extractor using a Variational Auto-Encoder framework constrained by contrastive learning, which projects images into a latent space with rich feature representations; 2) we train a single-layer linear classification network on the features extracted by the previously trained encoder. The Top-1 and Top-5 accuracies on the tire pattern dataset are 89.8% and 96.6%, respectively, validating the effectiveness of our strategy.
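The Top-1 and Top-5 figures above are top-k accuracies. The following minimal sketch shows how such a metric is typically computed from per-class classifier scores, for example the outputs of the linear classifier. The function name and the tie-breaking rule (lower class index wins) are our assumptions, not details from the paper.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one list per sample.
    labels: list of integer class indices.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("need the same, non-zero number of scores and labels")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores; sorted() is stable, so ties keep
        # the lower class index first.
        top = sorted(range(len(row)), key=lambda c: -row[c])[:k]
        hits += label in top
    return hits / len(labels)
```

Top-5 accuracy is never lower than Top-1 accuracy, which is consistent with the 96.6% versus 89.8% reported above.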
ISBN (print): 9781665475921
For the scientific exploration of and research on Mars, transmitting high-quality Martian images from distant Mars to Earth is an indispensable step. Image compression is the key technique given the extremely limited Mars-Earth bandwidth. Recently, deep learning has demonstrated remarkable performance in natural image compression, which opens up the possibility of efficient Martian image compression. However, deep learning usually requires large amounts of training data. In this paper, we establish the first large-scale, high-resolution Martian image compression (MIC) dataset. By analyzing this dataset, we observe an important non-local self-similarity prior for Martian images. Building on this prior, we propose a deep Martian image compression network with a non-local block to exploit both local and non-local dependencies among Martian image patches. Experimental results verify the effectiveness of the proposed network for Martian image compression: it outperforms both deep-learning-based compression methods and the HEVC codec.
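A non-local block lets every patch aggregate information from all other patches, weighted by similarity, which is how a self-similarity prior can be exploited. The following sketch is heavily simplified and is not the network from the paper. It assumes identity embeddings, dot-product similarity with a softmax (the embedded-Gaussian form), and a residual connection, all operating on plain Python feature vectors.

```python
import math

def non_local(features):
    """Simplified non-local block over a list of patch feature vectors.

    Each output is the input plus a softmax-weighted average of *all*
    vectors, with weights from dot-product similarity. Identity embeddings
    are an assumption; real blocks use learned projections.
    """
    out = []
    for xi in features:
        sims = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        m = max(sims)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in sims]
        z = sum(w)
        agg = [sum(wj * xj[d] for wj, xj in zip(w, features)) / z
               for d in range(len(xi))]
        out.append([a + b for a, b in zip(xi, agg)])  # residual connection
    return out
```

When two patches are dissimilar, almost all of a patch's weight goes to itself. Similar patches, however distant in the image, share information.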
ISBN (print): 9781728185514
Existing cross-component video coding technologies have shown great potential for improving coding efficiency. The fundamental insight of cross-component coding is to exploit the statistical correlations among different color components. In this paper, a Cross-Component Sample Offset (CCSO) approach for image and video coding is proposed, inspired by the observation that the luma component tends to contain more texture, while the chroma components are relatively smooth. The key element of CCSO is a nonlinear offset mapping mechanism implemented as a look-up table (LUT). The input to the mapping consists of the co-located reconstructed luma samples, and the output consists of offset values applied to the chroma component. The proposed method has been implemented on top of a recent version of libaom. Experimental results show that the proposed approach achieves a 1.16% BD-rate saving in the Random Access (RA) configuration on top of AV1, with a marginal increase in encoding/decoding time.
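The abstract specifies only that co-located reconstructed luma samples index a LUT of chroma offsets. The sketch below fills in a hypothetical classifier to show the mechanism; it is not the libaom implementation. The co-located luma sample is compared with its left and right neighbours, each difference is quantized into three bands with an assumed `threshold`, and the resulting 9-entry index selects the offset. The sketch also assumes luma and chroma have the same size (no chroma subsampling).

```python
def ccso_like_offset(luma, chroma, lut, threshold=2, max_val=255):
    """Add a LUT-selected offset to each chroma sample based on luma context.

    Hypothetical classifier: left/right luma differences are each quantized
    into 3 bands, giving 3 * 3 = 9 LUT entries. Assumes luma and chroma
    planes have the same dimensions.
    """
    def band(d):
        if d < -threshold:
            return 0
        if d > threshold:
            return 2
        return 1

    h, w = len(luma), len(luma[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            c = luma[y][x]
            left = luma[y][max(x - 1, 0)]       # clamp at picture borders
            right = luma[y][min(x + 1, w - 1)]
            idx = 3 * band(left - c) + band(right - c)
            v = chroma[y][x] + lut[idx]
            row.append(min(max(v, 0), max_val))  # clip to the valid sample range
        out.append(row)
    return out
```

The mapping is nonlinear in the luma values because of the band quantization. The LUT contents are what an encoder would choose and signal.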
ISBN (print): 9798331529543; 9798331529550
Plenoptic cameras are light-field capture devices that can acquire large amounts of angular and spatial information. Each frame of the lenslet video produced by such cameras shows a distinctive hexagonal pattern of micro-images. Because of this particular structure, traditional video codecs perform poorly on lenslet video. Previous work has proposed a preprocessing scheme that cuts and realigns the micro-images in each lenslet frame. While effective, this method introduces high-frequency components into the processed image. In this paper, we extend that scheme with an additional step that applies an invertible smoothing transform. We evaluate the enhanced scheme on lenslet video sequences captured with single-focused and multi-focused plenoptic cameras. On average, the enhanced scheme achieves a 9.85% bitrate reduction compared to the existing scheme.
ISBN (print): 9781728185514
In many image processing tasks, pixels or blocks of pixels are missing or lost in only some channels. For example, during faulty transmission of RGB images, one or more blocks in one color channel may be lost. Nearly all modern image processing and transmission applications use at least three color channels, and some employ even more bands, for example in the infrared and ultraviolet regions of the light spectrum. Typically, only some pixels and blocks in a subset of the color channels are distorted. The other channels can therefore be used to reconstruct the missing pixels, which is called spatio-spectral reconstruction. Current state-of-the-art methods rely purely on the local neighborhood, which works well in homogeneous regions. However, in high-frequency regions such as edges or textures, these methods fail to properly model the relationship between color bands. Hence, this paper introduces non-local filtering to build a linear regression model that describes the inter-band relationship and is used to reconstruct the missing pixels. The proposed method increases PSNR by 2 dB on average and yields visually much more appealing images in high-frequency regions.
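The idea of fitting the inter-band relationship on non-local matches rather than on the local neighborhood can be sketched as follows. This is an illustration under our own assumptions, not the paper's algorithm. The sketch assumes one fully available reference channel `ref` and a target channel `tgt` with a boolean `known` mask. It ranks candidate pixels by sum-of-squared-differences patch similarity in `ref`, keeps the `k` best (a hypothetical parameter), and fits `tgt = a * ref + b` by least squares.

```python
def reconstruct_pixel(ref, tgt, known, y, x, k=8, r=1):
    """Estimate the missing tgt[y][x] from a fully available reference channel.

    Non-local step: rank every pixel with a known target value by the SSD
    between its reference-channel patch and the patch around (y, x).
    Regression step: fit tgt = a * ref + b by least squares on the k best
    matches and evaluate the fit at ref[y][x].
    """
    h, w = len(ref), len(ref[0])

    def patch(cy, cx):
        # (2r+1)^2 patch of the reference channel, clamped at the borders.
        return [ref[min(max(cy + dy, 0), h - 1)][min(max(cx + dx, 0), w - 1)]
                for dy in range(-r, r + 1) for dx in range(-r, r + 1)]

    p0 = patch(y, x)
    cands = []
    for cy in range(h):
        for cx in range(w):
            if known[cy][cx]:
                d = sum((a - b) ** 2 for a, b in zip(patch(cy, cx), p0))
                cands.append((d, cy, cx))
    cands.sort()
    sel = cands[:k]
    if not sel:
        raise ValueError("no known target samples")
    xs = [ref[cy][cx] for _, cy, cx in sel]
    ys = [tgt[cy][cx] for _, cy, cx in sel]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((v - mx) ** 2 for v in xs)
    # Fall back to the mean if the matches have no reference variation.
    a = 0.0 if var == 0 else sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / var
    b = my - a * mx
    return a * ref[y][x] + b
```

Because the matches are chosen by structural similarity rather than spatial proximity, an edge pixel can borrow its regression from other pixels on similar edges.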
ISBN (print): 0819421030
Although reversible predictive coding and reversible subband coding already exist for the reversible coding of gray-level still images, almost no reversible methods have been proposed for transform coding. Therefore, in this paper, we propose several reversible transform coding methods. If conventional transform coding is used as is, the number of levels of the transform coefficients must be made very large to reconstruct the input signal without distortion. We therefore propose transform coding methods that are reversible while keeping the number of coefficient levels from becoming very large. We propose reversible coding methods corresponding to the discrete Walsh-Hadamard, Haar, and cosine transforms. Furthermore, we propose a method that uses n-th-order differences, a method in which the transform coefficients have the same number of levels as the input signal, and a reversible overlap transform coding method. Simulations show that the compression efficiency of the proposed methods is almost the same as that of predictive coding.
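The orthonormal Haar transform maps a pair (a, b) to ((a+b)/√2, (a−b)/√2), which has irrational outputs, so exact reconstruction would need very finely quantized coefficients. One well-known way to get a reversible integer Haar is the S-transform, sketched below. It is shown for illustration only and is not necessarily the construction used in this paper. The low band keeps the input's range, and only the difference band needs one extra bit.

```python
def s_transform(x):
    """Reversible integer Haar (S-transform) of an even-length integer list."""
    if len(x) % 2:
        raise ValueError("length must be even")
    low, high = [], []
    for a, b in zip(x[0::2], x[1::2]):
        low.append((a + b) // 2)  # floor average: same range as the input
        high.append(a - b)        # difference: needs one extra bit
    return low, high

def inverse_s_transform(low, high):
    """Exactly undo s_transform using only integer arithmetic."""
    x = []
    for l, h in zip(low, high):
        a = l + (h + 1) // 2      # recovers a, since l = a - ceil(h / 2)
        x.extend([a, a - h])
    return x
```

The floor in the forward step discards information, but that information is exactly recoverable from the parity of the difference `h`. This is why the pair of integer operations is lossless.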
ISBN (print): 9781728180687
Stereo image quality assessment (SIQA) methods are often not well grounded in the biological theory of human vision, so the predictions of many SIQA methods do not agree well with subjective perception. Research on the visual system tends to focus on the dorsal and ventral pathways, ignoring the information asymmetry in the early visual pathways. Notably, the ON and OFF receptive fields of retinal ganglion cells (RGCs) respond asymmetrically to the statistical features of images. Inspired by this, we propose an SIQA method based on monocular and binocular visual features that takes into account the asymmetry between bright and dark local-contrast features in the early visual pathways. First, the response maps of the ON and OFF cells in RGCs are extracted for the left and right views, respectively. Then, different information fusion modes of the visual cortex are used to fuse the response-map information of the left and right views. Finally, monocular and binocular features are extracted and fed to support vector regression (SVR) for quality regression. Experimental results show that the proposed method is superior to several mainstream SIQA metrics on two publicly available databases.
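ON and OFF responses are commonly modeled as the two half-wave rectified parts of a center-surround contrast signal. The following minimal sketch illustrates that split for one view. It uses a crude model of our own choosing (center pixel minus the mean of its in-bounds neighbours within radius `r`), not the receptive-field model from the paper. In the described pipeline, the maps for the left and right views would then be fused and summarized into features for SVR; those steps are omitted here.

```python
def on_off_maps(img, r=1):
    """Split a center-surround contrast map into ON and OFF half-wave responses.

    Contrast = center pixel minus the mean of its in-bounds neighbours within
    radius r (a crude stand-in for an RGC receptive field). ON cells respond
    to positive contrast (bright center), OFF cells to negative contrast.
    """
    h, w = len(img), len(img[0])
    on = [[0.0] * w for _ in range(h)]
    off = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nbrs = [img[y + dy][x + dx]
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)
                    if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w]
            c = img[y][x] - sum(nbrs) / len(nbrs) if nbrs else 0.0
            on[y][x] = max(c, 0.0)    # bright-on-dark response
            off[y][x] = max(-c, 0.0)  # dark-on-bright response
    return on, off
```

A bright dot yields an ON response at its center and OFF responses around it. This shows how the two channels encode bright and dark local contrast separately.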
ISBN (print): 9781728185514
This paper addresses image rescaling, the task of downscaling an input image and later upscaling it for the purposes of transmission, storage, or playback on heterogeneous devices. The state-of-the-art image rescaling network (known as IRN) treats image downscaling and upscaling as mutually invertible tasks using invertible affine coupling layers. In particular, for upscaling, IRN models the missing high-frequency component as input-independent (case-agnostic) Gaussian noise. In this work, we go one step further and predict a case-specific high-frequency component from textures embedded in the downscaled image. Moreover, we adopt integer coupling layers to avoid quantizing the downscaled image. When tested on commonly used datasets, the proposed method, termed DIRECT, improves high-resolution reconstruction quality both subjectively and objectively, while maintaining visually pleasing downscaled images.
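Integer coupling layers avoid a separate quantization step because they map integers to integers and remain exactly invertible. A generic additive integer coupling layer illustrates this; it is not DIRECT's actual layer. The input is split into halves `x1` and `x2`, and `x2` is shifted by a rounded function of `x1`. Here `f` is a hypothetical stand-in for a learned network.

```python
def coupling_forward(x1, x2, f):
    """Integer additive coupling: y1 = x1, y2 = x2 + round(f(x1))."""
    shift = [round(v) for v in f(x1)]  # f may be any real-valued function
    return list(x1), [a + s for a, s in zip(x2, shift)]

def coupling_inverse(y1, y2, f):
    """Exact inverse: y1 == x1, so the same integer shift can be recomputed."""
    shift = [round(v) for v in f(y1)]
    return list(y1), [a - s for a, s in zip(y2, shift)]
```

For example, with a hypothetical `f = lambda v: [0.37 * a - 1.6 for a in v]`, the forward pass on integer inputs produces integers, and the inverse recovers the inputs exactly. An affine coupling with real-valued scaling would require rounding its output, and that rounding would break exact invertibility.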
ISBN (print): 9781728185514
Deep learning-based single image super-resolution (SR) consistently outperforms traditional SR methods. However, most of these methods assume that the blur kernel used to generate the low-resolution (LR) image is known and fixed (e.g., bicubic). Since the blur kernels involved in real-life scenarios are complex and unknown, the performance of these SR methods drops greatly on real blurry images. Reconstructing high-resolution (HR) images from randomly blurred and noisy LR images remains a challenging task. Typical blind SR approaches involve two sequential stages: i) kernel estimation; ii) SR image reconstruction based on the estimated kernel. However, due to the ill-posed nature of this problem, iterative refinement could benefit both the kernel and the SR image estimates. Motivated by this observation, we propose a deep learning-based image SR method with iterative kernel estimation and image reconstruction. Simulation results show that the proposed method outperforms the state of the art in blind image SR and also produces visually superior results.
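The distinction between known and unknown kernels refers to the usual degradation model, in which the LR image is the HR image convolved with a blur kernel, downsampled by a scale factor, and corrupted by noise. The following 1-D sketch simulates that forward model. The 1-D simplification, edge replication, and i.i.d. Gaussian noise are our assumptions. Blind SR must invert this process without knowing `kernel`; an iterative method alternates between estimating the kernel from the current SR estimate and reconstructing the SR image with the current kernel.

```python
import random

def degrade_1d(hr, kernel, scale, noise_sigma=0.0, seed=0):
    """Simulate y = downsample(hr * kernel, scale) + noise on a 1-D signal.

    'same'-size convolution with edge replication, then keep every
    `scale`-th sample, then optionally add i.i.d. Gaussian noise.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    n, kc = len(hr), len(kernel) // 2
    blurred = []
    for i in range(n):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = min(max(i + kc - j, 0), n - 1)  # true convolution, replicated edges
            acc += kv * hr[idx]
        blurred.append(acc)
    out = blurred[::scale]
    if noise_sigma > 0:
        rng = random.Random(seed)  # seeded for reproducibility
        out = [v + rng.gauss(0.0, noise_sigma) for v in out]
    return out
```

With a delta kernel `[0.0, 1.0, 0.0]` and scale 2, the output is simply every other sample. With a box kernel, the same HR signal yields a different LR signal. This is why an SR model trained for one kernel can fail on another.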