Image-to-sketch translation aims to learn the mapping between an image and a corresponding human-drawn sketch. A machine can be trained to mimic the human drawing process using a training set of aligned image-sketch pairs. However, collecting such paired data is expensive, and in many cases infeasible, since sketches exhibit various levels of abstraction and drawing preferences. Hence we present an approach for learning an image-to-sketch translation network from unpaired examples. A translation network, which translates the representation in the image latent space to the sketch domain, is trained in an unsupervised setting. To prevent representation shift in cross-domain translation, a novel cycle+ consistency loss is explored. Experimental results on sketch recognition and sketch-based image retrieval demonstrate the effectiveness of our approach.
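For orientation, the standard cycle-consistency loss commonly used in unpaired translation can be written as below. The abstract does not define the cycle+ variant, so this shows only the standard form, not the authors' loss. The symbols are notation assumed here: $G$ maps images to sketches, $F$ maps sketches back to images, and $p_X$, $p_Y$ are the image and sketch distributions.

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F)
  = \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big]
```

Each term penalizes the round trip through the other domain for failing to return the starting sample, which is what constrains the mapping when no aligned pairs exist.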
Underwater image enhancement is important because images captured underwater often suffer from color cast, low contrast and degraded visibility due to the absorption and scattering of light in water. In this paper, we propose a novel algorithm for underwater image restoration based on a generalization of the dark channel prior (GDCP). Although there are various types of underwater images, we focus on underwater images with depth because these images are not enhanced well by current algorithms. The proposed algorithm is composed of an iteration of GDCP and image fusion. Additionally, we introduce a new ambient light estimation method to adapt to more types of images. Experimental results show that the proposed algorithm is effective for various types of underwater images, especially for images with depth.
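As background, the classic (non-generalized) dark channel that GDCP builds on can be sketched in a few lines. This is a minimal illustration of the standard dark channel only, not the GDCP variant or the iteration and fusion steps of the proposed algorithm; the function name `dark_channel`, the `patch` parameter and the toy image are assumptions made for this example.

```python
def dark_channel(image, patch=3):
    """Dark channel of an RGB image given as rows of (r, g, b) tuples in [0, 1]."""
    height, width = len(image), len(image[0])
    # Per-pixel minimum over the three color channels.
    min_rgb = [[min(pixel) for pixel in row] for row in image]
    radius = patch // 2
    dark = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Minimum over the square patch centered at (y, x), clipped at the borders.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            dark[y][x] = min(min_rgb[j][i] for j in range(y0, y1) for i in range(x0, x1))
    return dark


# Toy 3x3 image (hypothetical values): bright, washed-out pixels with one darker pixel.
hazy = [[(0.8, 0.7, 0.9)] * 3 for _ in range(3)]
hazy[1][1] = (0.2, 0.5, 0.6)
result = dark_channel(hazy, patch=3)
```

With a 3x3 patch every neighborhood in this toy image contains the darker center pixel, so the whole dark channel takes that pixel's smallest channel value. A dark channel that stays high across a region is the usual cue for haze or scattering there.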
This paper puts forth our observations from the experiments conducted on interactive segmentation techniques - Statistical Region Merging and Seeded Region Growing, both based on Region Growing methods, using Matlab s...
This paper proposes an unsupervised learning framework for monocular depth estimation and visual odometry (VO), referred to as DVONet. The framework is trained using stereo image sequences and is able to estimate absolute-scale scene depth and camera poses from monocular images. To mitigate the effect of stereo occlusions in training and improve the depth estimation, a left-right occlusion mask is introduced. In addition, a novel VO network is proposed in which the feature extraction network is shared between pose estimation and optical flow estimation. The proposed DVONet achieves state-of-the-art results for both depth estimation and VO tasks on the KITTI driving dataset, outperforming existing unsupervised methods and performing comparably to traditional ones.
ISBN: 9781538615010 (print)
Inpainting applications include object removal in images and videos, crack filling, error concealment and texture synthesis. This paper analyses the use of inpainting for image coherence and perspective emphasis on video frames in a 2D image-to-video conversion system. In addition, the performance of different techniques in object removal and image reconstruction is compared using visual experiments and quality metrics.
Depth estimation plays an important role in light field data processing. However, conventional approaches based on focus measurement fail at angular patches containing occlusion boundaries. In this paper, a novel depth estimation algorithm is proposed based on frequency descriptors. On the basis of an analysis of the imaging process, we first perform occlusion discrimination and edge orientation extraction in the frequency domain for the spatial patch from the central sub-aperture image. Then, according to the occlusion orientation, a variable-block-size angular patch is selected in the normal direction to construct the frequency descriptors for focus measurement in the focal stack. Experimental results demonstrate the superior robustness and depth accuracy of the proposed method.
We propose a MultiScale AutoEncoder (MSAE) based extreme image coding/compression framework that offers visually pleasing reconstruction at a very low bitrate. Our method leverages the "priors" at different resolution scales to improve compression efficiency, and also employs a generative adversarial network (GAN) with multiscale discriminators to perform end-to-end trainable rate-distortion optimization. We compare the perceptual quality of our reconstructions with traditional compression algorithms, namely the High-Efficiency Video Coding (HEVC) based Intra Profile and JPEG2000, on the public Cityscapes, ADE20K and Kodak datasets, demonstrating a significant subjective quality improvement. However, objective measurements such as PSNR and SSIM are often degraded by the generative adversarial optimization.
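To make the objective measurement concrete, PSNR can be computed from the mean squared error between a reference image and its reconstruction. The sketch below uses the standard definition for grayscale images stored as nested lists; the function name `psnr`, the 8-bit peak value of 255 and the sample pixel values are assumptions for illustration and are not taken from the paper's evaluation.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    ref = [value for row in reference for value in row]
    dis = [value for row in distorted for value in row]
    if len(ref) != len(dis):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, dis)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images: two pixels are off by 10 gray levels, so MSE = 50.
original = [[100, 100], [100, 100]]
decoded = [[110, 90], [100, 100]]
score = psnr(original, decoded)
```

Because PSNR depends only on pixel-wise error, a GAN-based decoder that synthesizes plausible but not pixel-exact texture can look better to viewers while scoring lower, which is consistent with the trade-off the abstract reports.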
Over the years, with the popularization of 3D technology, the demand for accurate and efficient 3D image quality evaluation (SIQA) methods has been increasing constantly. Owing to the wide application of CNNs, CNN-based SIQA methods have emerged one after another. However, current methods consider only a single scale or resolution, and some CNN-based methods directly take the left and right views as the network input, ignoring the visual fusion mechanism. In this work, a multi-scale no-reference SIQA method is proposed based on a dilation convolution neural network (DCNN). Unlike other CNN-based SIQA methods, the proposed one uses dilated convolution to imitate the different scales of information-processing fields in the human brain. Instead of the left or right image, a cyclopean image generated by a new method is used as the input of the network. Moreover, the proposed multi-scale unit can significantly reduce the number of parameters and the computational complexity. Experimental results on two public databases show that the proposed model is superior to state-of-the-art no-reference SIQA methods.
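The way dilation enlarges the field a filter covers without adding weights can be shown with a one-dimensional sketch. This illustrates generic dilated convolution (computed as cross-correlation, as is usual in CNNs) and is not the paper's DCNN; the function name `dilated_conv1d` and the sample signal are assumptions for this example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (cross-correlation) on a list of numbers."""
    # Number of input samples spanned by one filter application.
    span = dilation * (len(kernel) - 1) + 1
    output = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        output.append(sum(w * signal[start + k * dilation] for k, w in enumerate(kernel)))
    return output


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]  # three weights in both cases
dense = dilated_conv1d(signal, kernel, dilation=1)  # each output sees 3 adjacent samples
wide = dilated_conv1d(signal, kernel, dilation=2)   # each output spans 5 samples
```

Here `dense` is `[6, 9, 12, 15, 18]` and `wide` is `[9, 12, 15]`: the same three weights cover a span of five samples when the dilation is 2, which is how several dilation rates can provide multiple scales at a fixed parameter count.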
We present an FPGA-based system supporting video stream transcoding with super-resolution (SR) conversion from 2k full high-definition (FHD) video to 4k ultra high-definition (UHD) video. Our system focuses on building a functional pipeline with a convolutional neural network (CNN) accelerator and a real-time video codec unit for converting an H.264 video stream to an H.265/HEVC video stream. The overall video processing system can be used as a plug-in module in a video streaming network to improve the quality of the video stream service.
In this paper, we propose a new two-column dense Convolutional Neural Network (CNN) for stereoscopic image quality assessment. The input of one column is the cyclopean image, which conforms to the binocular combination and rivalry mechanism in our brain. The input of the other column is the disparity map, which provides compensation information for the cyclopean image. More importantly, we employ the features of the disparity map to guide and weight the feature maps obtained from the cyclopean image, which is implemented by modifying the structure of the Squeeze-and-Excitation block. This weighting strategy recalibrates the importance of the feature maps extracted from the cyclopean image. At the end of the CNN, we combine the outputs of the two columns through 'Concat', and then process them to obtain the final quality score of the stereoscopic image. Experimental results demonstrate that the proposed method achieves high consistency with subjective assessment.