This paper proposes an unsupervised learning framework for monocular depth estimation and visual odometry (VO), referred to as DVONet. The framework is trained on stereo image sequences and can estimate absolute-scale scene depth and camera poses from monocular images. To mitigate the effect of stereo occlusions during training and improve depth estimation, a left-right occlusion mask is introduced. In addition, a novel VO network is proposed in which the feature extraction network is shared between pose estimation and optical flow estimation. The proposed DVONet achieves state-of-the-art results for both depth estimation and VO on the KITTI driving dataset, outperforming existing unsupervised methods and being comparable to the traditional ones.
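The abstract does not say how the left-right occlusion mask is computed. One common way to obtain such a mask is a left-right disparity consistency check, and the sketch below illustrates that idea on a single scanline. The function name, the tolerance `tol`, and the disparity convention are assumptions made for illustration, not details taken from the paper.

```python
def left_right_occlusion_mask(disp_left, disp_right, tol=1.0):
    """Return a 0/1 mask for one scanline of the left image.

    Assumed convention (hypothetical, not from the paper):
    left pixel x with disparity d matches right pixel x - d.
    A pixel is kept (1) when the two disparity maps agree within `tol`,
    and masked (0) when they disagree or the match falls outside the image.
    """
    width = len(disp_left)
    mask = []
    for x, d in enumerate(disp_left):
        xr = int(round(x - d))  # matching column in the right image
        if xr < 0 or xr >= width:
            mask.append(0)  # projects outside the right view
            continue
        mask.append(1 if abs(d - disp_right[xr]) <= tol else 0)
    return mask
```

In a training loop, a mask of this kind would typically multiply the per-pixel photometric loss so that occluded pixels do not contribute to it.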
ISBN: (print) 9781538615010
Inpainting applications include object removal in images and videos, crack filling, error concealment, and texture synthesis. This paper analyses the use of inpainting for image coherence and perspective emphasis on video frames in a 2D image-to-video conversion system. In addition, the performance of different techniques in object removal and image reconstruction is compared using visual experiments and quality metrics.
Depth estimation plays an important role in light field data processing. However, conventional focus-measurement-based approaches fail at angular patches that contain occlusion boundaries. In this paper, a novel depth estimation algorithm based on frequency descriptors is proposed. Based on an analysis of the imaging process, we first perform occlusion discrimination and edge orientation extraction in the frequency domain for the spatial patch from the central sub-aperture image. Then, according to the occlusion orientation, a variable-block-size angular patch is selected in the normal direction to construct the frequency descriptors for focus measurement in the focal stack. Experimental results demonstrate the superior robustness and depth accuracy of the proposed method.
We propose a MultiScale AutoEncoder (MSAE) based extreme image coding/compression framework that offers visually pleasing reconstruction at a very low bitrate. Our method leverages the "priors" at different resolution scales to improve compression efficiency, and also employs a generative adversarial network (GAN) with multiscale discriminators to perform end-to-end trainable rate-distortion optimization. We compare the perceptual quality of our reconstructions with that of traditional compression algorithms, namely the High-Efficiency Video Coding (HEVC) Intra Profile and JPEG2000, on the public Cityscapes, ADE20K and Kodak datasets, demonstrating a significant subjective quality improvement. However, objective measurements such as PSNR and SSIM often deteriorate when the generative adversarial optimization is applied.
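For readers unfamiliar with the objective measurements mentioned above, PSNR is defined as 10·log10(peak² / MSE). A minimal sketch of that standard definition follows; the function name and the flat-list image representation are assumptions for illustration, and the code is not taken from the paper.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    sequences of pixel values (flattened images)."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Because PSNR depends only on pixel-wise error, a reconstruction with plausible but not pixel-exact texture can look good and still score poorly, which is consistent with the trade-off the abstract reports.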
Over the years, with the popularization of 3D technology, the demand for accurate and efficient stereoscopic image quality assessment (SIQA) methods has increased steadily. Owing to the wide application of CNNs, CNN-based SIQA methods have emerged one after another. However, current methods consider only a single scale or resolution, and some CNN-based methods feed the left and right views directly into the network, ignoring the visual fusion mechanism. In this work, a multi-scale no-reference SIQA method is proposed based on a dilation convolution neural network (DCNN). Unlike other CNN-based SIQA methods, the proposed one uses dilation convolution to imitate the different scales of information processing fields in the human brain. Instead of the left or right image, a cyclopean image generated by a new method is used as the input of the network. Moreover, the proposed multi-scale unit can significantly reduce the number of parameters and the computational complexity. Experimental results on two public databases show that the proposed model is superior to state-of-the-art no-reference SIQA methods.
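To illustrate why dilation gives a multi-scale view without extra parameters, here is a minimal one-dimensional sketch of a dilated convolution (computed as cross-correlation, as is usual in CNNs). It shows the general operation only; the function name is hypothetical and the code is not the network described in the abstract.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (no kernel flip).

    The kernel taps are spaced `dilation` samples apart, so the same
    number of weights covers a receptive field of
    (len(kernel) - 1) * dilation + 1 samples.
    """
    span = (len(kernel) - 1) * dilation + 1  # receptive field width
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out
```

With a 3-tap kernel, `dilation=1` sees 3 samples per output while `dilation=2` sees 5, using the same three weights in both cases.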
We present an FPGA-based system supporting video stream transcoding with 2k full high-definition (FHD) video to 4k ultra high-definition (UHD) video super-resolution (SR) conversion. Our system focuses on building a functional pipeline with a convolutional neural network (CNN) accelerator and a real-time video codec unit for converting an H.264 video stream to an H.265/HEVC video stream. The overall video processing system can be used as an important plug-in module in a video streaming network to improve the quality of the video stream service.
In this paper, we propose a new two-column dense Convolutional Neural Network (CNN) for stereoscopic image quality assessment. The input of one column is the cyclopean image, which conforms to the binocular combination and rivalry mechanism in our brain. The input of the other column is the disparity map, which provides compensation information for the cyclopean image. More importantly, we employ the features of the disparity map to guide and weight the feature maps obtained from the cyclopean image, which is implemented by modifying the structure of the Squeeze and Excitation block. This weighting strategy recalibrates the importance of the feature maps extracted from the cyclopean image. At the end of the CNN, we combine the outputs of the two columns through 'Concat', and then process them to obtain the final quality score of the stereoscopic image. Experimental results demonstrate that the proposed method achieves highly consistent alignment with subjective assessment.
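The abstract does not give the exact structure of the modified Squeeze and Excitation block. As a rough sketch of the general idea, assuming the channel gates are computed from the disparity features rather than from the cyclopean features themselves, the re-weighting could look like the following. The function names are hypothetical, and the fully connected layers of a standard Squeeze and Excitation block are deliberately omitted for brevity.

```python
import math


def global_avg_pool(channel):
    """Squeeze step: mean of one 2D feature map (list of rows)."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def disparity_guided_reweight(cyclopean_feats, disparity_feats):
    """Scale each cyclopean channel by a sigmoid gate derived from the
    matching disparity channel (assumed one-to-one channel pairing)."""
    if len(cyclopean_feats) != len(disparity_feats):
        raise ValueError("both inputs need the same number of channels")
    out = []
    for cyc, disp in zip(cyclopean_feats, disparity_feats):
        gate = 1.0 / (1.0 + math.exp(-global_avg_pool(disp)))  # in (0, 1)
        out.append([[gate * v for v in row] for row in cyc])
    return out
```

A learned version would insert trainable layers between the pooling and the sigmoid; the point here is only that the gate comes from one branch and is applied to the other.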
This paper proposes an intelligent control system for the development of substation automatic grounding wire working robot. The system consists of two parts: robot motion control subsystem and robot visual servo subsy...
Color-difference interpolation (CDI) has been a widely used technique in various color demosaicking methods. CDI-based methods perform interpolation in the color-difference domain, assuming that the color-difference signal is a low-pass signal. Recently, a residual interpolation (RI) algorithm, which conducts interpolation in the residual domain, has been developed; it assumes that the residual domain is flatter or smoother than the channel-difference domain. In this paper, we present a comprehensive frequency domain analysis of these assumptions and observe that their validity is image dependent, which creates artifacts in the interpolated image. With this in view, we propose an algorithm that exploits the inter-color correlation as well as the residual smoothness among the different channels more effectively than existing algorithms. Experimental results show that the proposed algorithm outperforms the existing algorithms in terms of both visual and objective quality.
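The basic CDI idea described above can be sketched on a single image row: interpolate the difference R - G, which is assumed to be smooth, and add the green channel back. This is a simplified illustration under the assumption that a full-resolution green row is already available and red is sampled at alternating positions; it is not the algorithm proposed in the paper, and the function name is hypothetical.

```python
def cdi_red_row(green, red_samples):
    """Estimate a full red row from sparse red samples.

    green       : full-resolution green values for the row.
    red_samples : red value where sampled, None elsewhere.
    """
    n = len(green)
    # Color difference at the positions where red is known.
    diff = [r - g if r is not None else None
            for r, g in zip(red_samples, green)]
    red = []
    for x in range(n):
        if diff[x] is not None:
            red.append(red_samples[x])  # keep measured value
            continue
        neighbours = [diff[i] for i in (x - 1, x + 1)
                      if 0 <= i < n and diff[i] is not None]
        if not neighbours:
            raise ValueError("no adjacent red sample at position %d" % x)
        # Average the neighbouring differences, then add green back.
        red.append(green[x] + sum(neighbours) / len(neighbours))
    return red
```

When the difference signal is not actually smooth, for example across a colored edge, the averaged difference is wrong and produces the kind of artifact the abstract refers to.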
Image object co-segmentation aims to segment common objects in a group of images. This paper proposes a novel neural network that extracts multi-scale convolutional features at multiple layers via a modified VGG network and fuses them both within and across images as the intra-image and inter-image features. These two kinds of features are then further fused at each scale as the multi-scale co-features of common objects, and finally the multi-scale co-features are summed up and upsampled to obtain the co-segmentation results. To simplify the network and curb the resource cost, which rises rapidly with the number of inputs, a reduced input size, less downsampling, and dilation convolution are adopted in the proposed model. Experimental results on the public dataset demonstrate that the proposed model achieves performance comparable to state-of-the-art co-segmentation methods while effectively reducing the computation cost.