ISBN (Print): 9781665448994
Nonhomogeneous haze removal is a challenging problem because such haze does not follow the physical scattering model. Numerous existing methods focus on homogeneous haze removal by generating the transmission map of the image, which is not suitable for nonhomogeneous dehazing tasks. Some methods use end-to-end models but are also designed for homogeneous haze. Inspired by Knowledge Transfer Dehazing Network and Trident Dehazing Network, we propose a model that combines a super-resolution method with a knowledge transfer method. Our model consists of a teacher network, a dehaze network, and a super-resolution network. The teacher network provides the dehaze network with a reliable prior, the dehaze network focuses primarily on haze removal, and the super-resolution network captures details in the hazy image. An ablation study shows that the super-resolution network brings a significant benefit to image quality, and comparisons show that our model outperforms previous state-of-the-art methods in terms of perceptual quality on the NTIRE2021 NonHomogeneous Dehazing Challenge dataset, and also performs well on other datasets.
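Below is a minimal PyTorch sketch of the knowledge-transfer idea described in this abstract: a frozen teacher encoder, assumed to be pretrained on clean images, supplies intermediate features that the dehaze network imitates alongside a reconstruction loss. The module names, channel widths, and loss weight are illustrative assumptions, not the authors' architecture, and the super-resolution branch is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch: a frozen teacher (assumed pretrained on clean images) provides
# intermediate features that the dehazing student imitates, in addition to
# the usual reconstruction loss. Widths and depths are illustrative.

class SmallEncoder(nn.Module):
    def __init__(self, in_ch=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class DehazeNet(nn.Module):
    def __init__(self, width=32):
        super().__init__()
        self.encoder = SmallEncoder(3, width)
        self.decoder = nn.Conv2d(width, 3, 3, padding=1)

    def forward(self, hazy):
        feat = self.encoder(hazy)
        return self.decoder(feat), feat

teacher = SmallEncoder(3, 32).eval()      # assumed pretrained on clean images
for p in teacher.parameters():
    p.requires_grad_(False)
student = DehazeNet(32)

hazy, clean = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
pred, student_feat = student(hazy)
with torch.no_grad():
    teacher_feat = teacher(clean)         # "reliable prior" from the clean image

# Reconstruction loss plus a feature-matching (knowledge transfer) term.
loss = F.l1_loss(pred, clean) + 0.1 * F.l1_loss(student_feat, teacher_feat)
loss.backward()
```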
In deep learning-based object detection in the remote sensing domain, nuisance factors, which affect observed variables while not affecting predictor variables, often matter because they cause domain changes. Previously, nuisance disentangled feature transformation (NDFT) was proposed to build a domain-invariant feature extractor using knowledge of nuisance factors. However, NDFT requires an enormous amount of training time, which has made it impractical. In this paper, we introduce our proposed method, A-NDFT, which is an improvement over NDFT. A-NDFT utilizes two acceleration techniques, feature replay and a slow learner. Consequently, on the large-scale UAVDT benchmark, we show that our framework can reduce the training time of NDFT from 31 hours to 3 hours while still maintaining its performance. The code will be made publicly available online.
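The sketch below illustrates one plausible reading of the two acceleration techniques named above: a feature replay buffer that caches detached backbone features so the nuisance branch can train without re-running the backbone, and a slow learner that updates the nuisance head with a smaller learning rate. All module sizes, loss weights, and the specific nuisance labels are assumptions for illustration only.

```python
import collections
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy loop illustrating "feature replay" (cache detached backbone features
# for the nuisance branch) and a "slow learner" (smaller learning rate for
# the nuisance head), under my own reading of those terms.

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
det_head = nn.Linear(16, 5)        # stands in for the detection head
nui_head = nn.Linear(16, 3)        # predicts a nuisance factor (e.g. altitude)

opt_main = torch.optim.SGD(list(backbone.parameters()) +
                           list(det_head.parameters()), lr=1e-2)
opt_slow = torch.optim.SGD(nui_head.parameters(), lr=1e-3)   # slow learner

replay = collections.deque(maxlen=256)    # feature replay buffer

for step in range(4):
    images = torch.rand(8, 3, 32, 32)
    det_labels = torch.randint(0, 5, (8,))
    nui_labels = torch.randint(0, 3, (8,))

    feats = backbone(images)
    det_loss = F.cross_entropy(det_head(feats), det_labels)
    # Adversarial term: encourage features that do NOT reveal the nuisance.
    adv_loss = -F.cross_entropy(nui_head(feats), nui_labels)
    opt_main.zero_grad()
    (det_loss + 0.1 * adv_loss).backward()
    opt_main.step()

    # Cache detached features instead of recomputing the backbone later.
    replay.extend(zip(feats.detach(), nui_labels))

    if len(replay) >= 8:                  # train the nuisance head from replay
        f, y = zip(*list(replay)[-8:])
        nui_loss = F.cross_entropy(nui_head(torch.stack(f)), torch.stack(y))
        opt_slow.zero_grad()
        nui_loss.backward()
        opt_slow.step()
```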
Super-resolution (SR) is an ill-posed problem, which means that infinitely many high-resolution (HR) images can be degraded to the same low-resolution (LR) image. To study the one-to-many stochastic SR mapping, we implicitly represent the non-local self-similarity of natural images and develop a Variational Sparse framework for Super-Resolution (VSpSR) via neural networks. Since every small patch of an HR image can be well approximated by a sparse representation of atoms in an over-complete dictionary, we design a two-branch module, i.e., VSpM, to explore the SR space. Concretely, one branch of VSpM extracts patch-level bases from the LR input, and the other branch infers pixel-wise variational distributions with respect to the sparse coefficients. By repeatedly sampling coefficients, we can obtain infinitely many sparse representations and thus generate diverse HR images. According to the preliminary results of the NTIRE 2021 challenge on learning the SR space, our team ranks 7th in terms of the released scores.
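A rough sketch of the two-branch idea, not the authors' VSpM: one branch pools a small set of basis atoms from the LR input, the other predicts a per-pixel Gaussian over sparse coefficients, and resampling the coefficients yields different HR outputs. The atom count, upscaling factor, and layer widths are assumptions.

```python
import torch
import torch.nn as nn

# Two-branch sketch: branch 1 produces basis atoms from the LR image,
# branch 2 produces a per-pixel mean/log-variance over the coefficients.
# Sampling coefficients and mixing the atoms yields an HR patch per pixel.

class TwoBranchSR(nn.Module):
    def __init__(self, scale=4, num_atoms=16, width=32):
        super().__init__()
        self.scale, self.num_atoms = scale, num_atoms
        patch_dim = 3 * scale * scale
        self.feat = nn.Sequential(nn.Conv2d(3, width, 3, padding=1), nn.ReLU())
        # Branch 1: global basis (atoms) pooled from the LR features.
        self.basis = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(width, num_atoms * patch_dim))
        # Branch 2: per-pixel mean and log-variance of the coefficients.
        self.coef = nn.Conv2d(width, 2 * num_atoms, 1)

    def forward(self, lr):
        b, _, h, w = lr.shape
        f = self.feat(lr)
        atoms = self.basis(f).view(b, self.num_atoms, -1)       # (B, K, 3*s*s)
        mu, logvar = self.coef(f).chunk(2, dim=1)               # (B, K, H, W)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterize
        # Each pixel's sampled coefficients mix the atoms into an HR patch.
        z = z.permute(0, 2, 3, 1).reshape(b, h * w, self.num_atoms)
        patches = torch.bmm(z, atoms)                           # (B, H*W, 3*s*s)
        patches = patches.view(b, h, w, 3 * self.scale ** 2).permute(0, 3, 1, 2)
        return nn.functional.pixel_shuffle(patches, self.scale)

model = TwoBranchSR()
lr = torch.rand(1, 3, 24, 24)
samples = [model(lr) for _ in range(3)]   # three diverse HR candidates
print(samples[0].shape)                   # torch.Size([1, 3, 96, 96])
```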
Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed student model suffers from an accuracy gap with its teacher. We propose extracurricular learning, a novel knowledge distillation method that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set, including uncertain samples. We conduct rigorous evaluations on regression and classification tasks and show that, compared to standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%. This leads to major accuracy improvements compared to empirical risk minimization-based training for various recent neural network architectures: a 16% regression error reduction on the MPIIGaze dataset, +3.4% to +9.1% improvement in top-1 classification accuracy on the CIFAR100 dataset, and a +2.9% top-1 improvement on the ImageNet dataset.
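The toy example below walks through the three ingredients listed in the abstract. Because the paper's actual sampler for the data distribution is not described here, mixup between training examples is used as a stand-in; the networks, temperature, and loss weights are likewise illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the three steps named in the abstract, with mixup as a stand-in
# for "sampling from an approximation to the data distribution".

teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 4.0                                        # distillation temperature

def kd_loss(s_logits, t_logits):
    # Match softened output distributions (KL divergence), scaled by T^2.
    return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T

x, y = torch.rand(16, 32), torch.randint(0, 10, (16,))

# (2) Extend the batch with synthetic samples near the data manifold.
lam = torch.rand(16, 1)
x_extra = lam * x + (1 - lam) * x[torch.randperm(16)]
x_all = torch.cat([x, x_extra], dim=0)

# (1) + (3) Match student and teacher output distributions over the extended
# set, plus the usual supervised loss on the labeled part of the batch.
with torch.no_grad():
    t_logits = teacher(x_all)
s_logits = student(x_all)
loss = F.cross_entropy(s_logits[:16], y) + kd_loss(s_logits, t_logits)

opt.zero_grad()
loss.backward()
opt.step()
```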
As the particles in a hazy medium cause the absorption and scattering of light, images captured in such environments suffer from quality degradation such as low contrast and color distortion. While numerous single image dehazing methods have been proposed to reconstruct clean images from hazy images, non-homogeneous dehazing has rarely been studied. In this paper, we design an end-to-end network to remove non-homogeneous dense haze. We employ selective residual blocks to adaptively improve the visibility of the resulting images, where the input feature and the residual feature are combined with fully trainable weights. Experimental results, including an ablation study, demonstrate that the proposed method is a promising tool for non-homogeneous dehazing that enhances the contrast of hazy images effectively while restoring their colorful appearance faithfully.
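A minimal sketch of a selective residual block as described above: the input feature and the residual feature are blended with fully trainable weights rather than a fixed identity shortcut. Per-channel weights are an assumption; a single scalar per branch would also fit the description.

```python
import torch
import torch.nn as nn

# Selective residual block sketch: the skip path and the residual path are
# combined with learnable weights instead of a fixed identity shortcut.

class SelectiveResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Trainable blending weights, one per channel and per branch (assumed).
        self.w_in = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.w_res = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        return self.w_in * x + self.w_res * self.body(x)

block = SelectiveResidualBlock(64)
print(block(torch.rand(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```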
Semantic segmentation approaches are typically trained on large-scale data with a closed, finite set of known classes, without considering unknown objects. In certain safety-critical robotics applications, especially autonomous driving, it is important to segment all objects, including those unknown at training time. We formalize the task of video class agnostic segmentation from monocular video sequences in autonomous driving to account for unknown objects. Video class agnostic segmentation can be formulated as an open-set or a motion segmentation problem. We discuss both formulations, provide datasets, and benchmark different baseline approaches for both tracks. In the motion-segmentation track we benchmark real-time joint panoptic and motion instance segmentation and evaluate the effect of ego-flow suppression. In the open-set segmentation track we evaluate baseline methods that combine appearance and geometry to learn prototypes per semantic class, and we compare them to a model that uses an auxiliary contrastive loss to improve the discrimination between known and unknown objects. Datasets and models are publicly released at https://***/vca/.
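As a rough illustration of the prototype-based open-set baseline mentioned above, the snippet below keeps one prototype embedding per known class and flags pixels whose features are far from every prototype as unknown. The embedding size, similarity measure, and threshold are assumptions, and the geometry and contrastive components are omitted.

```python
import torch
import torch.nn.functional as F

# Prototype-based open-set sketch: pixels whose embeddings are not close to
# any known-class prototype are treated as belonging to an unknown object.

num_known, dim = 19, 64
prototypes = F.normalize(torch.randn(num_known, dim), dim=1)  # one per class
features = F.normalize(torch.randn(2, dim, 32, 32), dim=1)    # per-pixel embeddings

# Cosine similarity of every pixel to every prototype.
sims = torch.einsum("bdhw,kd->bkhw", features, prototypes)
best_sim, best_class = sims.max(dim=1)

threshold = 0.5                            # assumed decision threshold
unknown_mask = best_sim < threshold        # pixels far from all known classes
print(unknown_mask.float().mean())         # fraction of pixels flagged unknown
```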
In this paper, we propose a novel reference-based image super-resolution approach via a Variational AutoEncoder (RefVAE). Existing state-of-the-art methods mainly focus on single image super-resolution, which cannot perform well at large upsampling factors, e.g., 8x. We propose a reference-based image super-resolution method for which any arbitrary image can act as a reference. Even when using a random map or the low-resolution image itself as the reference, the proposed RefVAE can transfer knowledge from the reference to the super-resolved images. Depending on the reference, the proposed method can generate different versions of super-resolved images from a hidden super-resolution space. Besides using different datasets for standard evaluations with PSNR and SSIM, we also took part in the NTIRE2021 SR Space challenge [21] and provide results of the randomness evaluation of our approach. Compared to other state-of-the-art methods, our approach achieves higher diversity scores.
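The sketch below shows one way a reference-conditioned VAE of this kind could be wired: the reference is encoded into a latent distribution, a latent is sampled, and a decoder refines the bicubically upsampled LR image conditioned on that latent. The layer sizes, conditioning scheme, and 8x factor are illustrative assumptions rather than the RefVAE architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Reference-conditioned VAE sketch: encode the reference to a latent
# distribution, sample, and refine the upsampled LR image with that latent.

class RefSRVAE(nn.Module):
    def __init__(self, z_dim=64, scale=8):
        super().__init__()
        self.scale = scale
        self.ref_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2 * z_dim))
        self.decode = nn.Sequential(
            nn.Conv2d(3 + z_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, lr, ref):
        mu, logvar = self.ref_enc(ref).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # sample latent
        up = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                           align_corners=False)
        z_map = z[:, :, None, None].expand(-1, -1, up.shape[2], up.shape[3])
        return up + self.decode(torch.cat([up, z_map], dim=1)), mu, logvar

model = RefSRVAE()
lr, ref = torch.rand(1, 3, 16, 16), torch.rand(1, 3, 128, 128)
sr, mu, logvar = model(lr, ref)      # different references give different SR images
print(sr.shape)                      # torch.Size([1, 3, 128, 128])
```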
Given a single RGB panorama, the goal of 3D layout reconstruction is to estimate the room layout by predicting the corners, floor boundary, and ceiling boundary. A common approach has been to use standard convolutional networks to predict the corners and boundaries, followed by post-processing to generate the 3D layout. However, the space-varying distortions in panoramic images are not compatible with the translational equivariance property of standard convolutions, which degrades performance. Instead, we propose to use spherical convolutions. The resulting network, which we call OmniLayout, performs convolutions directly on the sphere surface, sampling according to the inverse equirectangular projection, and is hence invariant to equirectangular distortions. Using a new evaluation metric, we show that our network reduces the error in the heavily distorted regions (near the poles) by approximately 25% compared to standard convolutional networks. Experimental results show that OmniLayout outperforms the state of the art by approximately 4% on two different benchmark datasets (PanoContext and Stanford 2D-3D). Code is available at https://***/rshivansh/OmniLayout.
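To illustrate why equirectangular distortion calls for latitude-aware sampling, the simplified layer below widens the horizontal sampling offsets of a 3x3 kernel by 1/cos(latitude) and gathers the taps with grid_sample. This is a toy approximation of sampling by inverse equirectangular projection, not the paper's spherical convolution; horizontal wrap-around, for instance, is ignored.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Latitude-aware sampling sketch: a fixed angular footprint covers more
# pixels horizontally near the poles of an equirectangular image, so the
# horizontal offsets are scaled by 1 / cos(latitude).

class LatitudeAwareConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.mix = nn.Conv2d(in_ch * 9, out_ch, kernel_size=1)  # mix 3x3 taps

    def forward(self, x):
        b, c, h, w = x.shape
        ys = torch.linspace(-1 + 1 / h, 1 - 1 / h, h)
        xs = torch.linspace(-1 + 1 / w, 1 - 1 / w, w)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        lat = gy * math.pi / 2                       # latitude of each row
        stretch = 1.0 / torch.cos(lat).clamp(min=1e-3)
        taps = []
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                sample_x = gx + dx * (2 / w) * stretch   # wider step near poles
                sample_y = gy + dy * (2 / h)
                grid = torch.stack([sample_x, sample_y], dim=-1)
                grid = grid.unsqueeze(0).expand(b, -1, -1, -1)
                taps.append(F.grid_sample(x, grid, mode="bilinear",
                                          padding_mode="border",
                                          align_corners=False))
        return self.mix(torch.cat(taps, dim=1))

layer = LatitudeAwareConv(3, 16)
print(layer(torch.rand(1, 3, 64, 128)).shape)   # torch.Size([1, 16, 64, 128])
```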
Visual representations using high dynamic range (HDR) images have become increasingly popular because of their high quality and expressive ability. HDR images are expected to be used in a broad range of applications, including digital cinema, photography, and broadcast. The generation of an HDR image from a single-exposure low dynamic range (LDR) image is a challenging task in which one must make up for the data missing due to underexposure or overexposure and color quantization. In this paper, we propose a deep convolutional neural network (CNN) model with a stack of dilated convolutional blocks for reconstructing an HDR image from a single LDR image. Within each dilated block, the dilation rate of the convolution layers starts at three and progressively decreases to one. Multiple dilated convolution blocks are further connected densely to improve the representation capacity of the network. As the network is trained in a supervised manner, the missing information is reconstructed from learned features. Our experimental results show that the model effectively recovers information that was lost from the original image.
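A short sketch of the block structure described above: each dilated block stacks convolutions whose dilation rate starts at three and decreases to one, and blocks are densely connected so that each sees the concatenation of all previous outputs. Channel widths and the number of blocks are assumptions.

```python
import torch
import torch.nn as nn

# Dilated block sketch: dilation rates 3 -> 2 -> 1 within a block, and
# dense connections across blocks (each block consumes the concatenation
# of the input and every previous block's output).

class DilatedBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        layers, ch = [], in_ch
        for d in (3, 2, 1):                       # dilation decreases to one
            layers += [nn.Conv2d(ch, out_ch, 3, padding=d, dilation=d),
                       nn.ReLU(inplace=True)]
            ch = out_ch
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

class DenselyConnectedDilated(nn.Module):
    def __init__(self, channels=32, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            DilatedBlock(channels * (i + 1), channels) for i in range(num_blocks))

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))   # dense connections
        return feats[-1]

net = DenselyConnectedDilated()
print(net(torch.rand(1, 32, 64, 64)).shape)   # torch.Size([1, 32, 64, 64])
```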
Although deep learning-based models have achieved tremendous success in image-related tasks, they are known to be vulnerable to adversarial examples: inputs with imperceptible but subtly crafted perturbations that fool the models into producing incorrect outputs. To distinguish adversarial examples from benign images, in this paper we propose a novel watermarking-based framework for protecting deep image classifiers against adversarial attacks. The proposed framework consists of a watermark encoder, a possible adversary, and a detector, followed by the deep image classifier to be protected. Specific methods of watermarking and detection are also presented. Experiments on a subset of the ImageNet validation dataset show that the proposed framework, along with the presented methods of watermarking and detection, is effective against a wide range of advanced attacks (static and adaptive), achieving a near-zero (effective) false negative rate for FGSM and PGD attacks with a guaranteed zero false positive rate. In addition, for all tested deep image classifiers (ResNet50V2, MobileNetV2, InceptionV3), the impact of watermarking on classification accuracy is insignificant, with, on average, 0.63% and 0.49% degradation in top-1 and top-5 accuracy, respectively.
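Purely as an illustration of the detect-then-classify pipeline, the toy example below embeds a fragile least-significant-bit watermark and flags an input as adversarial when the watermark no longer verifies. This LSB scheme and its threshold are my own simplification and are not the watermarking or detection methods presented in the paper.

```python
import torch

# Toy fragile-watermark pipeline: embed a secret bit pattern before
# deployment, then flag inputs whose watermark no longer verifies.
# The LSB scheme and threshold below are simplifications for illustration.

torch.manual_seed(0)
key = torch.randint(0, 2, (3, 224, 224))             # secret watermark bits

def embed(image_uint8):
    # Overwrite the least-significant bit of each pixel with the key bit.
    return (image_uint8 & 254) | key.to(torch.uint8)

def verify(image_uint8, threshold=0.99):
    agreement = ((image_uint8 & 1) == key).float().mean()
    return agreement.item() >= threshold              # True -> looks benign

image = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
marked = embed(image)
print(verify(marked))                                 # True: watermark intact

# A (simulated) perturbation disturbs pixel values and hence the LSBs.
perturbed = (marked.int() + torch.randint(-4, 5, marked.shape)).clamp(0, 255)
print(verify(perturbed.to(torch.uint8)))              # False: flagged as adversarial
```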