The Bokeh Effect is one of the most desirable effects in photography for rendering artistic and aesthetic photos. Usually, it requires a DSLR camera with different aperture and shutter settings and certain photography...
详细信息
ISBN:
(纸本)9781665448994
The Bokeh Effect is one of the most desirable effects in photography for rendering artistic and aesthetic photos. Usually, it requires a DSLR camera with different aperture and shutter settings and certain photography skills to generate this effect. In smartphones, computational methods and additional sensors are used to overcome the physical lens and sensor limitations to achieve such effect. Most of the existing methods utilized additional sensor's data or pre-trained network for fine depth estimation of the scene and sometimes use portrait segmentation pretrained network module to segment salient objects in the image. Because of these reasons, networks have many parameters, become runtime intensive and unable to run in mid-range devices. In this paper, we used an end-to-end Deep Multi-Scale Hierarchical Network (DMSHN) model for direct Bokeh effect rendering of images captured from the monocular camera. To further improve the perceptual quality of such effect, a stacked model consisting of two DMSHN modules is also proposed. Our model does not rely on any pretrained network module for Monocular Depth Estimation or Saliency Detection, thus significantly reducing the size of model and run time. Stacked DMSHN achieves state-of-the-art results on a large scale EBB! dataset with around 6x less runtime compared to the current state-of-the-art model in processing HD quality images.
In this paper, we propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention. Current anchor-based monocular 3D object detection methods suffer from feature ...
详细信息
ISBN:
(纸本)9781665445092
In this paper, we propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention. Current anchor-based monocular 3D object detection methods suffer from feature mismatching. To overcome this, we propose a two-step feature alignment approach. In the first step, the shape alignment is performed to enable the receptive field of the feature map to focus on the pre-defined anchors with high confidence scores. In the second step, the center alignment is used to align the features at 2D/3D centers. Further, it is often difficult to learn global information and capture long-range relationships, which are important for the depth prediction of objects. Therefore, we propose a novel asymmetric non-local attention block with multi-scale sampling to extract depth-wise features. The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITH dataset, in both 3D object detection and bird's eye view tasks.
Recurrent Neural Networks (RNNs) have been widely used in sequence analysis and modeling. However, when processing high-dimensional data, RNNs typically require very large model sizes, thereby bringing a series of dep...
详细信息
ISBN:
(纸本)9781665445092
Recurrent Neural Networks (RNNs) have been widely used in sequence analysis and modeling. However, when processing high-dimensional data, RNNs typically require very large model sizes, thereby bringing a series of deployment challenges. Although various prior works have been proposed to reduce the RNN model sizes, executing RNN models in the resource-restricted environments is still a very challenging problem. In this paper, we propose to develop extremely compact RNN models with fully decomposed hierarchical Tucker (FDHT) structure. The HT decomposition does not only provide much higher storage cost reduction than the other tensor decomposition approaches, but also brings better accuracy performance improvement for the compact RNN models. Meanwhile, unlike the existing tensor decomposition-based methods that can only decompose the input-to-hidden layer of RNNs, our proposed fully decomposition approach enables the comprehensive compression for the entire RNN models with maintaining very high accuracy. Our experimental results on several popular video recognition datasets show that, our proposed fully decomposed hierarchical tucker-based LSTM (FDHT-LSTM) is extremely compact and highly efficient. To the best of our knowledge, FDHT-LSTM, for the first time, consistently achieves very high accuracy with only few thousand parameters (3,132 to 8,808) on different datasets. Compared with the state-of-the-art compressed RNN models, such as TT-LSTM, TR-LSTM and BT-LSTM, our FDHT-LSTM simultaneously enjoys both order-of-magnitude (3,985x to 10,711x) fewer parameters and significant accuracy improvement (0.6% to 12.7%).
Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation. P...
详细信息
ISBN:
(纸本)9781665448994
Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation. Prior zero-label semantic segmentation works approach this task by learning visual-semantic embeddings or generative models. However, they are prone to overfitting on the seen classes because there is no training signal for them. In this paper, we study the challenging generalized zero-label semantic segmentation task where the model has to segment both seen and unseen classes at test time. We assume that pixels of unseen classes could be present in the training images but without being annotated. Our idea is to capture the latent information on unseen classes by supervising the model with self-produced pseudo-labels for unlabeled pixels. We propose a consistency regularizer to filter out noisy pseudolabels by taking the intersections of the pseudo-labels generated from different augmentations of the same image. Our framework generates pseudo-labels and then retrain the model with human-annotated and pseudo-labelled data. This procedure is repeated for several iterations. As a result, our approach achieves the new state-of-the-art on PascalVOC12 and COCO-stuff datasets in the challenging generalized zero-label semantic segmentation setting, surpassing other existing methods addressing this task with more complex strategies. Code can be found at https: //***/giuseppepastore10/STRICT.
This paper reviews the 1st LFNAT challenge on light field depth estimation, which aims at predicting disparity information of central view image in a light field (i.e., pixel offset between central view image and adja...
详细信息
Deep convolutional neural networks are susceptible to adversarial attacks. They can be easily deceived to give an incorrect output by adding a tiny perturbation to the input. This presents a great challenge in making ...
详细信息
ISBN:
(纸本)9781665445092
Deep convolutional neural networks are susceptible to adversarial attacks. They can be easily deceived to give an incorrect output by adding a tiny perturbation to the input. This presents a great challenge in making CNNs robust against such attacks. An influx of new defense techniques have been proposed to this end. In this paper, we show that latent features in certain "robust" models are surprisingly susceptible to adversarial attacks. On top of this, we introduce a unified l(infinity)-norm white-box attack algorithm which harnesses latent features in its gradient descent steps, namely LAFEAT. We show that not only is it computationally much more efficient for successful attacks, but it is also a stronger adversary than the current state-of-the-art across a wide range of defense mechanisms. This suggests that model robustness could be contingent on the effective use of the defender's hidden components, and it should no longer be viewed from a holistic perspective.
As more machine learning models are now being applied in real world scenarios it has become crucial to evaluate their difficulties and biases. In this paper we present an unsupervised method for calculating a difficul...
As more machine learning models are now being applied in real world scenarios it has become crucial to evaluate their difficulties and biases. In this paper we present an unsupervised method for calculating a difficulty score based on the accumulated loss per epoch. Our proposed method does not require any modification to the model, neither any external supervision, and it can be easily applied to a wide range of machine learning tasks. We provide results for the tasks of image classification, image segmentation, and object detection. We compare our score against similar metrics and provide theoretical and empirical evidence of their difference. Furthermore, we show applications of our proposed score for detecting incorrect labels, and test for possible biases.
Recently, the transductive graph-based methods have achieved great success in the few-shot classification task. However, most existing methods ignore exploring the class-level knowledge that can be easily learned by h...
详细信息
ISBN:
(数字)9781665445092
ISBN:
(纸本)9781665445092
Recently, the transductive graph-based methods have achieved great success in the few-shot classification task. However, most existing methods ignore exploring the class-level knowledge that can be easily learned by humans from just a handful of samples. In this paper, we propose an Explicit Class Knowledge Propagation Network (ECKPN), which is composed of the comparison, squeeze and calibration modules, to address this problem. Specifically, we first employ the comparison module to explore the pairwise sample relations to learn rich sample representations in the instance-level graph. Then, we squeeze the instance-level graph to generate the class-level graph, which can help obtain the class-level visual knowledge and facilitate modeling the relations of different classes. Next, the calibration module is adopted to characterize the relations of the classes explicitly to obtain the more discriminative class-level knowledge representations. Finally, we combine the class-level knowledge with the instance-level sample representations to guide the inference of the query samples. We conduct extensive experiments on four few-shot classification benchmarks, and the experimental results show that the proposed ECKPN significantly outperforms the state-of-the art methods.
Generation of stroke-based non-photorealistic imagery, is an important problem in the computervision community. As an endeavor in this direction, substantial recent research efforts have been focused on teaching mach...
详细信息
ISBN:
(纸本)9781665445092
Generation of stroke-based non-photorealistic imagery, is an important problem in the computervision community. As an endeavor in this direction, substantial recent research efforts have been focused on teaching machines "how to paint", in a manner similar to a human painter. However, the applicability of previous methods has been limited to datasets with little variation in position, scale and saliency of the foreground object. As a consequence, we find that these methods struggle to cover the granularity and diversity possessed by real world images. To this end, we propose a Semantic Guidance pipeline with 1) a bi-level painting procedure for learning the distinction between foreground and background brush strokes at training time. 2) We also introduce invariance to the position and scale of the foreground object through a neural alignment model, which combines object localization and spatial transformer networks in an end to end manner, to zoom into a particular semantic instance. 3) The distinguishing features of the in-focus object are then amplified by maximizing a novel guided backpropagation based focus reward. The proposed agent does not require any supervision on human stroke-data and successfully handles variations in foreground object attributes, thus, producing much higher quality canvases for the CUB-200 Birds [29] and Stanford Cars-196 [17] datasets. Finally, we demonstrate the further efficacy of our method on complex datasets with multiple foreground object instances by evaluating an extension of our method on the challenging Virtual-KITTI [2] dataset. Source code and models are available at https://***/1jsingh/semantic-guidance.
Foggy-scene semantic segmentation (FSSS) is highly challenging due to the diverse effects of fog on scene properties and the limited training data. Existing research has mainly focused on domain adaptation for FSSS, w...
详细信息
ISBN:
(数字)9798350365474
ISBN:
(纸本)9798350365481
Foggy-scene semantic segmentation (FSSS) is highly challenging due to the diverse effects of fog on scene properties and the limited training data. Existing research has mainly focused on domain adaptation for FSSS, which has practical limitations when dealing with new scenes. In our paper, we introduce domain-generalized FSSS, which can work effectively on unknown distributions without extensive training. To address domain gaps, we propose a frequency decoupling (FreD) approach that separates fog-related effects (amplitude) from scene semantics (phase) in feature representations. Our method is compatible with both CNN and vision Transformer backbones and outperforms existing approaches in various scenarios.
暂无评论