ISBN: 9781665448994 (print)
Current methods for Earth observation tasks such as semantic mapping, map alignment, and change detection rely on near-nadir images; however, the first available images in response to dynamic world events such as natural disasters are often oblique. These tasks are much more difficult for oblique images due to observed object parallax. There has been recent success in learning to regress an object's geocentric pose, defined as height above ground and orientation with respect to gravity, by training with airborne lidar registered to satellite images. We present a model for this novel task that exploits affine invariance properties to outperform the state of the art by a wide margin. We also address practical issues required to deploy this method in the wild for real-world applications. Our data and code are publicly available.
ISBN: 9781665469463 (digital)
ISBN: 9781665469463 (print)
We present ShapeFormer, a transformer-based network that produces a distribution of object completions, conditioned on incomplete, and possibly noisy, point clouds. The resultant distribution can then be sampled to generate likely completions, each exhibiting plausible shape details while being faithful to the input. To facilitate the use of transformers for 3D, we introduce a compact 3D representation, vector quantized deep implicit function (VQDIF), that utilizes spatial sparsity to represent a close approximation of a 3D shape by a short sequence of discrete variables. Experiments demonstrate that ShapeFormer outperforms prior art for shape completion from ambiguous partial inputs in terms of both completion quality and diversity. We also show that our approach effectively handles a variety of shape types, incomplete patterns, and real-world scans.
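The idea of representing a shape by a short sequence of discrete variables can be illustrated with plain vector quantization. The sketch below is not the VQDIF encoder itself; it only shows the generic nearest-codebook lookup that vector quantization relies on. The codebook values, the feature vectors, and the function names `quantize` and `dequantize` are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)),
            key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Recover the approximate features from their discrete indices."""
    return [list(codebook[i]) for i in indices]


# Hypothetical 2D codebook and local features (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
features = [(0.1, -0.2), (0.9, 1.2), (0.2, 1.7)]

indices = quantize(features, codebook)
print(indices)                        # discrete sequence a transformer could model
print(dequantize(indices, codebook))  # approximate reconstruction of the features
```

A sequence model then only has to predict integer indices rather than continuous features, which is what makes a compact discrete representation convenient for transformers.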
ISBN: 9781665448994 (print)
From a non-central panorama, 3D lines can be recovered by geometric reasoning. However, their sensitivity to noise and the complex geometric modeling required have led to these panoramas being little investigated. In this work we present a novel approach for 3D layout recovery of indoor environments using single non-central panoramas. We obtain the boundaries of the structural lines of the room from a non-central panorama using deep learning and exploit the properties of non-central projection systems in a new geometrical processing to recover the scaled layout. We solve the problem for Manhattan environments, handling occlusions, and also for Atlanta environments in a unified method. In our experiments, the approach improves on state-of-the-art methods for 3D layout recovery from a single panorama. Our approach is the first work using deep learning with non-central panoramas and recovering the scale of single panorama layouts.
ISBN: 9781665448994 (print)
We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
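As a reminder of what a triplet loss computes for one (anchor, positive, negative) triple, here is a minimal sketch of the standard hinge formulation. The abstract does not state the distance function or the margin, so the Euclidean distance and `margin=1.0` below are assumptions, and the embedding values are made up for illustration.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Standard hinge triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0,
               euclidean(anchor, positive)
               - euclidean(anchor, negative)
               + margin)


# Hypothetical embeddings: the anchor could come from an RGB-depth image and
# the positive/negative from matching/non-matching language descriptions.
anchor = (0.0, 0.0)
near = (0.0, 1.0)   # distance 1 from the anchor
far = (3.0, 4.0)    # distance 5 from the anchor

print(triplet_loss(anchor, positive=near, negative=far))  # well separated
print(triplet_loss(anchor, positive=far, negative=near))  # violates the margin
```

In the cross-modal setting described above, the anchor and the positive/negative would typically come from different modalities, so minimizing this loss pulls matching image and language embeddings together.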
ISBN: 9781665448994 (print)
Social media images are generally transformed by filtering to obtain aesthetically more pleasing appearances. However, CNNs generally fail to interpret both the image and its filtered version as the same in the visual analysis of social media images. We introduce Instagram Filter Removal Network (IFRNet) to mitigate the effects of image filters for social media analysis applications. To achieve this, we assume any filter applied to an image substantially injects a piece of additional style information into it, and we consider this problem as a reverse style transfer problem. The visual effects of filtering can be directly removed by adaptively normalizing external style information in each level of the encoder. Experiments demonstrate that IFRNet outperforms all compared methods in quantitative and qualitative comparisons, and has the ability to remove the visual effects to a great extent. Additionally, we present the filter classification performance of our proposed model, and analyze the dominant color estimation on the images unfiltered by all compared methods.
ISBN: 9781665448994 (print)
We present a multi-camera 3D pedestrian detection method that does not need to train using data from the target scene. We estimate pedestrian location on the ground plane using a novel heuristic based on human body poses and person bounding boxes from an off-the-shelf monocular detector. We then project these locations onto the world ground plane and fuse them with a new formulation of a clique cover problem. We also propose an optional step for exploiting pedestrian appearance during fusion by using a domain-generalizable person re-identification model. We evaluated the proposed approach on the challenging WILDTRACK dataset. It obtained a MODA of 0.569 and an F-score of 0.78, superior to state-of-the-art generalizable detection techniques.
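The projection step can be pictured with a planar homography, a common way to map image points that lie on the ground onto world ground-plane coordinates. This is a sketch under the assumption that each camera has a known 3x3 image-to-ground homography; the abstract does not say how the projection is implemented, and the matrix and foot-point values below are invented for illustration.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def project_to_ground(H: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Apply a 3x3 homography H to image point (u, v).

    The point is lifted to homogeneous coordinates (u, v, 1), multiplied by H,
    and divided by the third component to get ground-plane (x, y).
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (x / w, y / w)


# Hypothetical homography for one camera (scale by 0.5, then shift).
H_cam = [
    [0.5, 0.0, 2.0],
    [0.0, 0.5, -1.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical foot point of a detected pedestrian, in pixels.
print(project_to_ground(H_cam, 4.0, 6.0))
```

Once detections from all cameras live in the same ground-plane coordinates, nearby points from different views can be grouped as candidates for the same pedestrian, which is the role the fusion step plays in the abstract.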
ISBN: 9781665448994 (print)
One of the major challenges of style transfer is the appropriate supervision of image features between the output image and the input images (style and content). An efficient strategy would be to define an object map between the objects of the style and the content images. However, such a mapping is not well established when there are semantic objects of different types and numbers in the style and the content images. It also leads to content mismatch in the style transfer output, which could reduce the visual quality of the results. We propose an object-based style transfer approach, called DeepObjStyle, for style supervision in a training-data-independent framework. DeepObjStyle preserves the semantics of the objects and achieves better style transfer in the challenging scenario when the style and the content images have a mismatch of image features. We also perform style transfer of images containing a word cloud to demonstrate that DeepObjStyle enables appropriate supervision of image features. We validate the results using quantitative comparisons and user studies.
ISBN: 9781665448994 (print)
We propose a novel architecture to handle the problem of multi-frame super-resolution (MFSR). The proposed framework, Enhanced Burst Super-Resolution (EBSR), divides the MFSR problem into three parts: alignment, fusion, and reconstruction. We propose a Feature Enhanced Pyramid Cascading and Deformable convolution (FEPCD) module to align multiple low-resolution burst images at the feature level. The aligned features are then fused by a Cross Non-Local Fusion (CNLF) module. Finally, the SR image is reconstructed by the Long Range Concatenation Network (LRCN). In addition, we build a cascading residual pathway structure (CR) to improve the performance. We conduct several experiments to analyze and demonstrate these modules. Our EBSR model won first place in the real track and second place in the synthetic track of the NTIRE21 Burst Super-Resolution Challenge.
ISBN: 9781665448994 (print)
Deep learning approaches currently achieve state-of-the-art results on camera-based vital signs measurement. One of the main challenges with using neural models for these applications is the lack of sufficiently large and diverse datasets. Limited data increases the chances of overfitting models to the available data, which in turn can harm generalization. In this paper, we show that the generalizability of imaging photoplethysmography models can be improved by augmenting the training set with "magnified" videos. These augmentations are specifically designed to reveal useful features for recovering the photoplethysmogram. We show that using augmentations of this form is more effective at improving model robustness than other commonly used data augmentation approaches. We show better within-dataset and especially cross-dataset performance with our proposed data augmentation approach on three publicly available datasets.
ISBN: 9781665448994 (print)
Advances in adversarial defenses have led to a significant improvement in the robustness of Deep Neural Networks. However, the robust accuracy of present state-of-the-art defenses is far from the requirements in critical applications such as robotics and autonomous navigation systems. Further, in practical use cases, network prediction alone might not suffice, and assignment of a confidence value for the prediction can prove crucial. In this work, we propose a generic method for introducing stochasticity in the network predictions, and utilize this for smoothing decision boundaries and rejecting low-confidence predictions, thereby boosting the robustness on accepted samples. The proposed Feature Level Stochastic Smoothing based classification also results in a boost in robustness without rejection over existing adversarial training methods. Finally, we combine the proposed method with adversarial detection methods to achieve the benefits of both approaches.
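One way to picture smoothing with rejection is a majority vote over several stochastic predictions, abstaining when the vote is not decisive. The sketch below is a simplified stand-in, not the paper's method: it assumes Gaussian noise added to a feature vector, a hypothetical `classify` function acting on those features, and an arbitrary agreement threshold `min_agreement=0.8`.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple


def smoothed_predict(classify: Callable[[List[float]], int],
                     features: Sequence[float],
                     n_samples: int = 100,
                     noise_std: float = 0.1,
                     min_agreement: float = 0.8,
                     rng: Optional[random.Random] = None
                     ) -> Tuple[Optional[int], float]:
    """Vote over noisy copies of the features; reject if agreement is low.

    Returns (label, confidence), where confidence is the fraction of samples
    voting for the winning label and label is None when the prediction is
    rejected as low confidence.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    votes: Counter = Counter()
    for _ in range(n_samples):
        noisy = [f + rng.gauss(0.0, noise_std) for f in features]
        votes[classify(noisy)] += 1
    label, count = votes.most_common(1)[0]
    confidence = count / n_samples
    if confidence < min_agreement:
        return None, confidence
    return label, confidence


def toy_classifier(features: List[float]) -> int:
    """Hypothetical classifier head: class 1 if the first feature is positive."""
    return 1 if features[0] > 0 else 0


print(smoothed_predict(toy_classifier, [5.0]))  # far from the boundary: accepted
print(smoothed_predict(toy_classifier, [0.0]))  # on the boundary: likely rejected
```

Inputs near a decision boundary produce split votes and are rejected, so accuracy is measured only on the samples the classifier is willing to answer, which matches the abstract's notion of robustness on accepted samples.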