CoMoGAN is a continuous GAN relying on the unsupervised reorganization of the target data on a functional manifold. To that matter, we introduce a new Functional Instance Normalization layer and residual mechanism, wh...
详细信息
ISBN:
(纸本)9781665445092
CoMoGAN is a continuous GAN relying on the unsupervised reorganization of the target data on a functional manifold. To that matter, we introduce a new Functional Instance Normalization layer and residual mechanism, which together disentangle image content from position on target manifold. We rely on naive physics-inspired models to guide the training while allowing private model/translations features. CoMoGAN can be used with any GAN backbone and allows new types of image translation, such as cyclic image translation like timelapse generation, or detached linear translation. On all datasets, it outperforms the literature.
Detecting poorly textured objects and estimating their 3D pose reliably is still a very challenging problem. We introduce a simple but powerful approach to computing descriptors for object views that efficiently captu...
详细信息
ISBN:
(纸本)9781467369640
Detecting poorly textured objects and estimating their 3D pose reliably is still a very challenging problem. We introduce a simple but powerful approach to computing descriptors for object views that efficiently capture both the object identity and 3D pose. By contrast with previous manifold-based approaches, we can rely on the Euclidean distance to evaluate the similarity between descriptors, and therefore use scalable Nearest Neighbor search methods to efficiently handle a large number of objects under a large range of poses. To achieve this, we train a Convolutional Neural Network to compute these descriptors by enforcing simple similarity and dissimilarity constraints between the descriptors. We show that our constraints nicely untangle the images from different objects and different views into clusters that are not only well-separated but also structured as the corresponding sets of poses: The Euclidean distance between descriptors is large when the descriptors are from different objects, and directly related to the distance between the poses when the descriptors are from the same object. These important properties allow us to outperform state-of-the-art object views representations on challenging RGB and RGB-D data.
Online dictionary learning is particularly useful for processing large-scale and dynamic data in computervision. It, however faces the major difficulty to incorporate robust functions, rather than the square data fit...
详细信息
ISBN:
(纸本)9780769549897
Online dictionary learning is particularly useful for processing large-scale and dynamic data in computervision. It, however faces the major difficulty to incorporate robust functions, rather than the square data fitting term, to handle outliers in training data. In this paper we propose a new online framework enabling the use of l(1) sparse data fitting term in robust dictionary learning, notably enhancing the usability and practicality of this important technique. Extensive experiments have been carried out to validate our new framework.
We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the existing state-of-the-art [16]. Like Pseudo Labe...
详细信息
ISBN:
(纸本)9781665445092
We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the existing state-of-the-art [16]. Like Pseudo Labels, Meta Pseudo Labels has a teacher network to generate pseudo labels on unlabeled data to teach a student network. However, unlike Pseudo Labels where the teacher is fixed, the teacher in Meta Pseudo Labels is constantly adapted by the feedback of the student's performance on the labeled dataset. As a result, the teacher generates better pseudo labels to teach the student.(1)
This paper presents a stereo matching approach for a novel multi-perspective panoramic stereo vision system, making use of asynchronous and non-simultaneous stereo imaging towards real-time 3D 3600 vision. The method ...
详细信息
ISBN:
(纸本)9781467369640
This paper presents a stereo matching approach for a novel multi-perspective panoramic stereo vision system, making use of asynchronous and non-simultaneous stereo imaging towards real-time 3D 3600 vision. The method is designed for events representing the scenes visual contrast as a sparse visual code allowing the stereo reconstruction of high resolution panoramic views. We propose a novel cost measure for the stereo matching, which makes use of a similarity measure based on event distributions. Thus, the robustness to variations in event occurrences was increased. An evaluation of the proposed stereo method is presented using distance estimation of panoramic stereo views and ground truth data. Furthermore, our approach is compared to standard stereo methods applied on event-data. Results show that we obtain 3D reconstructions of 1024 x 3600 round views and outperform depth reconstruction accuracy of state-of-the-art methods on event data.
This paper introduces a new method for object recognition which is based on a recurrent neural network trained in a supervised mode. The RNN inputs 3-dimensional laser scanner data sequentially, in a natural temporal ...
详细信息
ISBN:
(纸本)9781424439942
This paper introduces a new method for object recognition which is based on a recurrent neural network trained in a supervised mode. The RNN inputs 3-dimensional laser scanner data sequentially, in a natural temporal order in which the laser returns arrive to the scanner The method is illustrated on a two-class problem with real data.
We introduce a multi-scale framework for low-level vision, where the goal is estimating physical scene values from image data-such as depth from stereo image pairs. The framework uses a dense, overlapping set of image...
详细信息
ISBN:
(纸本)9781467369640
We introduce a multi-scale framework for low-level vision, where the goal is estimating physical scene values from image data-such as depth from stereo image pairs. The framework uses a dense, overlapping set of image regions at multiple scales and a "local model," such as a slanted-plane model for stereo disparity, that is expected to be valid piecewise across the visual field. Estimation is cast as optimization over a dichotomous mixture of variables, simultaneously determining which regions are inliers with respect to the local model (binary variables) and the correct co-ordinates in the local model space for each inlying region (continuous variables). When the regions are organized into a multi-scale hierarchy, optimization can occur in an efficient and parallel architecture, where distributed computational units iteratively perform calculations and share information through sparse connections between parents and children. The framework performs well on a standard benchmark for binocular stereo, and it produces a distributional scene representation that is appropriate for combining with higher-level reasoning and other low-level cues.
As consumer depth sensors become widely available, estimating scene flow from RGBD sequences has received increasing attention. Although the depth information allows the recovery of 3D motion from a single view, it po...
详细信息
ISBN:
(纸本)9781467369640
As consumer depth sensors become widely available, estimating scene flow from RGBD sequences has received increasing attention. Although the depth information allows the recovery of 3D motion from a single view, it poses new challenges. In particular, depth boundaries are not well aligned with RGB image edges and therefore not reliable cues to localize 2D motion boundaries. In addition, methods that extend the 2D optical flow formulation to 3D still produce large errors in occlusion regions. To better use depth for occlusion reasoning, we propose a layered RGBD scene flow method that jointly solves for the scene segmentation and the motion. Our key observation is that the noisy depth is sufficient to decide the depth ordering of layers, thereby avoiding a computational bottleneck for RGB layered methods. Furthermore, the depth enables us to estimate a per-layer 3D rigid motion to constrain the motion of each layer. Experimental results on both the Middlebury and real-world sequences demonstrate the effectiveness of the layered approach for RGBD scene flow estimation.
In this paper, we tackle the problem of performing inference in graphical models whose energy is a polynomial function of continuous variables. Our energy minimization method follows a dual decomposition approach, whe...
详细信息
ISBN:
(纸本)9780769549897
In this paper, we tackle the problem of performing inference in graphical models whose energy is a polynomial function of continuous variables. Our energy minimization method follows a dual decomposition approach, where the global problem is split into subproblems defined over the graph cliques. The optimal solution to these subproblems is obtained by making use of a polynomial system solver. Our algorithm inherits the convergence guarantees of dual decomposition. To speed up optimization, we also introduce a variant of this algorithm based on the augmented Lagrangian method. Our experiments illustrate the diversity of computervision problems that can be expressed with polynomial energies, and demonstrate the benefits of our approach over existing continuous inference methods.
This work demonstrates a novel mobile robot architecture that uses environment-based sensor network that provides a third-person perception. The demonstration is motivated by the idea that a mobile robot working in th...
详细信息
This work demonstrates a novel mobile robot architecture that uses environment-based sensor network that provides a third-person perception. The demonstration is motivated by the idea that a mobile robot working in the area tune in to broadcasts from the video camera network to receive sensor data.
暂无评论