In this paper we reformulate few-shot classification as a reconstruction problem in latent space. The ability of the network to reconstruct a query feature map from support features of a given class predicts membershi...
详细信息
ISBN:
(纸本)9781665445092
In this paper we reformulate few-shot classification as a reconstruction problem in latent space. The ability of the network to reconstruct a query feature map from support features of a given class predicts membership of the query in that class. We introduce a novel mechanism for few-shot classification by regressing directly from support features to query features in closed form, without introducing any new modules or large-scale learnable parameters. The resulting Feature Map Reconstruction Networks are both more performant and computationally efficient than previous approaches. We demonstrate consistent and substantial accuracy gains on four fine-grained benchmarks with varying neural architectures. Our model is also competitive on the non-fine-grained mini-ImageNet and tiered-ImageNet benchmarks with minimal bells and whistles.
In this paper we tackle the problem of scene flow estimation in the context of self-driving. We leverage deep learning techniques as well as strong priors as in our application domain the motion of the scene can be co...
详细信息
ISBN:
(纸本)9781728132938
In this paper we tackle the problem of scene flow estimation in the context of self-driving. We leverage deep learning techniques as well as strong priors as in our application domain the motion of the scene can be composed by the motion of the robot and the 3D motion of the actors in the scene. We formulate the problem as energy minimization in a deep structured model, which can be solved efficiently in the GPU by unrolling a Gaussian-Newton solver. Our experiments in the challenging KITTI scene flow dataset show that we outperform the state-of-the-art by a very large margin, while being 800 times faster.
Many problems in science and engineering can be formulated in terms of geometric patterns in high-dimensional spaces. We present high-dimensional convolutional networks (ConvNets) for patternrecognition problems that...
详细信息
ISBN:
(纸本)9781728171685
Many problems in science and engineering can be formulated in terms of geometric patterns in high-dimensional spaces. We present high-dimensional convolutional networks (ConvNets) for patternrecognition problems that arise in the context of geometric registration. We first study the effectiveness of convolutional networks in detecting linear subspaces in high-dimensional spaces with up to 32 dimensions: much higher dimensionality than prior applications of ConvNets. We then apply high-dimensional ConvNets to 3D registration under rigid motions and image correspondence estimation. Experiments indicate that our high-dimensional ConvNets outperform prior approaches that relied on deep networks based on global pooling operators.
In this work we study convex relaxations of quadratic optimisation problems over permutation matrices. While existing semidefinite programming approaches can achieve remarkably tight relaxations, they have the strong ...
详细信息
ISBN:
(纸本)9781538664209
In this work we study convex relaxations of quadratic optimisation problems over permutation matrices. While existing semidefinite programming approaches can achieve remarkably tight relaxations, they have the strong disadvantage that they lift the original nxn-dimensional variable to an n(2) x n(2)-dimensional variable, which limits their practical applicability. In contrast, here we present a lifting free convex relaxation that is provably at least as tight as existing (lifting-free) convex relaxations. We demonstrate experimentally that our approach is superior to existing convex and non-convex methods for various problems, including image arrangement and multi-graph matching.
There have been many successful implementations of neural style transfer in recent years. In most of these works, the stylization process is confined to the pixel domain. However, we argue that this representation is ...
详细信息
ISBN:
(纸本)9781665445092
There have been many successful implementations of neural style transfer in recent years. In most of these works, the stylization process is confined to the pixel domain. However, we argue that this representation is unnatural because paintings usually consist of brushstrokes rather than pixels. We propose a method to stylize images by optimizing parameterized brushstrokes instead of pixels and further introduce a simple differentiable rendering mechanism. Our approach significantly improves visual quality and enables additional control over the stylization process such as controlling the flow of brushstrokes through user input. We provide qualitative and quantitative evaluations that show the efficacy of the proposed parameterized representation. Code is available at https://***/CompVis/brushstroke-parameterizedstyle-transfer.
We introduce a framework that predicts the goals behind observable human action in video. Motivated by evidence in developmental psychology, we leverage video of unintentional action to learn video representations of ...
详细信息
ISBN:
(纸本)9781665445092
We introduce a framework that predicts the goals behind observable human action in video. Motivated by evidence in developmental psychology, we leverage video of unintentional action to learn video representations of goals without direct supervision. Our approach models videos as contextual trajectories that represent both low-level motion and high-level action features. Experiments and visualizations show our trained model is able to predict the underlying goals in video of unintentional action. We also propose a method to "automatically correct" unintentional action by leveraging gradient signals of our model to adjust latent trajectories. Although the model is trained with minimal supervision, it is competitive with or outperforms baselines trained on large (supervised) datasets of successfully executed goals, showing that observing unintentional action is crucial to learning about goals in video.
Although recent inpainting approaches have demonstrated significant improvement with deep neural networks, they still suffer from artifacts such as blunt structures and abrupt colors when filling in the missing region...
详细信息
ISBN:
(纸本)9781665445092
Although recent inpainting approaches have demonstrated significant improvement with deep neural networks, they still suffer from artifacts such as blunt structures and abrupt colors when filling in the missing regions. To address these issues, we propose an external-internal inpainting scheme with a monochromic bottleneck that helps image inpainting models remove these artifacts. In the external learning stage, we reconstruct missing structures and details in the monochromic space to reduce the learning dimension. In the internal learning stage, we propose a novel internal color propagation method with progressive learning strategies for consistent color restoration. Extensive experiments demonstrate that our proposed scheme helps image inpainting models produce more structure-preserved and visually compelling results.
We propose a novel method for estimating the global rotations of the cameras independently of their positions and the scene structure. When two calibrated cameras observe five or more of the same points, their relativ...
详细信息
ISBN:
(纸本)9781665445092
We propose a novel method for estimating the global rotations of the cameras independently of their positions and the scene structure. When two calibrated cameras observe five or more of the same points, their relative rotation can be recovered independently of the translation. We extend this idea to multiple views, thereby decoupling the rotation estimation from the translation and structure estimation. Our approach provides several benefits such as complete immunity to inaccurate translations and structure, and the accuracy improvement when used with rotation averaging. We perform extensive evaluations on both synthetic and real datasets, demonstrating consistent and significant gains in accuracy when used with the state-of-the-art rotation averaging method.
A recent strand of work in view synthesis uses deep learning to generate multiplane images-a camera-centric, layered 3D representation-given two or more input images at known viewpoints. We apply this representation t...
详细信息
ISBN:
(数字)9781728171685
ISBN:
(纸本)9781728171685
A recent strand of work in view synthesis uses deep learning to generate multiplane images-a camera-centric, layered 3D representation-given two or more input images at known viewpoints. We apply this representation to single-view view synthesis, a problem which is more challenging but has potentially much wider application. Our method learns to predict a multiplane image directly from a single image input, and we introduce scale-invariant view synthesis for supervision, enabling us to train on online video. We show this approach is applicable to several different datasets, that it additionally generates reasonable depth maps, and that it learns to fill in content behind the edges of foreground objects in background layers.
Conversational facial expression recognition entails challenges such as handling of facial dynamics, small available datasets, low-intensity and fine-grained emotional expressions and extreme face angle. Towards addre...
详细信息
ISBN:
(纸本)9798350365474
Conversational facial expression recognition entails challenges such as handling of facial dynamics, small available datasets, low-intensity and fine-grained emotional expressions and extreme face angle. Towards addressing these challenges, we propose the Masking Action Units and Reconstructing multiple Angles (MAURA) pre-training. MAURA is an efficient self-supervised method that permits the use of small datasets, while preserving end-toend conversational facial expression recognition with vision Transformer. MAURA masks videos using the location with active Action Units and reconstructs synchronized multi-view videos, thus learning the dependencies between muscle movements and encoding information, which might only be visible in few frames and/or in certain views. Based on one view (e.g., frontal), the encoder reconstructs other views (e.g., top, down, laterals). Such masking and reconstructing strategy provides a powerful representation, beneficial in facial expression downstream tasks. Our experimental analysis shows that we consistently outperform the state-of-the-art in the challenging settings of low-intensity and fine-grained conversational facial expression recognition on four datasets including in-the-wild DFEW, CMU-MOSEI, MFA and multi-view MEAD. Our results suggest that MAURA is able to learn robust and generic video representations.
暂无评论