ISBN (print): 9781665448994
Training GANs in low-data regimes remains a challenge, as overfitting often leads to memorization or training divergence. In this work, we introduce One-Shot GAN, which can learn to generate samples from a training set as small as a single image or a single video. We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout realism. This allows synthesis of visually plausible, novel compositions of a scene, with varying content and layout, while preserving the context of the original sample. Compared to previous single-image GAN models, One-Shot GAN achieves higher diversity and quality of synthesis. It is also not restricted to the single-image setting, successfully learning in the introduced setting of a single video.
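For readers curious how a discriminator might judge content and layout separately, here is a minimal PyTorch sketch of a two-branch design: a shared convolutional trunk, a content head that pools away spatial information, and a layout head that keeps a spatial score map. The layer sizes and branch designs are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoBranchDiscriminator(nn.Module):
    """Illustrative two-branch discriminator: a shared trunk followed by a
    content branch (global score from pooled features) and a layout branch
    (per-location scores that retain spatial arrangement)."""

    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # Content branch: global pooling discards spatial layout.
        self.content_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(base * 2, 1),
        )
        # Layout branch: 1x1 conv keeps a spatial map of realism scores.
        self.layout_head = nn.Conv2d(base * 2, 1, kernel_size=1)

    def forward(self, x):
        h = self.trunk(x)
        return self.content_head(h), self.layout_head(h)

# Usage: two realism judgments, one per aspect of the image.
d = TwoBranchDiscriminator()
content_score, layout_map = d(torch.randn(2, 3, 64, 64))
print(content_score.shape, layout_map.shape)  # (2, 1) and (2, 1, 16, 16)
```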
ISBN (print): 9781665448994
Few-shot learning features the capability of generalizing from a few examples. In this paper, we first identify that a discriminative feature space, namely a rectified metric space learned to maintain metric consistency from training to testing, is an essential component for the success of metric-based few-shot learning. Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains. The resulting approach, called rectified metric propagation (ReMP), further optimizes an attentive prototype propagation network and applies a repulsive force to make confident predictions. Extensive experiments demonstrate that the proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
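As context for the metric-based setting this work builds on, the sketch below shows a generic prototype-based few-shot classifier in PyTorch, where queries are scored by similarity to per-class prototype embeddings. The cosine metric and temperature are illustrative choices; this is a baseline sketch, not the ReMP objective itself.

```python
import torch
import torch.nn.functional as F

def prototype_logits(support_feats, support_labels, query_feats, n_way, temperature=10.0):
    """Metric-based few-shot baseline: class prototypes are the mean support
    embedding per class, and queries are scored by similarity to each prototype.
    ReMP refines this kind of metric space; the choices here are illustrative."""
    protos = torch.stack([support_feats[support_labels == c].mean(0) for c in range(n_way)])
    sims = F.cosine_similarity(query_feats.unsqueeze(1), protos.unsqueeze(0), dim=-1)
    return temperature * sims  # logits for cross-entropy over the n_way classes

# Usage on random features: a 5-way, 1-shot episode with 15 queries.
support = torch.randn(5, 64)
labels = torch.arange(5)
queries = torch.randn(15, 64)
print(prototype_logits(support, labels, queries, n_way=5).shape)  # (15, 5)
```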
ISBN (print): 9781665448994
In this paper, a novel bottom-up video event recognition approach, ObjectGraphs, is proposed, which utilizes a rich frame representation and the relations between objects within each frame. Following the application of an object detector (OD) on the frames, graphs are used to model the object relations and a graph convolutional network (GCN) is utilized to perform reasoning on the graphs. The resulting object-based frame-level features are then forwarded to a long short-term memory (LSTM) network for video event recognition. Moreover, the weighted in-degrees (WiDs) derived from the graph's adjacency matrix at frame level are used to identify the objects that were considered most (or least) salient for event recognition and contributed the most (or least) to the final event recognition decision, thus providing an explanation for the latter. The experimental results show that the proposed method achieves state-of-the-art performance on the publicly available FCVID and YLI-MED datasets.
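A rough PyTorch sketch of the described pipeline appears below: per-frame object features are connected through an adjacency matrix, a graph convolution performs reasoning, the objects are pooled into a frame-level feature fed to an LSTM, and the weighted in-degrees come from summing the adjacency columns. The similarity-based adjacency, layer sizes, and class count are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ObjectGraphFrameEncoder(nn.Module):
    """Illustrative sketch of the ObjectGraphs idea: object features are related
    through an adjacency matrix, a graph convolution propagates information
    between objects, and the objects are pooled into a frame-level feature."""
    def __init__(self, obj_dim=512, hid=256):
        super().__init__()
        self.gcn = nn.Linear(obj_dim, hid)

    def forward(self, obj_feats):                               # (num_objects, obj_dim)
        adj = torch.softmax(obj_feats @ obj_feats.t(), dim=-1)  # row-normalised adjacency
        h = torch.relu(self.gcn(adj @ obj_feats))               # one graph convolution layer
        wid = adj.sum(dim=0)                                    # weighted in-degrees (per-object saliency)
        return h.mean(dim=0), wid                               # frame-level feature, WiDs

# Frame-level features for a clip are then fed to an LSTM for event recognition.
encoder = ObjectGraphFrameEncoder()
frames = [torch.randn(8, 512) for _ in range(10)]             # 10 frames, 8 detected objects each
frame_feats = torch.stack([encoder(f)[0] for f in frames])    # (10, 256)
lstm = nn.LSTM(input_size=256, hidden_size=128, batch_first=True)
out, _ = lstm(frame_feats.unsqueeze(0))                        # (1, 10, 128)
event_logits = nn.Linear(128, 10)(out[:, -1])                  # 10 = placeholder number of event classes
print(event_logits.shape)                                      # (1, 10)
```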
ISBN (digital): 9781538661000
ISBN (print): 9781538661000
We present WiCV 2018 - the Women in Computer Vision Workshop, organized in conjunction with CVPR 2018 to increase the visibility and inclusion of women researchers in the computer vision field. Computer vision and machine learning have made incredible progress over the past years, yet the number of female researchers is still low both in academia and in industry. WiCV is organized to raise the visibility of female researchers, to increase collaboration, and to provide mentorship and opportunities to female-identifying junior researchers in the field. In its fourth year, we are proud to present the changes and improvements over the past years and a summary of statistics for presenters and attendees, followed by expectations for future generations.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
While virtual try-on has rapidly progressed in recent years, existing virtual try-on methods still struggle to faithfully represent various details of the clothes when worn. In this paper, we propose a simple yet effective method to better preserve details of the clothing and person by introducing an additional fitting step after geometric warping. This minimal modification enables disentangling representations of the clothing from the wearer, hence we are able to preserve the wearer-agnostic structure and details of the clothing and to fit a garment naturally to a variety of poses and body shapes. Moreover, we propose a novel evaluation framework applicable to any metric, to better reflect the semantics of clothes fitting. Through extensive experiments, we empirically verify that the proposed method not only learns to disentangle clothing from the wearer, but also preserves details of the clothing in the try-on results.
ISBN (print): 9780769549903
In recent years, with the advent of cheap and accurate RGBD (RGB plus Depth) active sensors like the Microsoft Kinect and devices based on time-of-flight (ToF) technology, there has been increasing interest in 3D-based applications. At the same time, several effective improvements to passive stereo vision algorithms have been proposed in the literature. Despite these facts and the frequent deployment of stereo vision for many research activities, it is often perceived as a bulky and expensive technology not well suited to consumer applications. In this paper, we review a subset of state-of-the-art stereo vision algorithms that have the potential to fit a target computing architecture based on low-cost field-programmable gate arrays (FPGAs), without additional external devices (e.g., FIFOs, DDR memories). Mapping these algorithms onto such a low-power, low-cost architecture would make RGBD sensors based on stereo vision suitable for a wider class of application scenarios currently not addressed by this technology.
ISBN (print): 9781665448994
Multi-modal generative models represent an important family of deep models, whose goal is to facilitate representation learning on data with multiple views or modalities. However, current deep multi-modal models focus on the inference of shared representations, while neglecting the important private aspects of data within individual modalities. In this paper, we introduce a disentangled multi-modal variational autoencoder (DMVAE) that utilizes a disentangled VAE strategy to separate the private and shared latent spaces of multiple modalities. We demonstrate the utility of DMVAE on the two image modalities of the MNIST and Google Street View House Numbers (SVHN) datasets, as well as the image and text modalities of the Oxford-102 Flowers dataset. Our experiments indicate the importance of retaining the private representation, as well as the private-shared disentanglement, to effectively direct information across multiple analysis-synthesis conduits.
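To make the private/shared split concrete, here is a hedged PyTorch sketch of a per-modality encoder that emits separate private and shared Gaussian latents; cross-modal synthesis would combine one modality's shared latent with another modality's private latent before decoding. All dimensions and the specific reparameterisation are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Illustrative encoder for a DMVAE-style model: each modality's encoder
    emits a private latent (modality-specific factors) and a shared latent
    (factors common across modalities)."""
    def __init__(self, in_dim=784, private_dim=8, shared_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.private_mu = nn.Linear(256, private_dim)
        self.private_logvar = nn.Linear(256, private_dim)
        self.shared_mu = nn.Linear(256, shared_dim)
        self.shared_logvar = nn.Linear(256, shared_dim)

    @staticmethod
    def reparam(mu, logvar):
        # Standard Gaussian reparameterisation trick.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x):
        h = self.backbone(x)
        z_private = self.reparam(self.private_mu(h), self.private_logvar(h))
        z_shared = self.reparam(self.shared_mu(h), self.shared_logvar(h))
        return z_private, z_shared

enc = ModalityEncoder()
z_p, z_s = enc(torch.randn(4, 784))
print(z_p.shape, z_s.shape)  # (4, 8) and (4, 16)
```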
ISBN (print): 9781728193601
The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of a ⟨subject, predicate, object⟩ triple. This requires simultaneous localization of the subject and object entities in a specified relationship. We introduce a simple yet effective proposal-based method for referring relationships. Different from existing methods such as SSAS, our method can generate a high-resolution result while reducing complexity and ambiguity. Our method is composed of two modules: a category-based proposal generation module to select the proposals related to the entities, and a predicate analysis module to score the compatibility of pairs of selected proposals. We show state-of-the-art performance on the referring relationship task on two public datasets: Visual Relationship Detection and Visual Genome.
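The predicate-analysis step described above can be sketched as scoring every subject-object proposal pair against an embedding of the queried predicate, as in the illustrative PyTorch snippet below. The feature dimensions, predicate vocabulary size, and fusion scheme are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class PredicateScorer(nn.Module):
    """Illustrative predicate-analysis step: given features of candidate subject
    and object proposals (already filtered by category), score how compatible
    each subject-object pair is with the queried predicate."""
    def __init__(self, feat_dim=256, num_predicates=70, hid=128):
        super().__init__()
        self.predicate_emb = nn.Embedding(num_predicates, hid)
        self.pair_mlp = nn.Sequential(nn.Linear(2 * feat_dim, hid), nn.ReLU())

    def forward(self, subj_feats, obj_feats, predicate_id):
        # Build all subject-object pairs: (S, O, 2 * feat_dim).
        s = subj_feats.unsqueeze(1).expand(-1, obj_feats.size(0), -1)
        o = obj_feats.unsqueeze(0).expand(subj_feats.size(0), -1, -1)
        pair = self.pair_mlp(torch.cat([s, o], dim=-1))
        pred = self.predicate_emb(predicate_id)   # (hid,)
        return (pair * pred).sum(-1)              # (S, O) compatibility scores

scorer = PredicateScorer()
scores = scorer(torch.randn(5, 256), torch.randn(7, 256), torch.tensor(3))
best = scores.flatten().argmax()
print(divmod(best.item(), 7))  # indices of the highest-scoring (subject, object) pair
```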
ISBN (print): 9781665448994
Learned lossy image compression has demonstrated impressive progress via end-to-end neural network training. However, this end-to-end training belies the fact that lossy compression is inherently not differentiable, due to the necessity of quantisation. To overcome this difficulty in training, researchers have used various approximations to the quantisation step. However, little work has studied the mechanism of quantisation approximation itself. We address this issue by identifying three gaps arising in the quantisation approximation problem. We visualise these gaps and show the effect of applying different quantisation approximation methods. Following this analysis, we propose a Soft-STE quantisation approximation method, which closes these gaps and demonstrates better performance than other quantisation approaches on the Kodak dataset.
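For reference, two standard quantisation approximations used when training learned codecs are the straight-through estimator (hard rounding in the forward pass, identity gradient in the backward pass) and additive uniform noise; the snippet below sketches both in PyTorch. The paper's Soft-STE method is a further refinement and is not reproduced here.

```python
import torch

def ste_round(x):
    """Straight-through estimator for rounding: the forward pass applies hard
    quantisation, while the backward pass treats it as the identity."""
    return x + (torch.round(x) - x).detach()

def additive_noise(x):
    """Alternative training-time approximation used by many learned codecs:
    replace rounding with additive uniform noise in [-0.5, 0.5]."""
    return x + torch.empty_like(x).uniform_(-0.5, 0.5)

y = torch.randn(4, requires_grad=True)
q = ste_round(y * 10.0)
q.sum().backward()
print(q, y.grad)  # gradients flow through as if rounding were the identity
```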
ISBN (print): 9781665448994
Multistage, or serial, fusion refers to algorithms that sequentially fuse an increasing number of matching results at each step and decide whether to accept or reject the match hypothesis, or to proceed to the next step. Such fusion methods are beneficial in situations where running the additional matching algorithms needed for later stages is time-consuming or expensive. The construction of multistage fusion methods is challenging, since it requires both learning the fusion functions and finding optimal decision thresholds for each stage. In this paper, we propose the use of a single neural network for learning the multistage fusion. In addition, we discuss the choices of performance measurements for the trained algorithms and of optimization criteria for network training. We perform experiments using three face matching algorithms and the IJB-A and IJB-C databases.
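A minimal sketch of the multistage decision logic is given below, assuming hand-set fusion weights and per-stage accept/reject thresholds purely for illustration; in the paper these are learned by a single neural network rather than fixed by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Stage:
    """One stage of a multistage fusion cascade: fusion weights over the
    matcher scores available so far, plus accept/reject thresholds."""
    weights: List[float]
    accept_thr: float
    reject_thr: float

def multistage_decision(match_scores: List[float], stages: List[Stage]) -> Tuple[str, int]:
    """Fuse an increasing prefix of matcher scores at each stage and decide:
    accept, reject, or defer to the next (more expensive) stage."""
    for i, stage in enumerate(stages, start=1):
        fused = sum(w * s for w, s in zip(stage.weights, match_scores[:i]))
        if fused >= stage.accept_thr:
            return "accept", i
        if fused <= stage.reject_thr:
            return "reject", i
    return "reject", len(stages)  # fall-through policy after the last stage

# Illustrative thresholds: the last stage always reaches a decision.
stages = [Stage([1.0], 0.9, 0.2), Stage([0.5, 0.5], 0.8, 0.3), Stage([0.3, 0.3, 0.4], 0.6, 0.6)]
print(multistage_decision([0.7, 0.75, 0.8], stages))  # -> ('accept', 3)
```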