ISBN: 9781665448994 (print)
Line art plays a fundamental role in illustration and design, and allows for iteratively polishing designs. However, because it lacks color, it can struggle to convey the final design. In this work, we propose an interactive colorization approach based on a conditional generative adversarial network that takes both the line art and color hints as inputs and produces a high-quality colorized image. Our approach is based on a U-net architecture with a multi-discriminator framework. We propose a Concatenation and Spatial Attention module that generates more consistent and higher-quality line art colorizations from user-given hints. We evaluate on a large-scale illustration dataset, and comparisons with existing approaches corroborate the effectiveness of our approach.
ISBN: 9781665448994 (print)
Training GANs in low-data regimes remains a challenge, as overfitting often leads to memorization or training divergence. In this work, we introduce One-Shot GAN, which can learn to generate samples from a training set as small as one image or one video. We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout realism. This allows synthesis of visually plausible, novel compositions of a scene, with varying content and layout, while preserving the context of the original sample. Compared to previous single-image GAN models, One-Shot GAN achieves higher diversity and quality of synthesis. It is also not restricted to the single-image setting, and learns successfully in the newly introduced single-video setting.
ISBN: 9781665448994 (print)
Video conference solutions are now widely adopted by companies, educational institutions, and governments. People segmentation is crucial for supporting virtual backgrounds, an essential video conference function that protects users' privacy. This paper presents a people segmentation framework called CE-PeopleSeg, which employs an efficient segmentation method, structural pruning, and dynamic frame skipping to achieve fast inference on CPU. Our extensive experiments show that the proposed CE-PeopleSeg achieves a high prediction mIoU of 87.9% on Supervised People Dataset while reaching a real-time inference speed of 32.40 fps on CPU at a low usage of 10%. Our code will be released at https://***/geekJZY/***.
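The mIoU figure quoted above is the mean intersection-over-union between predicted and ground-truth masks. As a minimal sketch of how such a score is typically computed (this is the standard metric definition, not code from CE-PeopleSeg; the function name `mean_iou` and the two-class person/background labelling are assumptions made for illustration):

```python
def mean_iou(pred, target, num_classes=2):
    """Mean intersection-over-union between two label masks of equal shape.

    Masks are lists of rows holding integer class labels
    (assumed here: 0 = background, 1 = person).
    """
    ious = []
    for cls in range(num_classes):
        inter = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred = p == cls
                in_target = t == cls
                inter += int(in_pred and in_target)
                union += int(in_pred or in_target)
        if union > 0:  # ignore classes that appear in neither mask
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy 2x3 masks: one background pixel is wrongly predicted as person.
pred = [[0, 1, 1], [0, 1, 1]]
target = [[0, 0, 1], [0, 1, 1]]
score = mean_iou(pred, target)
```

Here the background class has IoU 2/3 and the person class has IoU 3/4, so the mean is 17/24.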
ISBN: 9781665448994 (print)
We study event-based sensors in the context of spacecraft guidance and control during a descent on Moon-like terrains. For this purpose, we develop a simulator reproducing the event-based camera outputs when exposed to synthetic images of a space environment. We find that it is possible to reconstruct, in this context, the divergence of optical flow vectors (and therefore the time to contact) and use it in a simple control feedback scheme during simulated descents. The results obtained are very encouraging, albeit insufficient to meet the stringent safety constraints and modelling accuracy imposed upon space missions. We thus conclude by discussing future work aimed at addressing these limitations.
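The link between flow divergence and time to contact can be illustrated with a small sketch. It assumes an idealized case that the abstract does not spell out: pure translation along the optical axis toward a flat, fronto-parallel surface, with flow given in normalized image coordinates. Under that assumption the flow is u = x/τ, v = y/τ, so the divergence is 2/τ. The function names and the synthetic numbers are illustrative, not taken from the paper's simulator.

```python
def flow_divergence(flow_u, flow_v, spacing=1.0):
    """Average central-difference estimate of du/dx + dv/dy over interior points.

    flow_u[i][j] and flow_v[i][j] hold the flow at row i (y) and column j (x).
    """
    rows, cols = len(flow_u), len(flow_u[0])
    total = 0.0
    count = 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            du_dx = (flow_u[i][j + 1] - flow_u[i][j - 1]) / (2 * spacing)
            dv_dy = (flow_v[i + 1][j] - flow_v[i - 1][j]) / (2 * spacing)
            total += du_dx + dv_dy
            count += 1
    return total / count


def time_to_contact(divergence):
    """Time to contact under the assumed model, where divergence = 2 / tau."""
    return 2.0 / divergence


# Synthetic descent (hypothetical values): height 100 m, descent speed 5 m/s.
height, speed = 100.0, 5.0
coords = [-2, -1, 0, 1, 2]
flow_u = [[x * speed / height for x in coords] for _ in coords]
flow_v = [[y * speed / height for _ in coords] for y in coords]
tau = time_to_contact(flow_divergence(flow_u, flow_v))
```

With these values the recovered time to contact matches height divided by speed, about 20 s.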
ISBN: 9781665448994 (print)
Few-shot learning features the capability of generalizing from a few examples. In this paper, we first identify that a discriminative feature space, namely a rectified metric space learned to maintain metric consistency from training to testing, is an essential component of the success of metric-based few-shot learning. Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains. The resulting approach, called rectified metric propagation (ReMP), further optimizes an attentive prototype propagation network and applies a repulsive force to make confident predictions. Extensive experiments demonstrate that the proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
ISBN: 9781665445092 (print)
The objective of this work is to annotate sign instances across a broad vocabulary in continuous sign language. We train a Transformer model to ingest a continuous signing stream and output a sequence of written tokens on a large-scale collection of signing footage with weakly-aligned subtitles. We show that through this training it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation. Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark.
ISBN: 9781665448994 (print)
In this work, we provide a detailed description of our methods ANTxNN and ANTxNN SSIM, submitted to the Workshop and Challenge on Learned Image Compression (CLIC) 2021. We propose to incorporate Relativistic average Least Squares GANs (RaLSGANs) into Rate-Distortion Optimization for end-to-end training, to achieve perceptual image compression. We also compare two types of discriminator networks and visualize their reconstructed images. Experimental results show that our method optimized with RaLSGANs achieves higher subjective quality than PSNR-, MS-SSIM-, or LPIPS-optimized models.
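For readers unfamiliar with RaLSGAN, the following sketch shows the commonly cited form of the relativistic average least-squares losses, applied to raw discriminator scores. It is not the ANTxNN implementation: the function names are illustrative, some implementations scale these terms by 1/2, and the weighted sum in `total_loss` (with hypothetical weights `lam` and `beta`) is only one assumed way of adding an adversarial term to a rate-distortion objective.

```python
def mean(values):
    return sum(values) / len(values)


def ralsgan_discriminator_loss(real_scores, fake_scores):
    """Push real scores 1 above the fake average and fake scores 1 below the real average."""
    real_mean, fake_mean = mean(real_scores), mean(fake_scores)
    loss_real = mean([(r - fake_mean - 1.0) ** 2 for r in real_scores])
    loss_fake = mean([(f - real_mean + 1.0) ** 2 for f in fake_scores])
    return loss_real + loss_fake


def ralsgan_generator_loss(real_scores, fake_scores):
    """Mirror of the discriminator loss: the targets for real and fake are swapped."""
    real_mean, fake_mean = mean(real_scores), mean(fake_scores)
    loss_fake = mean([(f - real_mean - 1.0) ** 2 for f in fake_scores])
    loss_real = mean([(r - fake_mean + 1.0) ** 2 for r in real_scores])
    return loss_fake + loss_real


def total_loss(rate, distortion, adversarial, lam, beta):
    """Assumed combination: rate + lam * distortion + beta * adversarial term."""
    return rate + lam * distortion + beta * adversarial


# Scores from a discriminator that separates real from fake by exactly 1.
real_scores = [0.5, 0.5]
fake_scores = [-0.5, -0.5]
d_loss = ralsgan_discriminator_loss(real_scores, fake_scores)
g_loss = ralsgan_generator_loss(real_scores, fake_scores)
```

In this toy case the discriminator loss is zero, while the generator loss is large because the reconstructions still score well below the real images.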
ISBN: 9781665448994 (print)
In this paper we present a unified formulation for a large class of relative pose problems with radial distortion and varying calibration. For minimal cases, we show that one can reduce the number of parameters to between one and three. The relative pose can then be expressed using varying calibration constraints on the fundamental matrix, with entries that are polynomial in the parameters. We can then apply standard techniques based on the action matrix and Sturm sequences to construct our solvers. This enables efficient solvers for a large class of relative pose problems with radial distortion, using a common framework. We evaluate a number of these solvers for robust two-view inlier and epipolar geometry estimation, used as minimal solvers in RANSAC.
ISBN: 9781665448994 (print)
This paper describes a CNN in which all CNN-style 2D convolution operations that lower to matrix-matrix multiplication are fully binary. The network is derived from a common building-block structure that is consistent with a constructive proof outline showing that binary neural networks are universal function approximators. A top-1 accuracy of 71.24% on the 2012 ImageNet validation set was achieved with a two-step training procedure, and implementation strategies optimized for binary operands are provided.
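One widely used way to make a matrix-matrix multiplication fully binary is to pack +1/-1 operands into machine words and replace multiply-accumulate with XOR and popcount. The sketch below illustrates that general technique; it is not confirmed to be the paper's implementation strategy, and the helper names are hypothetical. For two packed vectors of length n, the dot product equals n minus twice the number of mismatching bits.

```python
def pack_bits(signs):
    """Pack a list of +1/-1 values into an int; bit k is set iff signs[k] == +1."""
    word = 0
    for k, sign in enumerate(signs):
        if sign == 1:
            word |= 1 << k
    return word


def binary_dot(a_word, b_word, length):
    """Dot product of two packed +1/-1 vectors via XOR and popcount."""
    mismatches = bin(a_word ^ b_word).count("1")
    return length - 2 * mismatches


def binary_matmul(a_rows, b_cols):
    """Multiply A (given as rows) by B (given as columns), all entries +1/-1."""
    length = len(a_rows[0])
    a_packed = [pack_bits(row) for row in a_rows]
    b_packed = [pack_bits(col) for col in b_cols]
    return [[binary_dot(a, b, length) for b in b_packed] for a in a_packed]


a_rows = [[1, -1, 1, 1], [-1, -1, 1, -1]]
b_cols = [[1, 1, -1, 1], [-1, -1, -1, -1]]
product = binary_matmul(a_rows, b_cols)
```

The result agrees with the ordinary integer matrix product of the same +1/-1 matrices, which is what makes the lowering from convolution to binary matrix multiplication exact.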
ISBN: 9781665448994 (print)
For convolutional neural networks (CNNs), a common hypothesis that explains both their generalization capability and their characteristic brittleness is that these models are implicitly regularized to rely on imperceptible high-frequency patterns more than humans do. This hypothesis has seen some empirical validation, but most works do not rigorously divide the image frequency spectrum. We present a model that divides the spectrum into disjoint discs based on the distribution of energy, and apply simple feature importance procedures to test whether high frequencies are more important than lower ones. We find evidence that mid- or high-level frequencies are disproportionately important for CNNs. The evidence is robust across different datasets and networks. Moreover, we find that the network's attributes, such as architecture and depth, have diverse effects on frequency bias and on robustness in general.
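As a rough illustration of dividing a spectrum by energy, the sketch below splits a 2D power spectrum into concentric radial bands that each hold roughly the same share of the total energy. The equal-energy criterion, the function names, and the input layout (power values in standard DFT order, zero frequency at index [0][0]) are all assumptions made for this example; the abstract only states that the division is based on the distribution of energy.

```python
import math


def radius(u, v, rows, cols):
    """Distance of DFT index (u, v) from zero frequency, accounting for wrap-around."""
    fu = min(u, rows - u)
    fv = min(v, cols - v)
    return math.hypot(fu, fv)


def equal_energy_band_edges(power, num_bands):
    """Return the outer radii of the first num_bands - 1 bands.

    Band k covers the radii greater than edges[k - 1] and up to edges[k];
    the last band covers everything beyond the final edge.
    """
    rows, cols = len(power), len(power[0])
    energy_by_radius = {}
    for u in range(rows):
        for v in range(cols):
            r = radius(u, v, rows, cols)
            energy_by_radius[r] = energy_by_radius.get(r, 0.0) + power[u][v]
    total = sum(energy_by_radius.values())
    edges = []
    cumulative = 0.0
    for r in sorted(energy_by_radius):
        cumulative += energy_by_radius[r]
        next_share = total * (len(edges) + 1) / num_bands
        if len(edges) < num_bands - 1 and cumulative >= next_share:
            edges.append(r)
    return edges


# Flat 4x4 spectrum (hypothetical): every coefficient carries the same energy.
flat_power = [[1.0] * 4 for _ in range(4)]
edges = equal_energy_band_edges(flat_power, num_bands=2)
```

Once the band edges are known, a feature importance test can mask or keep one band at a time and measure the effect on the network's predictions.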