In practical applications, the generalization capability of face anti-spoofing (FAS) models on unseen domains is of paramount importance to adapt to diverse camera sensors, device drift, environmental variation, and unpredictable attack types. Recently, various domain generalization (DG) methods have been developed to improve the generalization capability of FAS models via training on multiple source domains. These DG methods commonly require collecting sufficient real-world attack samples of different attack types for each source domain. This work aims to learn an FAS model that uses no real-world attack samples from any source domain yet still generalizes well to unseen domains, which can significantly reduce the learning cost. Toward this goal, we draw inspiration from the theoretical error bound of domain generalization and use negative data augmentation instead of real-world attack samples for training. We show that with only a few types of simple synthesized negative samples, e.g., color jitter and color mask, the learned model can achieve performance competitive with state-of-the-art DG methods trained on real-world attack samples. Moreover, a dynamic global common loss and a local contrast loss are proposed to encourage the model to learn a compact and common feature representation for real face samples from different source domains, which further improves the generalization capability. Extensive cross-dataset experiments demonstrate that our method can even outperform state-of-the-art DG methods that use real-world attack samples for training. The code for reproducing the results of our method is available at https://***/WeihangWANG/NDA-FAS.
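The abstract mentions synthesizing negative (pseudo-attack) samples from real faces via color jitter and color masking. The sketch below illustrates one plausible way to do this in PyTorch; the parameter values, the mask-generation heuristic, and the 50/50 sampling are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: turn a real-face image into a pseudo-attack sample using
# color jitter or a random solid-color mask (negative data augmentation).
import torch
import torchvision.transforms as T

# Jitter strengths are assumed values for illustration only.
color_jitter = T.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.3)

def color_mask(img: torch.Tensor, patch_frac: float = 0.4) -> torch.Tensor:
    """Overlay a random solid-color rectangle on a 3xHxW image in [0, 1]."""
    _, h, w = img.shape
    ph, pw = int(h * patch_frac), int(w * patch_frac)
    top = torch.randint(0, h - ph + 1, (1,)).item()
    left = torch.randint(0, w - pw + 1, (1,)).item()
    out = img.clone()
    out[:, top:top + ph, left:left + pw] = torch.rand(3, 1, 1)  # random fill color
    return out

def synthesize_negative(img: torch.Tensor) -> torch.Tensor:
    """Produce a synthesized negative sample from a real-face tensor."""
    if torch.rand(1).item() < 0.5:
        return color_jitter(img)
    return color_mask(img)
```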
ISBN (print): 9781728198354
Text-to-image generation aims at synthesizing photo-realistic images from textual descriptions. Existing methods typically align images with the corresponding texts in a joint semantic space. However, the modality gap in the joint semantic space leads to misalignment. Meanwhile, the limited receptive field of the convolutional neural network leads to structural distortions in generated images. In this work, a structure-aware generative adversarial network (SaGAN) is proposed to (1) semantically align multimodal features in the joint semantic space in a learnable manner; and (2) improve the structure and contour of generated images via the designed content-invariant negative samples. Experimental results show that SaGAN achieves 30.1% and 8.2% improvements in FID on the CUB and COCO datasets, respectively, compared with state-of-the-art approaches.
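The abstract describes learnable image-text alignment in a joint semantic space combined with content-invariant negative samples. The sketch below shows one generic way such an objective could look: an InfoNCE-style contrastive loss where structure-perturbed versions of each image act as additional negatives. The function name, tensor shapes, and temperature are assumptions for illustration; this is not the SaGAN implementation.

```python
# Hedged sketch: contrastive image-text alignment with extra negatives taken
# from structure-perturbed ("content-invariant") versions of each image.
import torch
import torch.nn.functional as F

def alignment_loss(img_feat, txt_feat, neg_img_feat, temperature=0.1):
    """img_feat, txt_feat, neg_img_feat: (B, D) embeddings in the joint space."""
    img = F.normalize(img_feat, dim=-1)
    txt = F.normalize(txt_feat, dim=-1)
    neg = F.normalize(neg_img_feat, dim=-1)

    pos = (txt * img).sum(-1, keepdim=True)         # (B, 1) matched image-text pairs
    in_batch = txt @ img.t()                        # (B, B) other images as negatives
    struct_neg = (txt * neg).sum(-1, keepdim=True)  # (B, 1) perturbed-image negatives
    logits = torch.cat([pos, in_batch, struct_neg], dim=1) / temperature

    # The in-batch diagonal duplicates the positive pair, so mask it out.
    B = img.size(0)
    mask = torch.zeros_like(logits, dtype=torch.bool)
    mask[torch.arange(B), torch.arange(B) + 1] = True
    logits = logits.masked_fill(mask, float('-inf'))

    target = torch.zeros(B, dtype=torch.long, device=logits.device)  # positive at index 0
    return F.cross_entropy(logits, target)
```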