ISBN: (print) 9781665448994
Among deepfake generation techniques, flow-based methods are natural candidates. Because they are invertible, flow-based methods eliminate the need for person-specific training and can reconstruct any input image almost perfectly to human perception. We present a method for deepfake generation based on facial expression transfer using flow-based generative models. Our approach relies on simple latent vector operations akin to those used for attribute manipulation, applied here to transfer expressions between source-target identity pairs. We show the feasibility of this approach using a pre-trained Glow model and small sets of source and target images that were not necessarily seen during prior training. We also provide an evaluation pipeline for the generated images, measuring similarity between identities and between the Action Units that encode the expression to be transferred. Our results show that efficient expression transfer is feasible with the proposed approach, setting a first precedent for deepfake content creation, and its evaluation, independent of the training identities.
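The latent vector operation mentioned in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a toy orthogonal linear map stands in for the pre-trained Glow model, the names `encode`, `decode` and `transfer_expression` are hypothetical, and the offset rule (mean expressive latent minus mean neutral latent of the source, added to the target latent) is borrowed from common attribute-manipulation practice and is not confirmed as the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Toy invertible "flow": an orthogonal matrix, whose inverse is its transpose.
# This is a stand-in for a real pre-trained flow model such as Glow.
Q, _ = np.linalg.qr(rng.normal(size=(DIM, DIM)))


def encode(x):
    """Map data to latent space (hypothetical stand-in for the flow)."""
    return x @ Q


def decode(z):
    """Exact inverse of encode, which is what invertibility provides."""
    return z @ Q.T


def transfer_expression(source_expressive, source_neutral, target, alpha=1.0):
    """Shift the target latent by the source's mean expression offset."""
    z_offset = encode(source_expressive).mean(axis=0) - encode(source_neutral).mean(axis=0)
    return decode(encode(target) + alpha * z_offset)


# Small sets of source images (flattened to vectors) and a single target image.
source_expressive = rng.normal(size=(5, DIM))
source_neutral = rng.normal(size=(5, DIM))
target = rng.normal(size=(DIM,))
result = transfer_expression(source_expressive, source_neutral, target)
```

The `alpha` parameter is an illustrative knob for scaling the strength of the transferred expression; with `alpha=0.0` the target is reconstructed unchanged.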
ISBN: (print) 9781665448994
In this paper, we propose a novel reference-based image super-resolution approach via Variational AutoEncoder (RefVAE). Existing state-of-the-art methods mainly focus on single-image super-resolution, which does not perform well at large upsampling factors, e.g., 8x. We propose a reference-based image super-resolution method in which any arbitrary image can act as the reference. Even when a random map or the low-resolution image itself is used as the reference, the proposed RefVAE can transfer knowledge from the reference to the super-resolved image. Depending on the reference, the proposed method can generate different versions of the super-resolved image from a hidden super-resolution space. Besides standard evaluations with PSNR and SSIM on several datasets, we also took part in the NTIRE2021 SR Space challenge [21] and provide results of the randomness evaluation of our approach. Compared to other state-of-the-art methods, our approach achieves higher diversity scores.
ISBN: (print) 9781665448994
Image colourisation is an ill-posed problem, with multiple correct solutions that depend on the context and object instances present in the input datum. Previous approaches attacked the problem either by requiring intense user interaction or by exploiting the ability of convolutional neural networks (CNNs) to learn image-level (context) features. However, obtaining human hints is not always feasible, and CNNs alone are not able to learn entity-level semantics unless multiple models pre-trained with supervision are considered. In this work, we propose a single network, named UCapsNet, that takes into consideration the image-level features obtained through convolutions and the entity-level features captured by means of capsules. Then, through skip connections over different layers, we enforce collaboration between the convolutional and entity factors to produce a high-quality and plausible image colourisation. We pose the problem as a classification task that can be addressed by a fully unsupervised approach and thus requires no human effort. Experimental results on three benchmark datasets show that our approach outperforms existing methods on standard quality metrics and achieves state-of-the-art performance on image colourisation. A large-scale user study shows that our method is preferred over existing solutions. Code is available at https://***/Riretta/Image_Colourisation_WiCV_2021.
ISBN: (print) 9781665445092
Language priors play an important role in the way humans detect and recognize text in the wild. Current scene text recognition methods do use lexicons to improve recognition performance, but their naive approach of casting the output into a dictionary word based purely on edit distance has many limitations. In this paper, we present a novel approach that incorporates a dictionary in both the training and inference stages of a scene text recognition system. We use the dictionary to generate a list of possible outcomes and find the one that is most compatible with the visual appearance of the text. The proposed method leads to a robust scene text recognition model that is better at handling ambiguous cases encountered in the wild and improves the overall performance of state-of-the-art scene text spotting frameworks. Our work suggests that incorporating a language prior is a promising way to advance scene text detection and recognition methods. In addition, we contribute VinText, a challenging scene text dataset for Vietnamese, in which some characters are visually ambiguous due to accent symbols. This dataset will serve as a challenging benchmark for measuring the applicability and robustness of scene text detection and recognition algorithms.
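The inference-time idea in this abstract (generate dictionary candidates, then keep the one most compatible with the visual evidence) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the recognizer output is modelled as one character-probability dictionary per position, candidates whose length differs from the number of positions are rejected, and the names `candidates`, `visual_score` and `recognize` are hypothetical.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def candidates(raw, dictionary, k=3):
    """The k dictionary words closest to the raw prediction (ties broken alphabetically)."""
    return sorted(dictionary, key=lambda w: (edit_distance(raw, w), w))[:k]


def visual_score(word, char_probs):
    """Log-likelihood of a word under per-position character probabilities.

    Assumption: one probability table per character position, so only
    words of matching length can be scored.
    """
    if len(word) != len(char_probs):
        return -math.inf
    return sum(math.log(p.get(c, 1e-9)) for c, p in zip(word, char_probs))


def recognize(char_probs, dictionary, k=3):
    """Greedy decode, list dictionary candidates, keep the visually best one."""
    raw = "".join(max(p, key=p.get) for p in char_probs)
    return max(candidates(raw, dictionary, k), key=lambda w: visual_score(w, char_probs))


# Hypothetical recognizer output for a three-character word with an ambiguous middle letter.
char_probs = [
    {"c": 0.9, "e": 0.1},
    {"o": 0.4, "a": 0.35, "u": 0.25},
    {"t": 0.9, "l": 0.1},
]
dictionary = ["cat", "cut", "cow", "dot"]
best = recognize(char_probs, dictionary)
```

In this toy case the greedy output "cot" is not a dictionary word and all four dictionary words are at edit distance 1, so edit distance alone cannot choose between them; the visual score is what selects the final word.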
ISBN: (print) 9781665445092
Recent deep learning models have shown remarkable performance in image classification. While these deep learning systems are getting closer to practical deployment, a common assumption is that the data does not carry any sensitive information. This assumption may not hold in many practical cases, especially in domains where an individual's personal information is involved, such as healthcare and facial recognition systems. We posit that selectively removing features in the latent space can protect sensitive information and provide a better privacy-utility trade-off. Consequently, we propose DISCO, which learns a dynamic and data-driven pruning filter to selectively obfuscate sensitive information in the feature space. We propose diverse attack schemes for sensitive inputs and attributes and demonstrate the effectiveness of DISCO against state-of-the-art methods through quantitative and qualitative evaluation. Finally, we also release an evaluation benchmark dataset of 1 million sensitive representations to encourage rigorous exploration of novel attack and defense schemes at https://***/splitlearning/InferenceBenchmark.
ISBN: (print) 9798350390155; 9798350390162
Image-text retrieval, whose core concept is to measure the similarity between images and text, is a widely studied topic in computer vision due to the exponential growth of multimedia data. However, most existing retrieval methods rely heavily on cross-attention mechanisms for fine-grained cross-modal alignment, which take into account too many irrelevant regions and treat prominent and non-significant words equally. This paper investigates an alignment approach that reduces the involvement of non-significant fragments in images and text while enhancing the alignment of prominent fragments. For this purpose, we introduce the Cross-Modal Prominent Fragments Enhancement Aligning Network (CPFEAN). In practice, we first design a novel intra-modal fragment relationship reasoning method, and subsequently employ our proposed alignment mechanism to compute the similarity between images and text. Extensive quantitative comparative experiments on the MS-COCO and Flickr30K datasets demonstrate that our approach outperforms state-of-the-art methods.
ISBN: (print) 9781665448994
Keystroke dynamics is a powerful behavioral biometric capable of user authentication based on typing patterns. As larger keystroke datasets become available, machine learning and deep learning algorithms are becoming popular. Not every possible impostor is known during training, which means that keystroke dynamics is an open set recognition problem. Treating open set recognition problems as closed set (assuming samples from all impostors are present) can cause models to incur data leakage, which can yield unrealistic overestimates of performance. Data leakage is a common problem in machine learning and can cause models to report higher accuracies than would be expected in the real world. In this paper, we outline open set recognition and discuss how, if not handled properly, it can lead to data leakage. The performance of common machine learning methods, such as SVM and MLP, is investigated with and without leakage to clearly demonstrate the differences in performance. A synthetic dataset and a publicly available fixed-text keystroke dynamics dataset are used for research transparency and reproducibility.
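One way to avoid the leakage described in this abstract is to split impostors by identity rather than by sample, so that no impostor seen in training reappears at test time. The sketch below shows that idea only; the function name `open_set_split`, its parameters, and the 50/50 split are illustrative assumptions and are not taken from the paper.

```python
import random


def open_set_split(user_ids, genuine_user, test_fraction=0.5, seed=0):
    """Partition impostor identities into disjoint train and test sets.

    The genuine user's own samples would still be split by sample elsewhere;
    only impostor identities are kept disjoint here.
    """
    impostors = sorted(set(user_ids) - {genuine_user})
    rng = random.Random(seed)
    rng.shuffle(impostors)
    n_test = max(1, int(len(impostors) * test_fraction))
    test_impostors = set(impostors[:n_test])
    train_impostors = set(impostors[n_test:])
    return train_impostors, test_impostors


# Hypothetical dataset with ten users; user "u0" is the account being protected.
users = [f"u{i}" for i in range(10)]
train_imp, test_imp = open_set_split(users, genuine_user="u0")
```

A closed-set (leaky) protocol would instead draw training and test samples from the same impostors, which is the situation the abstract warns can inflate reported accuracy.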
ISBN: (print) 9781665445092
Inspired by the fact that human eyes continue to develop tracking ability in early and middle childhood, we propose to use tracking as a proxy task for a computer vision system to learn visual representations. Modelled on the Catch game played by children, we design a Catch-the-Patch (CtP) game for a 3D-CNN model to learn visual representations that help with video-related tasks. In the proposed pretraining framework, we cut an image patch from a given video and let it scale and move according to a pre-set trajectory. The proxy task is to estimate the position and size of the image patch in a sequence of video frames, given only the target bounding box in the first frame. We find that using multiple image patches simultaneously brings clear benefits. We further increase the difficulty of the game by randomly making patches invisible. Extensive experiments on mainstream benchmarks demonstrate the superior performance of CtP against other video pretraining methods. In addition, CtP-pretrained features are less sensitive to domain gaps than those trained by a supervised action recognition task. When both are trained on Kinetics-400, we are pleasantly surprised to find that the CtP-pretrained representation achieves much higher action classification accuracy than its fully supervised counterpart on the Something-Something dataset.
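The data-generation step of the proxy task described in this abstract can be sketched roughly as below. The details are assumptions made for illustration: the trajectory is a plain linear interpolation between a start and an end box, boxes use a (cx, cy, w, h) layout, frames are grayscale, resizing is nearest-neighbour, and `make_trajectory` and `paste_patch` are hypothetical names. The abstract does not specify the actual trajectory form used in CtP.

```python
import numpy as np


def make_trajectory(start_box, end_box, num_frames):
    """Linearly interpolate (cx, cy, w, h) boxes; these serve as regression targets."""
    start = np.asarray(start_box, dtype=float)
    end = np.asarray(end_box, dtype=float)
    t = np.linspace(0.0, 1.0, num_frames)[:, None]
    return (1.0 - t) * start + t * end  # shape: (num_frames, 4)


def paste_patch(video, patch, boxes):
    """Paste a rescaled copy of the patch into every frame along the trajectory.

    Assumes each box lies fully inside the frame; video is (T, H, W), patch is (h, w).
    """
    out = video.copy()
    for f, (cx, cy, w, h) in enumerate(boxes):
        w_i, h_i = max(1, int(round(w))), max(1, int(round(h)))
        # Nearest-neighbour resize of the patch to (h_i, w_i).
        ys = np.arange(h_i) * patch.shape[0] // h_i
        xs = np.arange(w_i) * patch.shape[1] // w_i
        resized = patch[ys][:, xs]
        x0 = int(round(cx - w_i / 2))
        y0 = int(round(cy - h_i / 2))
        out[f, y0:y0 + h_i, x0:x0 + w_i] = resized
    return out


# Toy example: a 4-frame blank video and an 8x8 patch that moves and grows.
video = np.zeros((4, 32, 32))
patch = np.ones((8, 8))
boxes = make_trajectory((8, 8, 8, 8), (20, 20, 12, 12), num_frames=4)
augmented = paste_patch(video, patch, boxes)
```

A model trained on this proxy task would receive `augmented` and the first row of `boxes`, and be asked to predict the remaining rows.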
Object recognition represents a significant area of investigation within the field of computer vision, with applications spanning industrial detection, traffic supervision, remote sensing, biomedicine and numerous oth...
ISBN: (print) 9781665445092
Under mild conditions on the noise level of the measurements, rotation averaging satisfies strong duality, which enables global solutions to be obtained via semidefinite programming (SDP) relaxation. However, generic SDP solvers are rather slow in practice, even on rotation averaging instances of moderate size, so developing specialised algorithms is vital. In this paper, we present a fast algorithm that achieves global optimality, called rotation coordinate descent (RCD). Unlike block coordinate descent (BCD), which solves the SDP by updating the semidefinite matrix in a row-by-row fashion, RCD directly maintains and updates all valid rotations throughout the iterations. This obviates the need to store a large dense semidefinite matrix. We mathematically prove the convergence of our algorithm and empirically show its superior efficiency over state-of-the-art global methods on a variety of problem configurations. Maintaining valid rotations also facilitates incorporating local optimisation routines for further speed-ups. Moreover, our algorithm is simple to implement.
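For orientation, rotation averaging is commonly written as a least-squares problem under the chordal distance. The formulation below is a standard textbook form given as an assumption; the abstract does not state the paper's own notation or composition convention. Here $R_i$ are the unknown absolute rotations, $\tilde{R}_{ij}$ are the measured relative rotations, and $E$ is the set of measured pairs.

```latex
\min_{R_1,\dots,R_n \in SO(3)} \; \sum_{(i,j)\in E} \left\| R_j - \tilde{R}_{ij} R_i \right\|_F^2

% Since \|R\|_F^2 = \operatorname{tr}(R^\top R) = 3 for any rotation R,
% each term expands to
\left\| R_j - \tilde{R}_{ij} R_i \right\|_F^2
  = 6 - 2\,\operatorname{tr}\!\left( R_j^\top \tilde{R}_{ij} R_i \right)
```

Under this assumed form the objective is linear in the pairwise products of the rotations, which is what makes a semidefinite relaxation over a stacked Gram matrix a natural fit.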