The goal of text spotting is to perform text detection and recognition in an end-to-end manner. Although the diversity of luminosity and orientation in scene texts has been widely studied, the font diversity and shape...
ISBN (digital): 9781728148038
ISBN (print): 9781728148045
Reconstructing the detailed geometric structure from a single face image is a challenging problem due to its ill-posed nature and the fine 3D structures to be recovered. This paper proposes a deep Dense-Fine-Finer Network (DF2Net) to address this problem. DF2Net decomposes the reconstruction process into three stages, each handled by an elaborately designed network: D-Net, F-Net, and Fr-Net. D-Net uses a U-net architecture to map the input image to a dense depth image. F-Net refines the output of D-Net by integrating features from the depth and RGB domains, and its output is further enhanced by Fr-Net with a novel multi-resolution hypercolumn architecture. In addition, we introduce three types of data to train these networks: 3D model synthetic data, 2D image reconstructed data, and fine facial images. We exploit different datasets (or combinations of them) together with well-designed losses to train the different networks. Qualitative evaluation indicates that DF2Net can effectively reconstruct subtle facial details such as small crow's feet and wrinkles. DF2Net achieves performance superior or comparable to state-of-the-art algorithms in qualitative and quantitative analyses on real-world images and the BU-3DFE dataset. Code and the collected 70K image-depth data will be publicly available.
Long-range contextual information is essential for achieving high-performance semantic segmentation. Previous feature re-weighting methods [34] demonstrate that using global context for re-weighting feature channels c...
ISBN (print): 9781479913329
Video-based face recognition has attracted a great deal of attention in recent years due to its wide applications. The challenge of video-based face recognition comes from several aspects. First, video data involves many frames, which increases data size and processing complexity. Second, key frames extracted from videos usually show high intra-personal discrepancy due to variations in expression, pose, and illumination. To address these problems, we propose a novel semantic-based subspace model to improve the performance of video-based face recognition. The basic idea is to construct an appropriate low-dimensional subspace for each person, upon which a semantic model is built to classify the key frames of the person into specific classes. After the semantic classification, the key frames belonging to the same class, i.e. sharing the same semantics, are used to train the linear classifiers for recognition. Extensive experiments on a large face video database (XM2VTS) clearly show that our approach obtains a significant performance improvement over traditional approaches.
In this paper, a computation efficient regression framework is presented for estimating the 6D pose of rigid objects from a single RGB-D image, which is applicable to handling symmetric objects. This framework is desi...
In recent years, face recognition systems have faced increasing security threats, making it essential to employ Face Anti-spoofing (FAS) to protect against various types of attacks in traditional scenarios such as phone unlocking, face payment, and self-service security inspection. However, further exploration is required to fully secure FAS in long-distance settings. In this paper, we propose two contributions to enhance the security of face recognition systems: Dynamic Feature Queue (DFQ) and Progressive Training Strategy (PTS). DFQ converts the conventional binary classification task into a multi-class classification task. It treats live samples as a closed set and attack samples as an open set by using a dynamic queue that stores the features of spoofing samples and updates them. PTS, on the other hand, targets difficult samples and iteratively adds them in batches for training. It divides the entire training set into blocks, trains on only a small portion of the data at first, and gradually increases the training data with each stage while also incorporating low-scoring positive samples and high-scoring spoof samples from the test set. These two contributions complement each other by enhancing the model’s ability to generalize and defend against various types of attacks, making the face recognition system more secure and reliable. Our proposed methods achieved the top performance on the ACER metric, 4.73%, on the SuHiFiMask dataset [11] and won first prize in the Surveillance Face Anti-spoofing track of the Challenge@CVPR 2023.
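The abstract above describes DFQ only at a high level, so the following is a minimal sketch of the general idea of a fixed-capacity queue of spoof features, not the authors' implementation. The class name `FeatureQueue`, its methods, the first-in-first-out eviction policy, and the cosine-similarity lookup are all assumptions introduced for illustration.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


class FeatureQueue:
    """Hypothetical fixed-capacity FIFO store for spoof-sample feature vectors."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self._items: Deque[List[float]] = deque(maxlen=capacity)

    def update(self, batch: Sequence[Sequence[float]]) -> None:
        """Append a batch of features; the oldest entries are evicted first."""
        for feature in batch:
            self._items.append([float(x) for x in feature])

    def snapshot(self) -> List[List[float]]:
        """Return the stored features, oldest first."""
        return [list(feature) for feature in self._items]

    def max_similarity(self, query: Sequence[float]) -> float:
        """Highest cosine similarity between the query and any stored feature."""
        query_norm = math.sqrt(sum(x * x for x in query))
        best = -1.0
        for feature in self._items:
            norm = math.sqrt(sum(x * x for x in feature))
            if norm == 0.0 or query_norm == 0.0:
                continue  # cosine similarity is undefined for zero vectors
            dot = sum(a * b for a, b in zip(query, feature))
            best = max(best, dot / (norm * query_norm))
        return best

    def __len__(self) -> int:
        return len(self._items)


queue = FeatureQueue(capacity=3)
queue.update([[1.0, 0.0], [0.0, 1.0]])
queue.update([[1.0, 1.0], [2.0, 0.0]])  # evicts the oldest entry, [1.0, 0.0]
```

In this sketch the queue keeps only the most recent spoof features, so a new sample can be compared against recently seen attacks without the store growing without bound. How the paper actually updates the queue and uses it in the loss is not stated in the abstract.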
Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos. However, it is often built on the fixed human-joint affinity, according to human skeleton. This may reduce adaptation ...
Text-conditioned image editing is a recently emerged and highly practical task, and its potential is immeasurable. However, most of the concurrent methods are unable to perform action editing, i.e. they can not produc...
Self-supervised pretraining has achieved remarkable success in high-level vision, but its application in low-level vision remains ambiguous and not well-established. What is the primitive intention of pretraining? What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. Specifically, we examine previous pretraining methods in both high-level and low-level vision, and categorize current low-level vision tasks into two groups based on the difficulty of data acquisition: low-cost and high-cost tasks. Existing literature has mainly focused on pretraining for low-cost tasks, where the observed performance improvement is often limited. However, we argue that pretraining is more significant for high-cost tasks, where data acquisition is more challenging. To learn a general low-level vision representation that can improve the performance of various tasks, we propose a new pretraining paradigm called degradation autoencoder (DegAE). DegAE follows the philosophy of designing a pretext task for self-supervised pretraining and is elaborately tailored to low-level vision. With DegAE pretraining, SwinIR achieves a 6.88dB performance gain on the image dehaze task, while Uformer obtains 3.22dB and 0.54dB improvements on the dehaze and derain tasks, respectively.
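To give a sense of scale for the decibel gains quoted above, here is a small worked example. It assumes the gains are measured in PSNR, which the abstract does not state explicitly. The helper names `psnr` and `mse_ratio_for_gain` are hypothetical and exist only for this illustration.

```python
import math


def psnr(mse: float, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0.0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE must shrink to raise PSNR by gain_db decibels."""
    return 10.0 ** (gain_db / 10.0)


# Under the PSNR assumption, a 6.88 dB gain means the mean squared error
# drops by a factor of roughly 4.9.
ratio = mse_ratio_for_gain(6.88)
baseline_mse = 100.0  # arbitrary illustrative value, not taken from the paper
gain = psnr(baseline_mse / ratio) - psnr(baseline_mse)
```

Because PSNR is logarithmic in the error, the 0.54dB derain improvement corresponds to a much smaller error reduction (a factor of about 1.13) than the 6.88dB dehaze gain.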
Learning-based methods have attracted a lot of research attention and led to significant improvements in low-light image enhancement. However, most of them still suffer from two main problems: expensive computational ...