Search Results - Inner Mongolia University Library

DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction

International Conference on Computer Vision (ICCV)

Authors: Xiaoxing Zeng, Xiaojiang Peng, Yu Qiao. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology; University of Chinese Academy of Sciences, China

ISBN: (digital) 9781728148038

ISBN: (print) 9781728148045

Reconstructing the detailed geometric structure from a single face image is a challenging problem due to its ill-posed nature and the fine 3D structures to be recovered. This paper proposes a deep Dense-Fine-Finer Network (DF2Net) to address this challenging problem. DF2Net decomposes the reconstruction process into three stages, each of which is processed by an elaborately-designed network, namely D-Net, F-Net, and Fr-Net. D-Net exploits a U-net architecture to map the input image to a dense depth image. F-Net refines the output of D-Net by integrating features from depth and RGB domains, whose output is further enhanced by Fr-Net with a novel multi-resolution hypercolumn architecture. In addition, we introduce three types of data to train these networks, including 3D model synthetic data, 2D image reconstructed data, and fine facial images. We elaborately exploit different datasets (or combination) together with well-designed losses to train different networks. Qualitative evaluation indicates that our DF2Net can effectively reconstruct subtle facial details such as small crow's feet and wrinkles. Our DF2Net achieves performance superior or comparable to state-of-the-art algorithms in qualitative and quantitative analyses on real-world images and the BU-3DFE dataset. Code and the collected 70K image-depth data will be publicly available.

Keywords: Three-dimensional displays; Face; Image reconstruction; Shape; Solid modeling; Two dimensional displays; Training data

Learning to predict context-adaptive convolution for semantic segmentation

arXiv, 2020

Authors: Liu, Jianbo; He, Junjun; Ren, Jimmy S.; Qiao, Yu; Li, Hongsheng. Affiliations: CUHK-SenseTime Joint Laboratory, Chinese University of Hong Kong; Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; SenseTime Research

Long-range contextual information is essential for achieving high-performance semantic segmentation. Previous feature re-weighting methods [34] demonstrate that using global context for re-weighting feature channels can effectively improve the accuracy of semantic segmentation. However, the globally shared feature re-weighting vector might not be optimal for regions of different classes in the input image. In this paper, we propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector for each spatial location of the semantic feature maps. In CaC-Net, a set of context-adaptive convolution kernels is predicted from the global contextual information in a parameter-efficient manner. When used for convolution with the semantic feature maps, the predicted convolutional kernels can generate the spatially-varying feature weighting factors capturing both global and local contextual information. Comprehensive experimental results show that our CaC-Net achieves superior segmentation performance on three public datasets, PASCAL Context, PASCAL VOC 2012 and ADE20K. Copyright © 2020, The Authors. All rights reserved.

Keywords: Convolution
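
The contrast drawn in the abstract above, between one re-weighting vector shared by every location and a weighting vector that varies per location, can be illustrated with a minimal pure-Python sketch. This is not the CaC-Net implementation: the function names are hypothetical, the kernel is passed in rather than predicted from global context, and a 1x1 kernel followed by a sigmoid is an assumption made only to keep the example small.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_weights(features):
    """One weight per channel, shared by every spatial location.

    features is an H x W x C nested list. The weights come from
    global average pooling followed by a sigmoid (assumed form).
    """
    h, w, c = len(features), len(features[0]), len(features[0][0])
    means = [
        sum(features[i][j][k] for i in range(h) for j in range(w)) / (h * w)
        for k in range(c)
    ]
    return [sigmoid(m) for m in means]


def spatial_weights(features, kernel):
    """One weight vector per spatial location.

    kernel is a C x C matrix acting as a 1x1 convolution. In the paper
    the kernels are predicted from global context; here the kernel is
    simply an argument (hypothetical stand-in).
    """
    return [
        [
            [
                sigmoid(sum(kernel[k][l] * pixel[l] for l in range(len(pixel))))
                for k in range(len(kernel))
            ]
            for pixel in row
        ]
        for row in features
    ]


if __name__ == "__main__":
    feats = [[[1.0, 0.0], [0.0, 1.0]]]       # 1 x 2 map with 2 channels
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(global_weights(feats))             # same vector for all locations
    print(spatial_weights(feats, identity))  # differs between locations
```

The same kernel yields different weights at each location because it is applied to the local feature vector, which is the property the abstract attributes to spatially-varying weighting.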

A semantic model for video based face recognition

International Conference on Information and Automation (ICIA)

Authors: Dihong Gong, Kai Zhu, Zhifeng Li, Yu Qiao. Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; The Chinese University of Hong Kong, Hong Kong

ISBN: (print) 9781479913329

Video-based face recognition has attracted a great deal of attention in recent years due to its wide applications. The challenge of video-based face recognition comes from several aspects. First, video data involves many frames, which increases data size and processing complexity. Second, key frames extracted from videos usually show high intra-personal discrepancy due to variations in expression, pose, and illumination. To address these problems, we propose a novel semantic-based subspace model to improve the performance of video-based face recognition. The basic idea is to construct an appropriate low-dimensional subspace for each person, upon which a semantic model is built to classify the key frames of the person into specific classes. After the semantic classification, the key frames belonging to the same class, i.e. the same semantics, are used to train the linear classifiers for recognition. Extensive experiments on a large face video database (XM2VTS) clearly show that our approach obtains a significant performance improvement over the traditional approaches.

Keywords: Face; Training; Face recognition; Semantics; Video sequences; Principal component analysis; Probes

ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework

arXiv, 2022

Authors: Mo, Ningkai; Gan, Wanshui; Yokoya, Naoto; Chen, Shifeng. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; The University of Tokyo, Japan; RIKEN, Japan

In this paper, a computation-efficient regression framework is presented for estimating the 6D pose of rigid objects from a single RGB-D image, which is applicable to handling symmetric objects. This framework is designed in a simple architecture that efficiently extracts point-wise features from RGB-D data using a fully convolutional network, called XYZNet, and directly regresses the 6D pose without any post refinement. In the case of a symmetric object, one object has multiple ground-truth poses, and this one-to-many relationship may lead to estimation ambiguity. In order to solve this ambiguity problem, we design a symmetry-invariant pose distance metric, called average (maximum) grouped primitives distance or A(M)GPD. The proposed A(M)GPD loss can make the regression network converge to the correct state, i.e., all minima in the A(M)GPD loss surface are mapped to the correct poses. Extensive experiments on YCB-Video and TLESS datasets demonstrate the proposed framework's substantially superior performance in top accuracy and low computational cost. The relevant code is available at https://***/GANWANSHUI/***. Copyright © 2022, The Authors. All rights reserved.

Keywords: Computational efficiency
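
The one-to-many ambiguity mentioned in the abstract above can be made concrete with a small sketch. The A(M)GPD metric itself is not defined in this record, so the code below does not implement it. It shows a different, generic idea, assumed here for illustration only: taking the minimum of a mean point distance over an explicit list of symmetry rotations, so that every equivalent ground-truth pose scores zero. All function names are hypothetical.

```python
import math


def rot_z(deg):
    """3x3 rotation matrix about the z axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transform(R, t, p):
    """Apply the rigid pose (R, t) to a 3D point p."""
    return tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))


def mean_point_distance(points, R_a, t_a, R_b, t_b):
    """Average distance between model points under two poses."""
    return sum(
        math.dist(transform(R_a, t_a, p), transform(R_b, t_b, p)) for p in points
    ) / len(points)


def symmetry_aware_distance(points, R_pred, t_pred, R_gt, t_gt, symmetries):
    """Minimum distance over all ground-truth poses equivalent under symmetry.

    NOTE: a generic min-over-symmetries distance, not the paper's A(M)GPD.
    """
    return min(
        mean_point_distance(points, R_pred, t_pred, matmul(R_gt, S), t_gt)
        for S in symmetries
    )


if __name__ == "__main__":
    # A point set that looks identical after a 180 degree turn about z.
    model = [(1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0)]
    zero = (0.0, 0.0, 0.0)
    syms = [rot_z(0), rot_z(180)]
    print(mean_point_distance(model, rot_z(180), zero, rot_z(0), zero))
    print(symmetry_aware_distance(model, rot_z(180), zero, rot_z(0), zero, syms))
```

A plain point distance penalizes the 180 degree prediction even though the object looks identical, while the symmetry-aware variant does not, which is the kind of ambiguity a symmetry-invariant metric is meant to remove.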

Dynamic Feature Queue for Surveillance Face Anti-spoofing via Progressive Training

IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Authors: Keyao Wang, Mouxiao Huang, Guosheng Zhang, Haixiao Yue, Gang Zhang, Yu Qiao. Affiliations: Department of Computer Vision Technology (VIS), Baidu Inc.; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

In recent years, face recognition systems have faced increasing security threats, making it essential to employ Face Anti-spoofing (FAS) to protect against various types of attacks in traditional scenarios like phone unlocking, face payment and self-service security inspection. However, further exploration is required to fully secure FAS in long-distance settings. In this paper, we propose two contributions to enhance the security of face recognition systems: Dynamic Feature Queue (DFQ) and Progressive Training Strategy (PTS). DFQ converts the conventional binary classification task into a multi-classification task. It treats live samples as a closed set and attack samples as an open set by using a dynamic queue that stores the features of spoofing samples and updates them. PTS, on the other hand, targets difficult samples and iteratively adds them in batches for training. The proposed PTS divides the entire training set into blocks, trains on only a small portion of the data at first, and gradually increases the training data with each stage while also incorporating low-scoring positive samples and high-scoring spoof samples from the test set. These two contributions complement each other by enhancing the model’s ability to generalize and defend against various types of attacks, making the face recognition system more secure and reliable. Our proposed methods achieved top performance on the ACER metric, with 4.73% on the SuHiFiMask dataset [11], and won first prize in the Surveillance Face Anti-spoofing track of the Challenge@CVPR 2023.
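
The queue behaviour described above (storing spoof features, updating them, and turning a binary decision into a multi-class one) might look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' DFQ: the class name, the fixed capacity, the first-in-first-out eviction, the single live prototype, and the dot-product logits are all hypothetical choices.

```python
from collections import deque


class FeatureQueue:
    """Fixed-size queue of spoof-sample features (hypothetical design)."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest entries automatically.
        self.items = deque(maxlen=capacity)

    def update(self, spoof_features):
        """Push the newest spoof features; the oldest are evicted."""
        self.items.extend(spoof_features)

    def logits(self, feature, live_prototype):
        """Scores for a (1 + queue length)-way classification.

        Index 0 is the live class (closed set); every queued spoof
        feature acts as its own class (open set). Dot-product similarity
        is an assumption, not taken from the paper.
        """
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))

        return [dot(feature, live_prototype)] + [dot(feature, q) for q in self.items]


if __name__ == "__main__":
    queue = FeatureQueue(capacity=2)
    queue.update([[1, 0], [0, 1], [1, 1]])  # the first feature is evicted
    print(list(queue.items))
    print(queue.logits([1, 0], live_prototype=[1, 0]))
```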

Learning dynamical human-joint affinity for 3D pose estimation in videos

arXiv, 2021

Authors: Zhang, Junhao; Wang, Yali; Zhou, Zhipeng; Luan, Tianyu; Wang, Zhe; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of California, Irvine, United States; Shanghai AI Laboratory, Shanghai, China

Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos. However, it is often built on a fixed human-joint affinity, according to the human skeleton. This may reduce the adaptation capacity of GCN to tackle complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity, and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos. Different from traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover spatial/temporal human-joint affinity for each video exemplar, depending on spatial distance/temporal movement similarity between human joints in this video. Hence, they can effectively understand which joints are spatially closer and/or have consistent motion, for reducing depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We conduct extensive experiments on three popular benchmarks, i.e., Human3.6M, HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent SOTA approaches with fewer input frames and a smaller model size. Copyright © 2021, The Authors. All rights reserved.

Keywords: Convolution
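
To illustrate what an affinity that depends on spatial distance (rather than on a fixed skeleton) could look like, the sketch below rebuilds the adjacency matrix from the joint positions of each input. The k-nearest-neighbour rule and the function name are assumptions for illustration; the record does not state how DG-Net actually derives its affinity.

```python
import math


def dynamic_affinity(joints, k):
    """Connect each joint to its k spatially nearest joints.

    joints is a list of coordinate tuples for one frame. The returned
    adjacency matrix depends on the input pose, unlike a skeleton-based
    matrix. The k-NN rule is a hypothetical choice.
    """
    n = len(joints)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(joints[i], joints[j]),
        )
        for j in others[:k]:
            adjacency[i][j] = 1
    return adjacency


if __name__ == "__main__":
    pose = [(0, 0), (1, 0), (5, 0)]
    for row in dynamic_affinity(pose, k=1):
        print(row)
```

Because the matrix is recomputed per input, two joints that are far apart on the skeleton but close in a particular pose can still be linked.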

KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing

arXiv, 2023

Authors: Huang, Jiancheng; Liu, Yifan; Qin, Jin; Chen, Shifeng. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China

Text-conditioned image editing is a recently emerged and highly practical task, and its potential is immeasurable. However, most of the concurrent methods are unable to perform action editing, i.e. they cannot produce results that conform to the action semantics of the editing prompt and preserve the content of the original image. To solve the problem of action editing, we propose KV Inversion, a method that achieves satisfactory reconstruction performance and action editing and meets two requirements: 1) the edited result matches the corresponding action, and 2) the edited object retains the texture and identity of the original real image. In addition, our method does not require training the Stable Diffusion model itself, nor does it require scanning a large-scale dataset to perform time-consuming training. Copyright © 2023, The Authors. All rights reserved.

Keywords: Textures

DegAE: A New Pretraining Paradigm for Low-Level Vision

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yihao Liu, Jingwen He, Jinjin Gu, Xiangtao Kong, Yu Qiao, Chao Dong. Affiliations: Shanghai Artificial Intelligence Laboratory; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; The University of Sydney

Self-supervised pretraining has achieved remarkable success in high-level vision, but its application in low-level vision remains ambiguous and not well-established. What is the primitive intention of pretraining? What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. Specifically, we examine previous pretraining methods in both high-level and low-level vision, and categorize current low-level vision tasks into two groups based on the difficulty of data acquisition: low-cost and high-cost tasks. Existing literature has mainly focused on pretraining for low-cost tasks, where the observed performance improvement is often limited. However, we argue that pretraining is more significant for high-cost tasks, where data acquisition is more challenging. To learn a general low-level vision representation that can improve the performance of various tasks, we propose a new pretraining paradigm called degradation autoencoder (DegAE). DegAE follows the philosophy of designing pretext task for self-supervised pretraining and is elaborately tailored to low-level vision. With DegAE pretraining, SwinIR achieves a 6.88dB performance gain on image dehaze task, while Uformer obtains 3.22dB and 0.54dB improvement on dehaze and derain tasks, respectively.

Bootstrap Diffusion Model Curve Estimation for High Resolution Low-Light Image Enhancement

arXiv, 2023

Authors: Huang, Jiancheng; Liu, Yifan; Chen, Shifeng. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China

Learning-based methods have attracted a lot of research attention and led to significant improvements in low-light image enhancement. However, most of them still suffer from two main problems: expensive computational cost in high resolution images and unsatisfactory performance in simultaneous enhancement and denoising. To address these problems, we propose BDCE, a bootstrap diffusion model that exploits the learning of the distribution of the curve parameters instead of the normal-light image itself. Specifically, we adopt the curve estimation method to handle the high-resolution images, where the curve parameters are estimated by our bootstrap diffusion model. In addition, a denoise module is applied in each iteration of curve adjustment to denoise the intermediate enhanced result of each iteration. We evaluate BDCE on commonly used benchmark datasets, and extensive experiments show that it achieves state-of-the-art qualitative and quantitative performance. Copyright © 2023, The Authors. All rights reserved.

Keywords: Image enhancement
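
The abstract above describes iterative curve adjustment with a denoising step in each iteration. The sketch below shows that loop shape only. The quadratic curve x + alpha * x * (1 - x) is the form used by Zero-DCE and is assumed here, since this record does not give BDCE's curve; the 3-tap mean filter is a placeholder for the paper's denoise module; and the curve parameters are plain arguments rather than outputs of a diffusion model.

```python
def curve_adjust(pixels, alpha):
    """Apply a quadratic enhancement curve to intensities in [0, 1].

    Assumed form (Zero-DCE style): x + alpha * x * (1 - x).
    """
    return [x + alpha * x * (1.0 - x) for x in pixels]


def box_denoise(pixels):
    """Placeholder denoiser: 3-tap mean filter over a 1D signal."""
    out = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def enhance(pixels, alphas, denoise=box_denoise):
    """Alternate curve adjustment and denoising, one alpha per iteration."""
    for alpha in alphas:
        pixels = denoise(curve_adjust(pixels, alpha))
    return pixels


if __name__ == "__main__":
    dark = [0.05, 0.10, 0.08, 0.30]
    print(enhance(dark, alphas=[1.0, 1.0, 0.5]))
```

With alpha in [-1, 1] the assumed curve keeps values inside [0, 1], and applying it costs only a few arithmetic operations per pixel.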