ISBN (Print): 9781713871088
Large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from expensive computational and memory costs when deployed on resource-constrained devices. Among the powerful compression approaches, quantization drastically reduces computation and memory consumption through low-bit parameters and bit-wise operations. However, low-bit ViTs remain largely unexplored and usually suffer a significant performance drop compared with their real-valued counterparts. In this work, through extensive empirical analysis, we first identify that the bottleneck behind the severe performance drop is the information distortion of the low-bit quantized self-attention map. We then develop an information rectification module (IRM) and a distribution guided distillation (DGD) scheme for fully quantized vision transformers (Q-ViT) to effectively eliminate such distortion, leading to fully quantized ViTs. We evaluate our methods on the popular DeiT and Swin backbones. Extensive experimental results show that our method achieves much better performance than prior arts. For example, our Q-ViT theoretically accelerates ViT-S by 6.14× and achieves about 80.9% Top-1 accuracy, even surpassing the full-precision counterpart by 1.0% on the ImageNet dataset.
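As a rough illustration of the kind of rectification the abstract describes, the PyTorch sketch below quantizes queries and keys to low-bit values after a learnable per-head re-scaling and re-centering, so the quantized attention map keeps more of its original spread. The quantizer, the gamma/beta parameters, and the module name RectifiedQuantAttention are illustrative assumptions, not the paper's actual IRM; the distribution guided distillation is not shown.

```python
import torch
import torch.nn as nn


def uniform_quantize(x, bits=4):
    """Symmetric uniform quantizer with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax)
    # forward uses the quantized value, backward passes the gradient through
    return x + (q * scale - x).detach()


class RectifiedQuantAttention(nn.Module):
    """Toy self-attention whose queries/keys are re-centered and re-scaled
    (learnable gamma/beta, hypothetical form of the rectification) before
    low-bit quantization."""

    def __init__(self, dim, num_heads=8, bits=4):
        super().__init__()
        self.num_heads, self.bits = num_heads, bits
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # per-head rectification parameters (assumed, for illustration only)
        self.gamma_q = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.beta_q = nn.Parameter(torch.zeros(num_heads, 1, 1))
        self.gamma_k = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.beta_k = nn.Parameter(torch.zeros(num_heads, 1, 1))

    def forward(self, x):
        B, N, C = x.shape
        q, k, v = (self.qkv(x)
                   .reshape(B, N, 3, self.num_heads, self.head_dim)
                   .permute(2, 0, 3, 1, 4))
        # rectify, then quantize queries and keys before forming the attention map
        q = uniform_quantize(q * self.gamma_q + self.beta_q, self.bits)
        k = uniform_quantize(k * self.gamma_k + self.beta_k, self.bits)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```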
ISBN (Print): 9781665428132
Facial expression recognition (FER) has received increasing interest in computer vision. We propose the TransFER model, which can learn rich relation-aware local representations. It mainly consists of three components: Multi-Attention Dropping (MAD), ViT-FER, and Multi-head Self-Attention Dropping (MSAD). First, local patches play an important role in distinguishing various expressions; however, few existing works can locate discriminative and diverse local patches. This can cause serious problems when some patches are invisible due to pose variations or viewpoint changes. To address this issue, MAD is proposed to randomly drop an attention map. Consequently, models are pushed to explore diverse local patches adaptively. Second, to build rich relations between different local patches, Vision Transformers (ViT) are used in FER, called ViT-FER. Since the global scope is used to reinforce each local patch, a better representation is obtained to boost FER performance. Third, multi-head self-attention allows ViT to jointly attend to features from different information subspaces at different positions. Given no explicit guidance, however, multiple self-attention heads may extract similar relations. To address this, MSAD is proposed to randomly drop one self-attention module. As a result, models are forced to learn rich relations among diverse local patches. Our proposed TransFER model outperforms state-of-the-art methods on several FER benchmarks, showing its effectiveness and usefulness.
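The sketch below illustrates the attention-dropping idea in PyTorch: during training, one of the M attention maps per image is randomly zeroed so the model cannot rely on a single local patch. The (B, M, H, W) layout, the drop probability p, and the class name are assumptions made for illustration; the paper's MAD and MSAD modules may operate at different granularities.

```python
import torch
import torch.nn as nn


class MultiAttentionDropping(nn.Module):
    """Randomly zeroes one of several attention maps during training so the
    model is pushed to explore diverse local patches (illustrative sketch)."""

    def __init__(self, p=0.5):
        super().__init__()
        self.p = p  # probability of performing a drop at all

    def forward(self, attn_maps):
        # attn_maps: (B, M, H, W) - M attention maps per image (assumed layout)
        if not self.training or torch.rand(1).item() > self.p:
            return attn_maps
        B, M, _, _ = attn_maps.shape
        drop_idx = torch.randint(0, M, (B,), device=attn_maps.device)
        mask = torch.ones(B, M, 1, 1, device=attn_maps.device)
        mask[torch.arange(B), drop_idx] = 0.0  # zero one map per sample
        return attn_maps * mask


# usage: maps = MultiAttentionDropping(p=0.6)(local_attention_maps)
```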
Face anti-spoofing is essential to protect face recognition systems from security breaches. Much of the progress has been driven by the availability of face anti-spoofing benchmark datasets in recent years. However, ...
ISBN (Print): 9781665428132
Remarkable results have been achieved by DCNN-based self-supervised depth estimation approaches. However, most of these approaches can only handle either day-time or night-time images, and their performance degrades on all-day images due to the large domain shift and the variation of illumination between day and night images. To relieve these limitations, we propose a domain-separated network for self-supervised depth estimation of all-day images. Specifically, to relieve the negative influence of disturbing terms (illumination, etc.), we partition the information of day and night image pairs into two complementary sub-spaces: a private domain and an invariant domain, where the former contains the unique information (illumination, etc.) of day and night images and the latter contains the essential shared information (texture, etc.). Meanwhile, to guarantee that the day and night images contain the same information, the domain-separated network takes day-time images and the corresponding night-time images (generated by a GAN) as input, and the private and invariant feature extractors are learned with orthogonality and similarity losses, so the domain gap can be alleviated and better depth maps can be expected. In addition, reconstruction and photometric losses are utilized to estimate complementary information and depth maps effectively. Experimental results demonstrate that our approach achieves state-of-the-art depth estimation results for all-day images on the challenging Oxford RobotCar dataset, proving the superiority of our proposed approach. Code and data split are available at https://***/LINA-lln/ADDS-DepthNet.
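A minimal sketch of the two losses mentioned for separating the feature sub-spaces: an orthogonality (difference) loss that penalizes the correlation between private and invariant features, and a similarity loss that pulls the invariant features of a day image and its generated night counterpart together. Both forms used here (squared Frobenius norm of the cross-correlation, plain MSE) are common choices assumed for illustration and may differ from the paper's exact formulations.

```python
import torch
import torch.nn.functional as F


def orthogonality_loss(private_feat, invariant_feat):
    """Pushes private (illumination-specific) and invariant (shared) features
    into complementary sub-spaces by penalising the squared Frobenius norm of
    their cross-correlation (an assumed, common difference loss)."""
    # flatten and L2-normalise each sample's feature vector
    p = F.normalize(private_feat.flatten(1), dim=1)  # (B, D)
    s = F.normalize(invariant_feat.flatten(1), dim=1)  # (B, D)
    return (p.t() @ s).pow(2).sum()


def similarity_loss(invariant_day, invariant_night):
    """Pulls together the invariant features of a day image and its
    GAN-generated night counterpart (here simply an L2 distance)."""
    return F.mse_loss(invariant_day, invariant_night)
```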
The networks trained on the long-tailed dataset vary remarkably, despite the same training settings, which shows the great uncertainty in long-tailed learning. To alleviate the uncertainty, we propose a Nested Collabo...
This study investigates how externally wrapping CFRP enhances the mechanical and structural performance of recycled concrete structures, aiming to optimize the use of recycled aggregate concrete in civil engineering. ...
Out-of-town recommendation is designed for those users who leave their home-town areas and visit areas they have never been to before. It is challenging to recommend Points-of-Interest (POIs) for out-of-town users...
Existing spatial object recommendation algorithms generally treat objects identically when ranking them. However, spatial objects often cover different levels of spatial granularity and thereby are heterogeneous. For ...
ISBN (Print): 9781665428132
Differentiable Architecture Search (DARTS) improves the efficiency of architecture search by learning the architecture and network parameters end-to-end. However, the intrinsic relationship between the architecture’s parameters is neglected, leading to a sub-optimal optimization process. The reason lies in the fact that the gradient descent method used in DARTS ignores the coupling relationship between the parameters and therefore degrades the optimization. In this paper, we address this issue by formulating DARTS as a bi-linear optimization problem and introducing Interactive Differentiable Architecture Search (IDARTS). We first develop a backtracking backpropagation process, which decouples the relationships among the different kinds of parameters and trains them in the same framework. The backtracking method coordinates the training of the different parameters, fully exploring their interaction and optimizing the training. We present experiments on the CIFAR10 and ImageNet datasets that demonstrate the efficacy of IDARTS, which achieves a top-1 accuracy of 76.52% on ImageNet without additional search cost, vs. 75.8% for the state-of-the-art PC-DARTS.
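To make the bi-linear coupling concrete, the sketch below shows a DARTS-style mixed edge, whose output is a softmax-weighted sum of candidate operations and is therefore bilinear in the architecture weights and the operations' outputs, followed by an alternating update with a simple backtracking weight refresh after the architecture parameters change. The training-step function and the backtrack_steps parameter are illustrative assumptions, not IDARTS's exact backtracking backpropagation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedOp(nn.Module):
    """DARTS-style mixed edge: a softmax-weighted sum of candidate ops, i.e.
    bilinear in the architecture weights and the operations' outputs."""

    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))


def alternating_step(model, w_optim, alpha_optim, criterion,
                     train_batch, val_batch, backtrack_steps=1):
    """One alternating update of weights and architecture parameters, plus a
    simple backtracking weight refresh after alpha changes (an illustration of
    coordinating the coupled parameters, not the paper's exact scheme)."""
    (x_t, y_t), (x_v, y_v) = train_batch, val_batch

    w_optim.zero_grad()
    criterion(model(x_t), y_t).backward()
    w_optim.step()                       # update weights with alpha fixed

    alpha_optim.zero_grad()
    criterion(model(x_v), y_v).backward()
    alpha_optim.step()                   # update alpha on validation data

    for _ in range(backtrack_steps):     # re-fit weights under the new alpha
        w_optim.zero_grad()
        criterion(model(x_t), y_t).backward()
        w_optim.step()
```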