arXiv

Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer

Authors: Wan, Jun; Liu, He; Wu, Yujia; Lai, Zhihui; Min, Wenwen; Liu, Jun

Affiliations: School of Information Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; School of Information Science and Technology, Sanda University, Shanghai 201209, China; School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650091, China; Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Singapore 487372, Singapore

Publication: arXiv

Year/Volume/Issue: 2024


Subject: Deep neural networks

Abstract: At present, deep neural network methods play a dominant role in the field of face alignment. However, they generally use predefined network structures to predict landmarks, which tend to learn general features and lead to mediocre performance; e.g., they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address these issues, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for learning more discriminative and representative features (i.e., specialized features). Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate specific pathways for each subset by estimating the semantic correlations of feature channels, making it possible to learn specialized features from each subset. Then, a novel Dynamic Semantic Specialization (DSS) model is designed to mine homogeneous information from features at different scales, eliminating semantic gaps and ambiguities and enhancing representation ability. Finally, by integrating the DSA and DSS models into DSAT in both a dynamic-architecture and a dynamic-parameter manner, more specialized features can be learned, yielding more precise face alignment. Interestingly, we show that harder samples can be handled by activating more feature channels. Extensive experiments on popular face alignment datasets demonstrate that the proposed DSAT outperforms state-of-the-art models in the literature. Our code is available at https://***/GERMINO-LiuHe/DSAT. Copyright © 2024, The Authors. All rights reserved.
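The abstract does not say how the DSA model estimates channel correlations or turns them into pathways. Purely as an illustration of the general idea of input-dependent channel gating, the Python sketch below computes Pearson correlations between the channels of one sample's feature map, activates the channels whose mean absolute correlation with the other channels reaches a threshold, and groups samples that end up with the same channel mask into one subset. The function names (`channel_correlation`, `dynamic_channel_mask`, `partition_by_pathway`), the threshold `tau`, and the scoring rule are all hypothetical assumptions; this is not the DSAT implementation, which is available in the authors' released code.

```python
import math
from collections import defaultdict


def channel_correlation(features):
    """Pearson correlation between every pair of channels.

    features: list of C channels, each a list of N spatial activations.
    Returns a C x C correlation matrix as a list of lists.
    """
    centered = []
    for ch in features:
        mean = sum(ch) / len(ch)
        centered.append([x - mean for x in ch])
    norms = [math.sqrt(sum(x * x for x in ch)) for ch in centered]
    n_ch = len(features)
    corr = [[0.0] * n_ch for _ in range(n_ch)]
    for i in range(n_ch):
        for j in range(n_ch):
            denom = norms[i] * norms[j]
            if denom > 0:  # a constant channel has no defined correlation; leave it at 0
                dot = sum(a * b for a, b in zip(centered[i], centered[j]))
                corr[i][j] = dot / denom
    return corr


def dynamic_channel_mask(features, tau=0.5):
    """Hypothetical gate: activate channel i when its mean absolute
    correlation with the other channels is at least tau."""
    corr = channel_correlation(features)
    n_ch = len(corr)
    mask = []
    for i in range(n_ch):
        others = [abs(corr[i][j]) for j in range(n_ch) if j != i]
        score = sum(others) / len(others) if others else 0.0
        mask.append(1 if score >= tau else 0)
    return tuple(mask)


def partition_by_pathway(samples, tau=0.5):
    """Group sample indices by channel mask, so each group shares one 'pathway'."""
    groups = defaultdict(list)
    for idx, features in enumerate(samples):
        groups[dynamic_channel_mask(features, tau)].append(idx)
    return dict(groups)


if __name__ == "__main__":
    # Toy 3-channel feature maps with 4 spatial positions each.
    a = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]  # channel 2 only weakly correlated
    b = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 4]]  # all channels strongly (anti)correlated
    print(dynamic_channel_mask(a))  # (1, 1, 0)
    print(dynamic_channel_mask(b))  # (1, 1, 1)
    print(partition_by_pathway([a, b, a]))  # {(1, 1, 0): [0, 2], (1, 1, 1): [1]}
```

In this toy setup the number of active channels varies from sample to sample, and samples sharing a mask form one subset. A learned model would replace the fixed threshold and hand-written score with trainable components.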
