SSRN

Multi-Granularity Cross Transformer Network for Person Re-Identification

Authors: Li, Yanping; Miao, Duoqian; Zhang, Hongyun; Zhou, Jie; Zhao, Cairong

Affiliations: Department of Computer Science and Technology, Tongji University, Shanghai 200092, China; College of Computer Science and Software Engineering, Shenzhen University, Nanshan District, Guangdong Province, Shenzhen City, China

Publication: SSRN

Year/Volume/Issue: 2023

Core indexing:

Subject: Large dataset

Abstract: Person re-identification (Re-ID) aims to retrieve images of the same person from a gallery. Great efforts have been made to learn salient feature representations from global structure patterns. Transformers have been introduced to the Re-ID task because of their strong ability to model long-range dependencies. However, using a plain Transformer structure to extract global features ignores the discriminative semantic information implied in the various local structures within the global feature maps of pedestrian images. To address this issue, we present a Multi-granularity Cross Transformer Network (MCTN) that progressively learns salient features of different local structures in a global context. Specifically, the network mainly consists of two new designs: a Multi-granularity Convolutional Layer (MCL) and a Pyramidal Cross Transformer learning layer (PCT). The MCL is intended to simulate human vision by investigating salient pedestrian features at various granularities. The PCT is designed to mine local information in the global structure from a coarse-to-fine perspective. Furthermore, because deep layers already attend to more semantic information, they require no further fine-grained attention learning, which helps avoid overfitting. The shallow layers, on the other hand, focus on details but also contain a great deal of semantic information that has not yet been mined. Consequently, a Hierarchical Aggregation Strategy (HAS) is introduced to fuse the features learned by cross-attention learning at different stages. Pedestrian features learned in shallow layers serve as global priors for semantics learning in deep layers. We evaluate our method on four large-scale Re-ID datasets, and the experimental results show that the proposed method outperforms state-of-the-art methods. © 2023, The Authors. All rights reserved.
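Illustrative note: the abstract does not give the equations for the PCT or the HAS, so the following is only a minimal sketch of generic scaled dot-product cross-attention between two granularities, not the authors' implementation. It assumes that a coarser granularity can be imitated by average-pooling consecutive tokens; the helper names (`softmax`, `cross_attention`, `pool_tokens`) and the toy input are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (not the paper's PCT).

    queries: tokens from one granularity (list of d-dim vectors)
    keys, values: tokens from another granularity
    Returns one output vector per query token.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

def pool_tokens(tokens, group):
    """Average consecutive groups of tokens (assumed stand-in for a coarser granularity)."""
    pooled = []
    for i in range(0, len(tokens), group):
        chunk = tokens[i:i + group]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled

# Hypothetical toy input: 4 fine-grained tokens with 2 features each.
fine = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
coarse = pool_tokens(fine, group=2)          # 2 coarse tokens
fused = cross_attention(coarse, fine, fine)  # coarse queries attend to fine tokens
```

In this sketch each coarse token gathers information from all fine-grained tokens, which loosely mirrors the idea of relating local structures to a global context. How MCTN actually builds its granularities and fuses the stages is described only in the full paper.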
