Author Affiliations: Department of Computer Science and Technology, Tongji University, Shanghai 200092, China; College of Computer Science and Software Engineering, Shenzhen University, Nanshan District, Shenzhen, Guangdong Province, China
Publication: SSRN
Year/Volume/Issue: 2023
Core Indexing:
Abstract: Person re-identification (Re-ID) aims to retrieve the same person from a gallery. Great efforts have been made to learn salient feature representations from global structure patterns. The Transformer has been introduced to the Re-ID task due to its strong ability to model long-range dependencies. However, using a plain Transformer structure to extract global features ignores the discriminative semantic information implied in the various local structures of the global feature maps of pedestrian images. To address this issue, we present a Multi-granularity Cross Transformer Network (MCTN) that progressively learns salient features of different local structures in a global context. Specifically, the network mainly consists of two new designs, i.e., a Multi-granularity Convolutional Layer (MCL) and a Pyramidal Cross Transformer learning layer (PCT). The MCL is intended to simulate human vision by investigating salient pedestrian features at various granularities. The PCT is designed to mine local information in the global structure from a coarse-to-fine perspective. Furthermore, since deep layers attend to more semantic information, further fine-grained attention learning there is unnecessary and risks overfitting. The shallow layers, on the other hand, focus on details but still contain a great deal of semantic information that has not yet been mined. Consequently, a Hierarchical Aggregation Strategy (HAS) is introduced to fuse features learned by cross-attention learning at different stages. Pedestrian features learned in shallow layers serve as global priors for semantic learning in deep layers. We evaluate our method on four large-scale Re-ID datasets, and the experimental results reveal that the proposed method outperforms the state-of-the-art methods. © 2023, The Authors. All rights reserved.
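To make the three components in the abstract more concrete, below is a minimal PyTorch sketch of how multi-granularity convolution (MCL), cross attention over a global context (PCT), and shallow-to-deep feature aggregation (HAS) could fit together. All module names (MultiGranularityConv, CrossAttentionBlock, MCTNSketch), layer sizes, and the two-stage layout are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the MCTN ideas; names and dimensions are assumptions.
import torch
import torch.nn as nn


class MultiGranularityConv(nn.Module):
    """Parallel convolutions with different kernel sizes, loosely mirroring the
    Multi-granularity Convolutional Layer (MCL): each branch inspects the
    feature map at a different spatial granularity."""

    def __init__(self, dim, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(dim, dim, k, padding=k // 2) for k in kernel_sizes]
        )
        self.fuse = nn.Conv2d(dim * len(kernel_sizes), dim, 1)

    def forward(self, x):                        # x: (B, C, H, W)
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class CrossAttentionBlock(nn.Module):
    """Local tokens query a global context, a simplified stand-in for one
    level of the Pyramidal Cross Transformer (PCT)."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_tokens, global_tokens):
        out, _ = self.attn(local_tokens, global_tokens, global_tokens)
        return self.norm(local_tokens + out)


class MCTNSketch(nn.Module):
    """Two-stage sketch: the shallow stage produces a global prior that guides
    the deeper stage, and the two stages are fused, a simple stand-in for the
    Hierarchical Aggregation Strategy (HAS)."""

    def __init__(self, dim=256):
        super().__init__()
        self.mcl1, self.mcl2 = MultiGranularityConv(dim), MultiGranularityConv(dim)
        self.pct1, self.pct2 = CrossAttentionBlock(dim), CrossAttentionBlock(dim)
        self.head = nn.Linear(dim * 2, dim)      # fuse the two stages

    def forward(self, feat):                     # feat: (B, C, H, W) backbone map
        def tokens(t):                           # flatten spatial map to (B, HW, C)
            return t.flatten(2).transpose(1, 2)

        s1 = self.pct1(tokens(self.mcl1(feat)), tokens(feat))       # shallow stage
        prior = s1.mean(dim=1, keepdim=True)                        # global prior (B, 1, C)
        s2 = self.pct2(tokens(self.mcl2(feat)), prior)              # deep stage guided by prior
        return self.head(torch.cat([s1.mean(1), s2.mean(1)], dim=-1))


if __name__ == "__main__":
    # Toy forward pass on a backbone feature map of a pedestrian image.
    print(MCTNSketch()(torch.randn(2, 256, 24, 8)).shape)  # torch.Size([2, 256])
```

The sketch only illustrates the data flow implied by the abstract (multi-granularity branches, cross attention against a global context, and shallow features reused as a prior for deeper learning); the paper's actual backbone, pyramid depth, and fusion rule may differ.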