AMA: attention-based multi-feature aggregation module for action recognition

Authors: Yu, Mengyun; Chen, Ying

Affiliation: Jiangnan University, Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Wuxi 214000, Jiangsu, People's Republic of China

Published in: SIGNAL, IMAGE AND VIDEO PROCESSING

Year/Volume/Issue: 2023, Vol. 17, No. 3

Pages: 619-626


Subject classification: 0808 [Engineering - Electrical Engineering]; 1002 [Medicine - Clinical Medicine]; 0809 [Engineering - Electronic Science and Technology (degrees conferrable in Engineering or Science)]; 08 [Engineering]

Funding: National Natural Science Foundation of China

Keywords: Action recognition; Channel excitation; Spatial-temporal aggregation; Convolutional neural network

Abstract: Spatial information learning, temporal modeling, and the capture of channel relationships are all important for action recognition in videos. In this work, an attention-based multi-feature aggregation (AMA) module that encodes these features in a unified module is proposed; it contains a spatial-temporal aggregation (STA) structure and a channel excitation (CE) structure. STA mainly employs two convolutions to model spatial and temporal features, respectively, and the matrix multiplication in STA captures long-range dependencies. CE learns the importance of each channel so as to bias the allocation of available resources toward informative features. The AMA module is simple yet efficient and can be inserted into a standard ResNet architecture without any modification, enhancing the representational power of the network. We equip ResNet-50 with the AMA module to build an effective AMA Net at limited extra computation cost, only 1.002 times that of ResNet-50. Extensive experiments indicate that AMA Net outperforms state-of-the-art methods on UCF101 and HMDB51, exceeding the baseline by 6.2% and 10.0%, respectively. In short, AMA Net achieves the high accuracy of 3D convolutional neural networks while maintaining the complexity of 2D convolutional neural networks.
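The channel excitation (CE) idea described in the abstract, learning a per-channel importance weight and rescaling the feature map with it, can be sketched in a squeeze-and-excitation style. This is a generic illustration, not the paper's exact CE design: the bottleneck weights here are random placeholders standing in for learned parameters, and the function name and shapes are assumptions for the example.

```python
import numpy as np

def channel_excitation(x, reduction=4, seed=0):
    """Hypothetical SE-style sketch of channel excitation (CE).

    x: feature map of shape (C, H, W). The two bottleneck matrices are
    random placeholders; in the actual module they would be learned.
    """
    rng = np.random.default_rng(seed)
    c = x.shape[0]
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    s = x.mean(axis=(1, 2))
    # Excite: two-layer bottleneck (C -> C/reduction -> C)
    w1 = rng.standard_normal((c // reduction, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, c // reduction)) / np.sqrt(c // reduction)
    h = np.maximum(w1 @ s, 0.0)                # ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # sigmoid -> gates in (0, 1)
    # Scale: bias the feature map toward the more informative channels
    return x * gates[:, None, None]

x = np.ones((8, 4, 4))          # toy feature map: 8 channels, 4x4 spatial
y = channel_excitation(x)       # same shape, channels rescaled by gates
```

Because each gate lies strictly in (0, 1), the rescaled output keeps the input's shape while attenuating channels the gate deems less informative; in the full AMA module this gating is combined with the STA structure inside a ResNet block.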
