Author Affiliations: Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China; Nanjing Univ, Dept Comp Sci & Technol, Nanjing 210023, Peoples R China
Publication: IEEE TRANSACTIONS ON MULTIMEDIA (IEEE Trans Multimedia)
Year/Volume: 2025, Vol. 27
Pages: 2450-2462
Subject Classification: 0810 [Engineering - Information and Communication Engineering]; 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degree awardable in Engineering or Science)]
Funding: National Natural Science Foundation of China [62222207, 62072245, 61932020]; Natural Science Foundation of Jiangsu Province [BK20211520]; Open Foundation of the Key Lab (Center) of Anhui Jianzhu University, Anhui Province Key Laboratory of Intelligent Building & Building Energy Saving [IBES2024KF02]
Keywords: Motion segmentation; Videos; Dynamics; Few-shot learning; Prototypes; Transformers; Image recognition; Graph convolutional networks; Feature extraction; Face recognition; Action recognition; few-shot action recognition; few-shot learning; transformer
Abstract: Few-Shot Action Recognition (FSAR) aims to recognize actions from novel classes given only a few annotated training videos of those classes. Most FSAR methods follow few-shot image classification solutions by focusing solely on appearance-level matching between support and query videos, such as part-level, frame-level, and segment-level matching. However, these methods typically suffer from two main limitations: 1) they ignore the relationships among these part-, frame-, and segment-level features, and 2) they may mismatch same-class actions performed under fast-term and slow-term dynamics. To this end, we present a novel Hierarchical Motion-enhanced Matching (HM2) framework that hierarchically learns relation-aware multi-modal features and jointly promotes multi-modal matching, including appearance-level matching on segments, frames, and parts, as well as motion-level matching on dynamics. Specifically, we first propose a new Hierarchical Tokenizer (HT) to learn multi-modal features, which utilizes a hierarchical Transformer to learn appearance-level features, along with a Slow-Fast Aware Motion (SFAM) strategy to learn motion-level features covering fast- and slow-term dynamics. Next, we propose a new Relation-aware Matcher (RM) to match the multi-modal features, leveraging a Hierarchical Relational Graph Convolutional Network (H-RGCN) to capture the relationships among the appearance-level features. Further, a Dual Sample-to-Class Matching (DSCM) strategy is proposed to measure the bidirectional similarities between appearance- and motion-modal features via sample-to-class and class-to-sample matching. Extensive experiments on four standard FSAR benchmarks demonstrate significant performance improvements of HM2 over state-of-the-art methods.
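The abstract's bidirectional sample-to-class matching can be illustrated with a minimal sketch. This is not the authors' DSCM implementation; the function names, the cosine-similarity metric, and the max-then-mean aggregation are assumptions chosen only to show the idea of scoring a query against each class in both directions.

```python
# Illustrative sketch of bidirectional (sample-to-class and class-to-sample) matching.
# Assumed setup: one query video is represented by T token features, and each class's
# support set is pooled into a matrix of token features; scores use cosine similarity.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Normalize feature vectors so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def bidirectional_matching(query_feats, class_feats):
    """Return a similarity score per class.

    query_feats: (T, D) token features of one query video.
    class_feats: dict mapping class_id -> (M, D) pooled support features of that class.
    """
    q = l2_normalize(query_feats)
    scores = {}
    for cls, s in class_feats.items():
        s = l2_normalize(s)
        sim = q @ s.T                    # (T, M) pairwise cosine similarities
        q2c = sim.max(axis=1).mean()     # sample-to-class: each query token finds its best class match
        c2q = sim.max(axis=0).mean()     # class-to-sample: each class token finds its best query match
        scores[cls] = 0.5 * (q2c + c2q)  # symmetric, bidirectional score
    return scores

# Toy usage: a 5-way task with random features; the predicted class is the argmax score.
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 64))
support = {c: rng.normal(size=(16, 64)) for c in range(5)}
pred = max(bidirectional_matching(query, support), key=lambda c: bidirectional_matching(query, support)[c])
print("predicted class:", pred)
```

Averaging the two directions makes the score robust to cases where only one direction matches well, which is the intuition the abstract attributes to combining sample-to-class and class-to-sample matching.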