With more multi-modal data available for visual classification tasks, human action recognition has become an increasingly attractive research topic. However, one of the main challenges is to effectively extract complementary features from different modalities for action recognition. In this work, a novel multimodal supervised learning framework based on convolutional neural networks (ConvNets) is proposed to facilitate extracting compensation features from different modalities for human action recognition. Built on an information aggregation mechanism and deep ConvNets, our recognition framework represents spatial-temporal information from the base modalities through a designed frame-difference-aggregation spatial-temporal module (FDA-STM), while the network bridges in information from skeleton data through a multimodal supervised compensation block (SCB) to supervise the extraction of compensation features. We evaluate the proposed recognition framework on three human action datasets, including NTU RGB+D 60, NTU RGB+D 120, and ***. The results demonstrate that our model with FDA-STM and SCB achieves state-of-the-art recognition performance on all three benchmarks.
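The abstract does not detail the internals of FDA-STM, so the following is only a minimal sketch of the general idea it names: differencing consecutive frames to expose motion cues, then aggregating the differences over time into a single spatial-temporal feature map. The class name, the mean aggregation, and the single convolution layer are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FrameDifferenceAggregation(nn.Module):
    """Illustrative sketch of frame-difference aggregation.

    Hypothetical design: the paper's actual FDA-STM is not described
    in the abstract. This only demonstrates the general mechanism of
    temporal differencing followed by aggregation.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # 2D conv applied to the temporally aggregated difference map
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        # Differences of consecutive frames capture motion cues.
        diffs = frames[:, 1:] - frames[:, :-1]   # (B, T-1, C, H, W)
        # Aggregate over time; mean is an assumption, the paper's
        # aggregation mechanism may differ.
        aggregated = diffs.mean(dim=1)           # (B, C, H, W)
        return self.relu(self.conv(aggregated))


# Usage: a batch of 2 clips, each with 8 RGB frames of size 112x112
x = torch.randn(2, 8, 3, 112, 112)
module = FrameDifferenceAggregation(in_channels=3, out_channels=64)
features = module(x)  # -> (2, 64, 112, 112)
```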