arXiv

Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning

Authors: Tu, Yunbin; Li, Liang; Su, Li; Huang, Qingming

Affiliations: School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of AI Safety of CAS, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Peng Cheng Laboratory, Shenzhen, China

Publication: arXiv

Year/Volume/Issue: 2024

Subject: Multimedia systems

Abstract: Video has become a favored multimedia format on the internet. To better understand video content, a new task, HIREST, has been introduced, comprising video retrieval, moment retrieval, moment segmentation, and step-captioning. The pioneering work adopts a pre-trained CLIP-based model for video retrieval and uses it as a feature extractor for the other three challenging tasks, which are solved jointly in a multi-task learning paradigm. However, this work struggles to learn a comprehensive cognition of user-preferred content because it disregards the hierarchies and association relations across modalities. In this paper, guided by the shallow-to-deep principle, we propose a query-centric audio-visual cognition (QUAG) network that constructs a reliable multi-modal representation for moment retrieval, segmentation, and step-captioning. Specifically, we first design modality-synergistic perception, which obtains rich audio-visual content by modeling both global contrastive alignment and local fine-grained interaction between the visual and audio modalities. We then devise query-centric cognition, which uses the deep-level query to perform temporal-channel filtration on the shallow-level audio-visual representation. This enables QUAG to cognize user-preferred content and thus obtain a query-centric audio-visual representation for the three tasks. Extensive experiments show that QUAG achieves state-of-the-art results on HIREST. We further test QUAG on the query-based video summarization task, which verifies its generalization ability. The code is available at https://***/tuyunbin/QUAG. © 2024, CC BY.
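The abstract names two mechanisms, global contrastive alignment between audio and visual features and query-driven temporal-channel filtration, without giving their formulas. The sketch below is a hypothetical illustration of those two ideas under stated assumptions, not QUAG's implementation: the alignment is rendered as a symmetric InfoNCE loss (an assumed choice; the abstract does not name the loss), and the filtration as parameter-free sigmoid gates computed from a query vector. All function names, the temperature `tau`, and the gating scheme are invented for illustration; the local fine-grained interaction is not sketched. The authors' released code is the authoritative reference.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax(xs: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    log_total = math.log(sum(math.exp(x - m) for x in xs))
    return [x - m - log_total for x in xs]


def sigmoid(x: float) -> float:
    # Branch on sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def contrastive_alignment_loss(visual: Matrix, audio: Matrix, tau: float = 0.07) -> float:
    """Hypothetical global alignment: symmetric InfoNCE over N paired clip embeddings.

    Row i of `visual` and row i of `audio` are assumed to be a positive pair;
    every other row in the batch serves as a negative. Inputs are assumed
    to be L2-normalized already.
    """
    n = len(visual)
    logits = [[dot(v, a) / tau for a in audio] for v in visual]
    loss = 0.0
    for i in range(n):
        loss -= log_softmax(logits[i])[i]                         # visual -> audio
        loss -= log_softmax([logits[j][i] for j in range(n)])[i]  # audio -> visual
    return loss / (2 * n)


def temporal_channel_filtration(features: Matrix, query: Vector) -> Matrix:
    """Hypothetical query-centric filtering of a T x C audio-visual feature map.

    Temporal gate: sigmoid of the query-feature similarity at each time step.
    Channel gate: sigmoid of each query component. A real model would use
    learned projections; this parameter-free version only shows the data flow.
    """
    temporal = [sigmoid(dot(row, query)) for row in features]
    channel = [sigmoid(q) for q in query]
    return [[x * temporal[t] * channel[c] for c, x in enumerate(row)]
            for t, row in enumerate(features)]


if __name__ == "__main__":
    v = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_alignment_loss(v, v))  # near zero: pairs are perfectly aligned
    print(temporal_channel_filtration([[1.0, 2.0], [3.0, 4.0]], [1.0, -1.0]))
```

Because each gate lies in (0, 1), the filtration can only attenuate features, so time steps and channels that disagree with the query are suppressed rather than amplified; a learned variant could relax this.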
