
Learning Video Salient Object Detection Progressively From Unlabeled Videos

Authors: Xu, Binwei; Jiang, Qiuping; Liang, Haoran; Zhang, Dingwen; Liang, Ronghua; Chen, Peng

Affiliations: Ningbo Univ, Fac Informat Sci & Engn, Ningbo 315211, Peoples R China; Zhejiang Univ Technol, Coll Comp Sci & Technol, Hangzhou 310023, Peoples R China; Northwestern Polytech Univ, Sch Automat, Brain & Artificial Intelligence Lab, Xian 710072, Peoples R China

Published in: IEEE TRANSACTIONS ON MULTIMEDIA (IEEE Trans Multimedia)

Year/Volume: 2025, Vol. 27

Pages: 2423-2435


Subject Classification: 0810 [Engineering - Information & Communication Engineering]; 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science & Technology]

Funding: Natural Science Foundation of China [62406154, 62271277]; Natural Science Foundation of Ningbo [2024J209, 2022J081]; Postdoctoral Fellowship Program of CPSF [GZC20240766]; Natural Science Foundation of Zhejiang [LR22F020002]

Keywords: image segmentation; deformation; spatiotemporal phenomena; object detection; motion segmentation; optical flow; annotations; labeling; image annotation; dynamics; location; video salient object detection; weakly supervised learning

Abstract: Recently, deep learning-based video salient object detection (VSOD) has achieved notable breakthroughs, but existing methods rely on expensive video annotations, whether pixel-wise or weak. In this paper, based on the similarities and differences between VSOD and image salient object detection (SOD), we propose a novel VSOD method built on a progressive framework that locates and then segments salient objects in sequence without using any video annotation. To efficiently transfer the knowledge learned from the SOD dataset to VSOD, we introduce dynamic saliency to compensate for the lack of motion information in SOD during the locating process, while keeping the fine segmenting process unchanged. Specifically, we utilize a coarse locating model trained on the image dataset to identify frames exhibiting both static and dynamic saliency, and select the locating results of these frames as spatiotemporal location labels. Moreover, by tracking salient objects across adjacent frames, the number of spatiotemporal location labels is further increased. On the basis of these location labels, a two-stream locating network with an optical flow branch is proposed to capture salient objects in videos. Results on five public benchmarks demonstrate that our method outperforms state-of-the-art weakly supervised and unsupervised methods.
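The pseudo-label selection step described in the abstract — keeping only frames where static saliency (from the image-trained coarse locator) and dynamic saliency (from motion) agree — can be sketched as below. The IoU-based agreement criterion, the `select_pseudo_labels` function, and the 0.5 threshold are illustrative assumptions for exposition, not the paper's exact rule.

```python
import numpy as np

def select_pseudo_labels(static_maps, dynamic_maps, iou_thresh=0.5):
    """Pick frame indices where static and dynamic saliency agree.

    static_maps, dynamic_maps: lists of binary (H, W) numpy arrays,
    one pair per frame. A frame is kept as a spatiotemporal location
    label when the IoU of its two binarized maps exceeds iou_thresh
    (a hypothetical agreement criterion).
    """
    selected = []
    for i, (s, d) in enumerate(zip(static_maps, dynamic_maps)):
        inter = np.logical_and(s, d).sum()
        union = np.logical_or(s, d).sum()
        iou = inter / union if union > 0 else 0.0
        if iou >= iou_thresh:
            selected.append(i)
    return selected
```

Frames passing this filter would then seed the two-stream locating network; tracking objects into adjacent frames (as the abstract notes) would grow this label set further.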
