Author affiliations: Nanjing Univ Informat Sci & Technol, Sch Comp Sci, Nanjing 210044, Peoples R China; Minist Educ, Engn Res Ctr Digital Forens, Jiangsu Collaborat Innovat Ctr Atmospher Environm, Nanjing 210044, Peoples R China; Jinan Univ, Coll Cyber Secur, Guangzhou 510632, Peoples R China; New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102, USA
Publication: IEEE Transactions on Multimedia (IEEE Trans Multimedia)
Year/Volume/Issue: 2025, Vol. 27
Pages: 2503-2515
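Subject classification: 0810 [Engineering - Information and Communication Engineering]; 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]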
Funding: National Natural Science Foundation of China [62172233, 62472231, 62122032]
Keywords: Object removal forgery detection; video passive forensics; 3D convolution; hybrid encoder-decoder model; spatiotemporal localization
Abstract: With the growing popularity of high-resolution (HR) video and the continuous growth of network bandwidth, the challenge of object removal detection in HR videos has attracted significant attention. Expert forgers leverage the rich detail in HR videos for meticulous pixel manipulation and apply sophisticated postprocessing techniques to hide high-frequency artifacts, making forgery detection and localization more difficult for existing schemes. Additionally, an end-to-end framework would simplify the detection and localization process, but this has not been considered in previous work. To address these issues, a spatiotemporal encoder-decoder network (SEDN) is proposed for end-to-end object removal forgery detection in HR videos. In the SEDN, a new model composed of a 3D asymmetric dual-stream network (3D-ADSN) and a Transformer is proposed. The 3D-ADSN serves as the encoder, which fully integrates the high-frequency and low-frequency spatiotemporal information of videos. The Transformer serves as the decoder, capturing the global structural spatiotemporal information of the long-range feature sequence produced by the encoder. This combination achieves simultaneous detection in the temporal and spatial domains without any additional postprocessing. The experimental results demonstrate the superior performance of the SEDN at different resolutions.