arXiv

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Authors: Sun, Zhihao; Jiang, Haoran; Chen, Haoran; Cao, Yixin; Qiu, Xipeng; Wu, Zuxuan; Jiang, Yu-Gang

Affiliations: Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University, China; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, China; School of Computer Science, Fudan University, China

Publication: arXiv

Year/Volume/Issue: 2024

Core indexing:

Subject: Large datasets

Abstract: Multimodal large language models (M-LLMs) have unlocked new possibilities for various multimodal tasks. However, their potential in image manipulation detection (IMD) remains unexplored. When directly applied to the IMD task, M-LLMs often produce reasoning texts that suffer from hallucinations and overthinking. To address this, we propose ForgerySleuth, which leverages M-LLMs to perform comprehensive clue fusion and generate segmentation outputs indicating the specific regions that have been tampered with. Moreover, we construct the ForgeryAnalysis dataset through the Chain-of-Clues prompt, which includes analysis and reasoning text to upgrade the image manipulation detection task. A data engine is also introduced to build a larger-scale dataset for the pre-training phase. Our extensive experiments demonstrate the effectiveness of ForgeryAnalysis and show that ForgerySleuth significantly outperforms existing methods in generalization, robustness, and explainability. The resource is available at https://***/sunzhihao18/ForgerySleuth. Copyright © 2024, The Authors. All rights reserved.
