arXiv

Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering

Authors: Hu, Zhongjian; Yang, Peng; Li, Bing; Liu, Fengyuan

Affiliations: School of Computer Science and Engineering, Southeast University, China; School of Information Management and Artificial Intelligence, Zhejiang University of Finance and Economics, China; Southeast University-Monash University Joint Graduate School, China

Publication: arXiv

Year/Volume/Issue: 2024


Subject: Visual languages

Abstract: Recently, Large Language Models (LLMs) have been used for knowledge-based Visual Question Answering (VQA). Despite the encouraging results of previous studies, prior methods prompt LLMs to predict answers directly, neglecting intermediate thought processes. We argue that prior methods do not sufficiently activate the capacities of LLMs. We propose a framework called PLRH that Prompts LLMs with Rationale Heuristics for knowledge-based VQA. PLRH prompts LLMs with Chain of Thought (CoT) to generate rationale heuristics, i.e., intermediate thought processes, and then leverages the rationale heuristics to inspire LLMs to predict answers. Experiments show that our approach outperforms the existing baselines by more than 2.2 and 2.1 on OK-VQA and A-OKVQA, respectively. Copyright © 2024, The Authors. All rights reserved.
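
The two-stage idea in the abstract (first elicit a rationale with a CoT prompt, then condition the answer on it) can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the function names, the use of an image caption as the textual stand-in for the image, and the `llm` callable are all assumptions made for illustration.

```python
from typing import Callable, Tuple

# Hypothetical interface: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def build_rationale_prompt(caption: str, question: str) -> str:
    """Stage 1 prompt (assumed wording): ask for an intermediate rationale via CoT."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        "Let's think step by step.\n"
        "Rationale:"
    )


def build_answer_prompt(caption: str, question: str, rationale: str) -> str:
    """Stage 2 prompt (assumed wording): supply the rationale as a heuristic for the answer."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Rationale: {rationale}\n"
        "Answer:"
    )


def answer_with_rationale(llm: LLM, caption: str, question: str) -> Tuple[str, str]:
    """Call the LLM twice: once for the rationale, once for the final answer."""
    rationale = llm(build_rationale_prompt(caption, question)).strip()
    answer = llm(build_answer_prompt(caption, question, rationale)).strip()
    return rationale, answer
```

Any real model client can be wrapped to match the `LLM` signature; the sketch only fixes the order of the two calls and how the rationale is passed into the second prompt.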
