Knowledge-based Visual Question Answering (KB-VQA) expands traditional VQA by utilizing world knowledge from external sources when the image alone is insufficient to infer a correct answer. Existing methods face challenges due to low recall rates, limiting their ability to gather the information essential for accurate answers. While increasing the number of retrieved knowledge entries can enhance recall, it often introduces irrelevant information that impairs model performance. To overcome these challenges, we propose RK-VQA, which comprises two components. First, a zero-shot weighted hybrid knowledge retrieval method integrates local and global visual features with textual features from image-question pairs, enhancing the quality of knowledge retrieval and improving recall rates. Second, a rational knowledge-aware fusion-in-decoder architecture enhances answer generation by focusing on rational knowledge and reducing the influence of irrelevant information. Specifically, we develop a rational module to extract rational features, which are then used to prioritize pertinent information via a novel rational knowledge-aware attention mechanism. We evaluate RK-VQA on OK-VQA, the largest knowledge-based VQA dataset. RK-VQA achieves an accuracy of 64.11%, surpassing the previous best result by 2.03%.
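The rational knowledge-aware attention described above can be sketched as follows: retrieved knowledge entries are scored against a rational feature vector, and softmax weights then down-weight irrelevant entries before fusion. This is a minimal illustration, not the paper's implementation; the function names and the dot-product scoring are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rational_attention(rational_vec, entry_vecs):
    """Weight knowledge-entry vectors by dot-product similarity to the
    rational feature vector, then return the weights and the weighted
    sum (the fused knowledge representation)."""
    scores = [sum(r * e for r, e in zip(rational_vec, entry))
              for entry in entry_vecs]
    weights = softmax(scores)
    dim = len(rational_vec)
    fused = [sum(w * entry[d] for w, entry in zip(weights, entry_vecs))
             for d in range(dim)]
    return weights, fused
```

An entry aligned with the rational features receives a higher weight, so irrelevant retrieved knowledge contributes less to the fused representation.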
ISBN (digital): 9783031282386
ISBN (print): 9783031282379; 9783031282386
Open-source code often suffers from mismatched or missing comments, making code comprehension difficult and burdening software development and maintenance. In this paper, we design a novel code summarization model, CodeFiD, to address this laborious challenge. Inspired by retrieval-augmented methods for open-domain question answering, CodeFiD first retrieves a set of relevant comments from code collections for a given code snippet, and then aggregates representations of the code and these comments to produce a natural language sentence that summarizes the code's behavior. Unlike current code summarization works that focus on improving code representations, our model resorts to external knowledge to enhance code summarization performance. Extensive experiments on public code collections demonstrate the effectiveness of CodeFiD, which outperforms state-of-the-art counterparts across all programming languages.
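The retrieve-then-fuse pipeline described above can be sketched in a few lines: rank a corpus of (code, comment) pairs by similarity to the query snippet, then build one encoder input per retrieved comment in the fusion-in-decoder style. This is an assumed sketch; the Jaccard-overlap retriever and the input template are stand-ins for whatever retriever and encoder CodeFiD actually uses.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two strings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve_comments(code, corpus, k=2):
    """Rank (code, comment) pairs by similarity to the query code
    and return the top-k comments."""
    ranked = sorted(corpus, key=lambda pair: jaccard(code, pair[0]),
                    reverse=True)
    return [comment for _, comment in ranked[:k]]

def build_fid_inputs(code, comments):
    """Fusion-in-decoder style: one encoder input per retrieved comment,
    each pairing the query code with a different comment. The decoder
    later attends over all encoded inputs jointly."""
    return [f"code: {code} comment: {c}" for c in comments]
```

Each input is encoded separately, and the decoder fuses them when generating the summary sentence.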
ISBN (print): 9781450394086
Retrieval-augmented generation models offer many benefits over standalone language models: besides a textual answer to a given query, they provide provenance items retrieved from an updateable knowledge base. However, they are also more complex systems and must handle long inputs. In this work, we introduce FiD-Light to substantially increase the efficiency of the state-of-the-art retrieval-augmented FiD model while maintaining the same level of effectiveness. Our FiD-Light model constrains the information flow from the encoder (which encodes passages separately) to the decoder (which consumes concatenated encoded representations). Furthermore, we adapt FiD-Light with re-ranking capabilities through textual source pointers to improve top-ranked provenance precision. Our experiments on a diverse set of seven knowledge-intensive tasks (KILT) show that FiD-Light consistently improves the Pareto frontier between query latency and effectiveness. FiD-Light with source pointing sets substantial new state-of-the-art results on six KILT tasks for combined text generation and provenance retrieval evaluation, while maintaining high efficiency.
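The encoder-to-decoder bottleneck described above can be sketched as follows: each passage is encoded separately, but only the first k token vectors per passage are kept and concatenated for the decoder, shrinking the decoder's input roughly by the compression factor. This is a hedged sketch of the mechanism as described in the abstract; the toy hash-based encoder and function names are illustrative assumptions, not the paper's implementation.

```python
def encode_passage(tokens, dim=4):
    """Stand-in encoder: one deterministic vector per token.
    A real system would use a Transformer encoder here."""
    return [[(hash((tok, d)) % 100) / 100.0 for d in range(dim)]
            for tok in tokens]

def fid_light_encode(passages, k=1):
    """FiD-Light bottleneck: encode each passage separately, keep only
    the first k token vectors per passage, and concatenate the kept
    vectors as the decoder's cross-attention input."""
    kept = []
    for passage in passages:
        kept.extend(encode_passage(passage)[:k])
    return kept
```

With k much smaller than the passage length, the decoder attends over far fewer vectors, which is where the latency savings come from.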