Cross-modal transformer with language query for referring image segmentation

Authors: Zhang, Wenjing; Tan, Quange; Li, Pengxin; Zhang, Qi; Wang, Rong

Affiliations: Peoples Publ Secur Univ China, Sch Informat & Cyber Secur, Beijing 434020, Peoples R China; Minist Publ Secur, Key Lab Secur Prevent Technol & Risk Assessment, Beijing 434020, Peoples R China

Published in: NEUROCOMPUTING

Year/Volume: 2023, Vol. 536

Pages: 191-205

Subject classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees awardable in Engineering or Science)]

Funding: National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities [2019JKF426]

Keywords: Referring image segmentation; Deep interaction; Cross-modal transformer; Semantics-guided detail enhancement

Abstract: Referring image segmentation (RIS) aims to predict a segmentation mask for a target specified by a natural language expression. However, existing methods fail to implement the deep interaction between vision and language that RIS requires, resulting in inaccurate segmentation. To address this problem, a cross-modal transformer (CMT) with language queries for referring image segmentation is proposed. First, a cross-modal encoder of CMT is designed for intra-modal and inter-modal interaction, capturing context-aware visual features. Second, to generate compact visual-aware language queries, a language-query encoder (LQ) embeds key visual cues into linguistic features. In particular, the combination of the cross-modal encoder and the language-query encoder realizes the mutual guidance of vision and language. Finally, the cross-modal decoder of CMT is constructed to learn multimodal features of the referent from the context-aware visual features and visual-aware language queries. In addition, a semantics-guided detail enhancement (SDE) module is constructed to fuse the semantic-rich multimodal features with detail-rich low-level visual features, which supplements the spatial details of the predicted segmentation masks. Extensive experiments on four referring image segmentation datasets demonstrate the effectiveness of the proposed method. (c) 2023 Elsevier B.V. All rights reserved.
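The encoder/query/decoder pipeline described in the abstract can be sketched with plain single-head cross-attention. This is a minimal illustrative sketch, not the paper's implementation: all function names are hypothetical, and it omits the intra-modal self-attention, feed-forward layers, multi-head projections, and the SDE module that the actual CMT uses.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d):
    # queries: (Lq, d), keys_values: (Lkv, d) -> (Lq, d)
    # Simplified scaled dot-product attention without learned projections.
    attn = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return attn @ keys_values

def cmt_sketch(visual_feats, lang_feats, d):
    """visual_feats: (HW, d) flattened spatial features; lang_feats: (L, d) word features."""
    # Cross-modal encoder: visual features attend to language (inter-modal
    # interaction), yielding context-aware visual features.
    context_visual = visual_feats + cross_attention(visual_feats, lang_feats, d)
    # Language-query encoder: linguistic features attend to visual cues,
    # producing compact visual-aware language queries.
    lang_queries = lang_feats + cross_attention(lang_feats, visual_feats, d)
    # Cross-modal decoder: queries read out multimodal features of the
    # referent from the context-aware visual features.
    multimodal = cross_attention(lang_queries, context_visual, d)
    # Pool the queries into one referent embedding and score every spatial
    # location against it -> per-pixel mask logits (before upsampling).
    referent = multimodal.mean(axis=0)
    return context_visual @ referent  # shape (HW,)

# Toy usage: 4x4 feature map (16 positions) and a 5-word expression, d = 8.
logits = cmt_sketch(np.random.randn(16, 8), np.random.randn(5, 8), d=8)
```

Thresholding and reshaping `logits` back to the spatial grid would give the predicted mask in this simplified view.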
