Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation

Authors: Lei, Sen; Xiao, Xinyu; Zhang, Tianlin; Li, Heng-Chao; Shi, Zhenwei; Zhu, Qing

Author Affiliations: Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 611756, Peoples R China; Ant Grp Co, Hangzhou 688688, Peoples R China; AVIC Luoyang Inst Electroopt Equipment, Luoyang 471000, Peoples R China; Beihang Univ, Image Proc Ctr, Sch Astronaut, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China; Southwest Jiaotong Univ, Fac Geosci & Engn, Chengdu 611756, Peoples R China

Published in: IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (IEEE Trans Geosci Remote Sens)

Year/Volume: 2025, Vol. 63

Subject Classification: 0808 [Engineering - Electrical Engineering]; 1002 [Medicine - Clinical Medicine]; 08 [Engineering]; 0708 [Science - Geophysics]; 0816 [Engineering - Surveying and Mapping Science and Technology]

Funding: National Natural Science Foundation of China [42230102, 62271418, 62125102, U24B20177]; Natural Science Foundation of Sichuan Province [2023NSFSC0030]; Fellowship of China National Postdoctoral Program for Innovative Talents [BX20240291]; Ant Group Research Fund

Keywords: Remote sensing; Image segmentation; Visualization; Feature extraction; Linguistics; Transformers; Electronic mail; Adaptation models; Object recognition; Grounding; Fine-grained image-text alignment; referring image segmentation; remote sensing images

Abstract: Given a language expression, referring remote sensing image segmentation (RRSIS) aims to identify ground objects and assign pixelwise labels within the imagery. A key challenge for this task is capturing discriminative multimodal features via image-text alignment. However, existing RRSIS methods rely on a single vanilla, coarse alignment, in which the language expression is directly encoded and fused with the visual features. In this article, we argue that fine-grained image-text alignment can improve the extraction of multimodal information. To this end, we propose a new RRSIS method that fully exploits the visual and linguistic representations. Specifically, the original referring expression is regarded as context text and is further decoupled into ground-object and spatial-position texts. The proposed fine-grained image-text alignment module (FIAM) simultaneously leverages the features of the input image and the corresponding texts, yielding more discriminative multimodal representations. Meanwhile, to handle the varied scales of ground objects in remote sensing, we introduce a text-aware multiscale enhancement module (TMEM) that adaptively performs cross-scale fusion and interaction. We evaluate the proposed method on two public referring remote sensing datasets, RefSegRS and RRSIS-D, where it obtains superior performance over several state-of-the-art methods. The code will be publicly available at https://***/Shaosifan/FIANet.
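
As a rough illustration of the pipeline the abstract describes, here is a minimal PyTorch sketch: the referring expression is decoupled into context, ground-object, and spatial-position texts, each cross-attended with visual tokens (a FIAM-style fine-grained alignment), followed by a text-gated fusion across feature scales (a TMEM-style enhancement). All module layouts, dimensions, and names other than "FIAM"/"TMEM" are assumptions made for illustration, not the paper's implementation; the authors' actual code is at the repository linked above.

```python
# Hedged sketch of fine-grained image-text alignment for RRSIS.
# Assumption: text is pre-decoupled into three embeddings (context, object,
# position); the real FIAM/TMEM designs in the paper may differ substantially.
import torch
import torch.nn as nn


class FineGrainedAlignment(nn.Module):
    """FIAM-style idea: cross-attend visual tokens with each decoupled text."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        # One cross-attention block per text granularity: context, object, position.
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(3)
        )
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, vis, texts):
        # vis: (B, N, D) visual tokens; texts: list of three (B, L_i, D) embeddings.
        aligned = [attn(vis, t, t)[0] for attn, t in zip(self.attn, texts)]
        # Concatenate the three aligned views and project back to (B, N, D).
        return self.fuse(torch.cat(aligned, dim=-1))


class TextAwareMultiscale(nn.Module):
    """TMEM-style idea: fuse multiscale features, gated by a pooled text vector."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, feats, text):
        # feats: list of (B, N, D) features from different scales (same token
        # count here for simplicity); text: (B, L, D) context-text embedding.
        g = self.gate(text.mean(dim=1)).unsqueeze(1)  # (B, 1, D) text-derived gate
        return sum(g * f for f in feats) / len(feats)  # text-weighted average


if __name__ == "__main__":
    B, N, L, D = 2, 196, 12, 256
    vis = torch.randn(B, N, D)
    texts = [torch.randn(B, L, D) for _ in range(3)]  # context, object, position
    aligned = FineGrainedAlignment(D)(vis, texts)
    fused = TextAwareMultiscale(D)([aligned, torch.randn(B, N, D)], texts[0])
    print(aligned.shape, fused.shape)  # both torch.Size([2, 196, 256])
```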
