Cross-modal retrieval is a natural and highly valuable need in the current era of exploding multimedia content. This paper addresses unsupervised cross-modal hashing retrieval, which enables efficient retrieval across different modalities (e.g., image-text) without class labels. Most previous methods try to align visual and textual binary representations in a joint Hamming space by independently learning encoding functions for the respective modality domains. However, since paired training data describes the same object from different modalities, data from one modality plays a complementary role in learning the encoding function for the other modality, a point that has been less explored. This paper presents a novel cross-modal retrieval framework, called deep dual variational hashing (DDVH), which explores dual variational mappings between modalities to bridge the inherent modality gap. Specifically, DDVH consists of two sub-modules: visual variational mapping (VVM) and textual variational mapping (TVM). VVM generates semantics-preserving binary codes for visual samples via Gaussian latent embeddings, and TVM learns visually guided binary codes for the corresponding text data. The two sub-modules are jointly optimized under a cyclic consistency mechanism. This dual variational mapping strategy enables DDVH to generate unified binary representations for both modalities through visual-semantic interaction in the Hamming space. Comprehensive experiments on three benchmarks demonstrate that the proposed DDVH approach yields significant improvements over state-of-the-art methods.
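The abstract describes DDVH only at a high level. As a rough illustration, the following is a minimal PyTorch sketch of what a dual variational mapping with a cyclic-consistency objective could look like. The module names VVM and TVM follow the abstract, but all dimensions, the tanh relaxation of sign(), the MSE form of the consistency term, and the loss weights are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a dual variational hashing setup (assumptions, not DDVH itself).
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalMapping(nn.Module):
    """Maps modality features to a Gaussian latent embedding, then to
    relaxed binary codes in a joint Hamming space."""

    def __init__(self, in_dim: int, code_len: int, hidden: int = 512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, code_len)
        self.logvar = nn.Linear(hidden, code_len)

    def forward(self, x: torch.Tensor):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample a Gaussian latent embedding.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # tanh as a differentiable surrogate for sign(); codes would be
        # binarized with sign() at retrieval time.
        return torch.tanh(z), mu, logvar


def kl_term(mu, logvar):
    # KL divergence to a standard normal prior, as in a VAE.
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())


# VVM handles image features, TVM handles text features (toy dimensions).
vvm = VariationalMapping(in_dim=4096, code_len=64)   # visual variational mapping
tvm = VariationalMapping(in_dim=1386, code_len=64)   # textual variational mapping

img_feat = torch.randn(8, 4096)   # paired image features (toy batch)
txt_feat = torch.randn(8, 1386)   # paired text features

b_v, mu_v, lv_v = vvm(img_feat)
b_t, mu_t, lv_t = tvm(txt_feat)

# Cyclic-consistency-style objective: paired samples should share one
# unified code, so the two modalities' codes are pulled together.
consistency = F.mse_loss(b_v, b_t)
loss = consistency + 0.1 * (kl_term(mu_v, lv_v) + kl_term(mu_t, lv_t))
loss.backward()
```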
Tremendous progress has been made on the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into the RSIC model. First, the semantic concept extractor (SCE), which is end-to-end trainable, precisely captures the semantic concepts contained in the RSIs. In particular, the visual-semantic co-attention (VSCA) is designed to obtain coarse concept-related regions and region-related concepts for multi-modal interaction. Furthermore, we incorporate the two types of attentive vectors, together with semantic-level relational features, into a consensus exploitation (CE) block to learn cross-modal consensus-aware knowledge. Experiments on three benchmark datasets show the superiority of our approach compared with the reference methods.
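To make the co-attention step concrete, here is a minimal PyTorch sketch of a visual-semantic co-attention module that produces concept-related regions and region-related concepts, as the abstract describes. The bilinear affinity formulation, the projection dimensions, and the toy feature sizes are illustrative assumptions, not the paper's VSCA design.

```python
# Minimal sketch of visual-semantic co-attention (assumptions, not the paper's VSCA).
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualSemanticCoAttention(nn.Module):
    def __init__(self, region_dim: int, concept_dim: int, shared_dim: int = 256):
        super().__init__()
        self.proj_v = nn.Linear(region_dim, shared_dim)
        self.proj_c = nn.Linear(concept_dim, shared_dim)

    def forward(self, regions: torch.Tensor, concepts: torch.Tensor):
        # regions:  (B, N, region_dim)  grid/region features from the RSI encoder
        # concepts: (B, M, concept_dim) embeddings of detected semantic concepts
        v = self.proj_v(regions)                    # (B, N, D)
        c = self.proj_c(concepts)                   # (B, M, D)
        affinity = torch.bmm(v, c.transpose(1, 2))  # (B, N, M) region-concept affinity
        # Concept-related regions: each concept attends over regions.
        att_r = F.softmax(affinity, dim=1)
        concept_related_regions = torch.bmm(att_r.transpose(1, 2), regions)  # (B, M, region_dim)
        # Region-related concepts: each region attends over concepts.
        att_c = F.softmax(affinity, dim=2)
        region_related_concepts = torch.bmm(att_c, concepts)                 # (B, N, concept_dim)
        return concept_related_regions, region_related_concepts


# Toy usage with assumed dimensions.
vsca = VisualSemanticCoAttention(region_dim=2048, concept_dim=300)
regions = torch.randn(2, 49, 2048)   # e.g. 7x7 CNN grid features
concepts = torch.randn(2, 10, 300)   # e.g. embeddings of the top-10 concepts
crr, rrc = vsca(regions, concepts)
print(crr.shape, rrc.shape)          # torch.Size([2, 10, 2048]) torch.Size([2, 49, 300])
```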