EAES: Effective Augmented Embedding Spaces for Text-Based Image Captioning

Authors: Khang Nguyen, Doanh C. Bui, Truc Trinh, Nguyen D. Vo

Author affiliations: Vietnam Natl Univ Ho Chi Minh City (VNUHCM), Univ Informat Technol, Ho Chi Minh City 7000, Vietnam; Vietnam Natl Univ Ho Chi Minh City (VNUHCM), Ho Chi Minh City 700000, Vietnam

Publication: IEEE Access

Year and volume: 2022, Vol. 10

Pages: 32443-32452

Funding: VNUHCM-University of Information Technology's Scientific Research Support Fund

Subjects: Optical character recognition software; Visualization; Feature extraction; Adaptation models; Transformers; Training; Semantics; Image captioning; text-based image captioning; bottom-up top-down; grid feature; multimodal transformer; m4c

Abstract: Text-based image captioning emerged as a new problem in 2020. It remains challenging because the model must comprehend not only the visual context but also the scene text that appears in an image. The way images and scene text are embedded into the main model for training is therefore crucial. Building on the M4C-Captioner model, this paper proposes EAES, a simple but effective module for embedding images and scene text into the multimodal Transformer layers. The EAES module contains two sub-modules: Objects-augmented and Grid features augmentation. The Objects-augmented module provides a relative geometry feature that represents the relations between objects and between OCR tokens. The Grid features augmentation module extracts grid features for an image and combines them with the visual object features, which helps the model attend to both salient objects and the general context of the image and leads to better performance. We use the TextCaps dataset as the benchmark to demonstrate the effectiveness of our approach on five standard metrics: BLEU4, METEOR, ROUGE-L, SPICE and CIDEr. Without bells and whistles, our method achieves 20.21% on BLEU4 and 85.78% on CIDEr, which are 1.31 and 4.78 percentage points higher, respectively, than the baseline M4C-Captioner method. The results are also highly competitive with other methods on the METEOR, ROUGE-L and SPICE metrics. Source code is available at https://***/UIT-Together/EAES_m4c.
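Illustrative note: the abstract does not say how the relative geometry feature is computed. The sketch below shows one common formulation for a pair of bounding boxes (log-scaled centre offsets and log size ratios), offered only to make the idea concrete. The function names, the `(x1, y1, x2, y2)` box layout, and the `eps` clamp are assumptions and are not taken from the EAES paper or its source code.

```python
import math

def relative_geometry(box_m, box_n, eps=1e-3):
    """Return a 4-d relative geometry vector between two boxes.

    Boxes are (x1, y1, x2, y2) with positive width and height.
    Assumed formulation, not necessarily the one used by EAES.
    """
    # Centres and sizes of both boxes.
    xm, ym = (box_m[0] + box_m[2]) / 2, (box_m[1] + box_m[3]) / 2
    xn, yn = (box_n[0] + box_n[2]) / 2, (box_n[1] + box_n[3]) / 2
    wm, hm = box_m[2] - box_m[0], box_m[3] - box_m[1]
    wn, hn = box_n[2] - box_n[0], box_n[3] - box_n[1]
    return (
        # Centre offsets normalised by the reference box size;
        # clamped to eps so that log(0) never occurs.
        math.log(max(abs(xm - xn) / wm, eps)),
        math.log(max(abs(ym - yn) / hm, eps)),
        # Scale-invariant size ratios.
        math.log(wn / wm),
        math.log(hn / hm),
    )

def pairwise_geometry(boxes):
    """Relative geometry for every ordered pair of boxes (N x N x 4)."""
    return [[relative_geometry(bm, bn) for bn in boxes] for bm in boxes]
```

Under these assumptions, the same function could be applied to object boxes and to OCR-token boxes separately, giving each set its own pairwise relation table.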
