Towards pure end-to-end learning for recognizing multiple text sequences from an image

Authors: Xu, Zhenlong; Zhou, Shuigeng; Cheng, Zhanzhan; Bai, Fan; Niu, Yi; Pu, Shiliang

Author Affiliations: Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China; Hikvision Research Institute, China

Publication: arXiv

Year: 2019

Subject: Probability distributions

Abstract: Here we address a challenging problem: recognizing multiple text sequences from an image by pure end-to-end learning. The problem is twofold: 1) Multiple text sequence recognition. Each image may contain multiple text sequences with different contents, locations and orientations, and we try to recognize all the text sequences contained in the image. 2) Pure end-to-end (PEE) learning. We solve the problem in a pure end-to-end manner, where each training image is labeled only with the text transcripts of all contained sequences, without any geometric annotations. Most existing works recognize multiple text sequences from an image in a non-end-to-end (NEE) or quasi-end-to-end (QEE) way, in which each image is trained with both text transcripts and text locations. Only recently was a PEE method proposed to recognize text sequences from an image where the text sequence was split into several lines in the image; however, it cannot be directly applied to recognizing multiple text sequences from an image. So in this paper, we propose a pure end-to-end learning method to recognize multiple text sequences from an image. Our method directly learns multiple sequences of probability distributions conditioned on each input image, and outputs multiple text transcripts with a well-designed decoding strategy. To evaluate the proposed method, we constructed several datasets, mainly based on an existing public dataset and two real application scenarios. Experimental results show that the proposed method can effectively recognize multiple text sequences from images and outperforms CTC-based and attention-based baseline methods. Copyright © 2019, The Authors. All rights reserved.
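The abstract says the model learns multiple sequences of probability distributions per image and turns them into transcripts via a decoding strategy. As a minimal sketch of the general idea only (not the paper's actual decoding strategy), the code below applies greedy CTC-style decoding independently to each predicted sequence; the character set, array shapes, and function names are illustrative assumptions.

```python
# A hedged sketch (not the authors' code): greedy CTC-style decoding applied
# to several predicted sequences of per-timestep probability distributions.
# Assumes a model has already produced, for each text sequence in an image,
# a (T, C) array of probabilities over a character set plus a blank symbol.
import numpy as np

CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed character set
BLANK = len(CHARSET)  # index reserved for the CTC blank symbol

def greedy_ctc_decode(probs: np.ndarray) -> str:
    """Collapse repeated labels and drop blanks along the argmax path."""
    best_path = probs.argmax(axis=-1)  # (T,) most likely label per timestep
    chars = []
    prev = BLANK
    for label in best_path:
        if label != prev and label != BLANK:  # skip repeats and blanks
            chars.append(CHARSET[label])
        prev = label
    return "".join(chars)

def decode_multiple(seq_probs: list[np.ndarray]) -> list[str]:
    """Decode every predicted sequence of an image independently."""
    return [greedy_ctc_decode(p) for p in seq_probs]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake model output: 3 sequences, each 20 timesteps over |CHARSET|+1 classes.
    fake_output = [rng.dirichlet(np.ones(len(CHARSET) + 1), size=20)
                   for _ in range(3)]
    print(decode_multiple(fake_output))
```

Decoding each sequence independently is only one plausible reading of the abstract; the paper's "well-designed decoding strategy" may differ.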
