Search Results - Inner Mongolia University Library

41st International Conference on Machine Learning, ICML 2024

Authors: Peng, Dezhi; Yang, Zhenhua; Zhang, Jiaxin; Liu, Chongyu; Shi, Yongxin; Ding, Kai; Guo, Fengjun; Jin, Lianwen

Affiliations: South China University of Technology, China; INTSIG-SCUT Joint Lab of Document Image Analysis and Recognition, China; INTSIG Information Co. Ltd., Singapore

Existing optical character recognition (OCR) methods rely on task-specific designs with divergent paradigms, architectures, and training strategies, which significantly increases the complexity of research and maintenance and hinders the fast deployment in applications. To this end, we propose UPOCR, a simple-yet-effective generalist model for Unified Pixel-level OCR interface. Specifically, the UPOCR unifies the paradigm of diverse OCR tasks as image-to-image transformation and the architecture as a vision Transformer (ViT)-based encoder-decoder with learnable task prompts. The prompts push the general feature representations extracted by the encoder towards task-specific spaces, endowing the decoder with task awareness. Moreover, the model training is uniformly aimed at minimizing the discrepancy between the predicted and ground-truth images regardless of the inhomogeneity among tasks. Experiments are conducted on three pixel-level OCR tasks including text removal, text segmentation, and tampered text detection. Without bells and whistles, the experimental results showcase that the proposed method can simultaneously achieve state-of-the-art performance on three tasks with a unified single model, which provides valuable strategies and insights for future research on generalist OCR models. Code is available at https://***/shannanyinxiang/UPOCR. Copyright 2024 by the author(s).

Keywords: Optical character recognition
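
The abstract describes three ideas: features from a shared encoder, a learnable prompt per task that shifts those features, and a single image-discrepancy objective for every task. The toy sketch below illustrates only that data flow and is not the authors' implementation. Every name in it (`init_task_prompts`, `apply_task_prompt`, `pixel_l1_loss`), the additive form of the prompt, and the choice of mean absolute error as the discrepancy are assumptions made for illustration; the abstract does not specify them.

```python
# Illustrative sketch only: all names, shapes, and values are assumptions,
# not the UPOCR implementation.
import random

TASKS = ("text_removal", "text_segmentation", "tampered_text_detection")
FEATURE_DIM = 4


def init_task_prompts(dim, seed=0):
    """Create one prompt vector per task (learnable in a real model; fixed here)."""
    rng = random.Random(seed)
    return {task: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for task in TASKS}


def apply_task_prompt(features, prompt):
    """Shift every encoder token feature by the task prompt (assumed element-wise add)."""
    return [[f + p for f, p in zip(token, prompt)] for token in features]


def pixel_l1_loss(predicted, target):
    """Mean absolute difference between two images given as nested lists of pixels."""
    diffs = [abs(p - t)
             for row_p, row_t in zip(predicted, target)
             for p, t in zip(row_p, row_t)]
    return sum(diffs) / len(diffs)


prompts = init_task_prompts(FEATURE_DIM)
encoder_features = [[0.0] * FEATURE_DIM, [1.0] * FEATURE_DIM]  # two dummy tokens

# The same shared features are shifted differently for each task.
per_task = {task: apply_task_prompt(encoder_features, prompts[task]) for task in TASKS}

# Every task is scored against its target image with the same objective.
loss = pixel_l1_loss([[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]])
```

In this sketch the only task-dependent component is the prompt vector, which reflects the abstract's claim that a single model serves all three tasks. A real system would learn the prompts jointly with the encoder and decoder weights.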

UPOCR: towards unified pixel-level OCR interface

Proceedings of the 41st International Conference on Machine Learning

Authors: Dezhi Peng, Zhenhua Yang, Jiaxin Zhang, Chongyu Liu, Yongxin Shi, Kai Ding, Fengjun Guo, Lianwen Jin

Affiliations: South China University of Technology and INTSIG-SCUT Joint Lab of Document Image Analysis and Recognition; INTSIG Information Co. Ltd. and INTSIG-SCUT Joint Lab of Document Image Analysis and Recognition

UPOCR: Towards Unified Pixel-Level OCR Interface

arXiv, 2023

Authors: Peng, Dezhi; Yang, Zhenhua; Zhang, Jiaxin; Liu, Chongyu; Shi, Yongxin; Ding, Kai; Guo, Fengjun; Jin, Lianwen

Affiliations: South China University of Technology, China; INTSIG Information Co. Ltd.; INTSIG-SCUT Joint Lab of Document Image Analysis and Recognition

In recent years, the optical character recognition (OCR) field has been proliferating with plentiful cutting-edge approaches for a wide spectrum of tasks. However, these approaches are task-specifically designed with divergent paradigms, architectures, and training strategies, which significantly increases the complexity of research and maintenance and hinders the fast deployment in applications. To this end, we propose UPOCR, a simple-yet-effective generalist model for Unified Pixel-level OCR interface. Specifically, the UPOCR unifies the paradigm of diverse OCR tasks as image-to-image transformation and the architecture as a vision Transformer (ViT)-based encoder-decoder. Learnable task prompts are introduced to push the general feature representations extracted by the encoder toward task-specific spaces, endowing the decoder with task awareness. Moreover, the model training is uniformly aimed at minimizing the discrepancy between the generated and ground-truth images regardless of the inhomogeneity among tasks. Experiments are conducted on three pixel-level OCR tasks including text removal, text segmentation, and tampered text detection. Without bells and whistles, the experimental results showcase that the proposed method can simultaneously achieve state-of-the-art performance on three tasks with a unified single model, which provides valuable strategies and insights for future research on generalist OCR models. Code will be publicly available. Copyright © 2023, The Authors. All rights reserved.

Keywords: Signal encoding
