Search results - Inner Mongolia University Library

arXiv, 2023

Authors: Shah, Ayush Kumar; Amador, Bryan Manrique; Dey, Abhisek; Creekmore, Ming; Ocampo, Blake; Denmark, Scott; Zanibbi, Richard

Affiliations: Document and Pattern Recognition Lab, Rochester Institute of Technology, NY, United States; Department of Chemistry, University of Illinois Urbana-Champaign, IL, United States

Most molecular diagram parsers recover chemical structure from raster images (e.g., PNGs). However, many PDFs include commands giving explicit locations and shapes for characters, lines, and polygons. We present a new parser that uses these born-digital PDF primitives as input. The parsing model is fast and accurate, and does not require GPUs, Optical Character Recognition (OCR), or vectorization. We use the parser to annotate raster images and then train a new multi-task neural network for recognizing molecules in raster images. We evaluate our parsers using SMILES and standard benchmarks, along with a novel evaluation protocol comparing molecular graphs directly that supports automatic error compilation and reveals errors missed by SMILES-based evaluation. On the synthetic USPTO benchmark, our born-digital parser obtains a recognition rate of 98.4% (1% higher than previous models) and our relatively simple neural parser for raster images obtains a rate of 85% using less training data than existing neural approaches (thousands vs. millions of molecules). © 2023, CC BY-SA.

Keywords: Rasterization
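
Note on the abstract above: it describes an evaluation protocol that compares molecular graphs directly and compiles errors automatically. The sketch below shows one simple way such a comparison could look. It is not the authors' protocol: the dictionary layout, the function name `compare_molecular_graphs`, and the assumption that predicted and reference atoms already share the same ids are all hypothetical choices made for illustration.

```python
def compare_molecular_graphs(predicted, reference):
    """List label mismatches between two molecular graphs.

    Each graph is a dict with two entries (a hypothetical layout):
      "atoms": {atom_id: element symbol}
      "bonds": {(atom_id, atom_id): bond order}, with the smaller id first
    Atom ids are assumed to be aligned between the two graphs already.
    Returns a list of (kind, key, predicted_label, reference_label) tuples;
    a label of None means the atom or bond is missing on that side.
    """
    errors = []
    for kind in ("atoms", "bonds"):
        pred, ref = predicted[kind], reference[kind]
        # Visit every key seen on either side so that missing and
        # spurious atoms or bonds are reported as well as wrong labels.
        for key in sorted(set(pred) | set(ref)):
            if pred.get(key) != ref.get(key):
                errors.append((kind, key, pred.get(key), ref.get(key)))
    return errors


# Hypothetical example: the reference is ethanol (C-C-O); the prediction
# mislabels the oxygen and reports a double bond instead of a single bond.
reference_graph = {
    "atoms": {1: "C", 2: "C", 3: "O"},
    "bonds": {(1, 2): 1, (2, 3): 1},
}
predicted_graph = {
    "atoms": {1: "C", 2: "C", 3: "N"},
    "bonds": {(1, 2): 1, (2, 3): 2},
}
graph_errors = compare_molecular_graphs(predicted_graph, reference_graph)
```

Because each mismatch is tied to a specific atom or bond, the errors can be tallied by type, which a single string comparison of two SMILES strings does not provide.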

Local and Global Graph Modeling with Edge-weighted Graph Attention Network for Handwritten Mathematical Expression Recognition

arXiv, 2024

Authors: Xie, Yejing; Zanibbi, Richard; Mouchère, Harold

Affiliations: Nantes Universite, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes F-44300, France; Document and Pattern Recognition Lab, Rochester Institute of Technology, Rochester, NY, United States

In this paper, we present a novel approach to Handwritten Mathematical Expression Recognition (HMER) by leveraging graph-based modeling techniques. We introduce an end-to-end model with an Edge-weighted Graph Attention Mechanism (EGAT), designed to perform simultaneous node and edge classification. This model effectively integrates node and edge features, facilitating the prediction of symbol classes and their relationships within mathematical expressions. Additionally, we propose a stroke-level graph modeling method for both local (LGM) and global (GGM) information, which applies an end-to-end model to online HMER tasks, transforming the recognition problem into node and edge classification tasks in a graph structure. By capturing both local and global graph features, our method ensures a comprehensive understanding of the expression structure. Through the combination of these components, our system demonstrates superior performance in symbol detection, relation classification, and expression-level recognition. © 2024, CC BY.

Keywords: Network theory (graphs)

Survey on Handwritten Mathematical Expression Recognition in the Last Decade: Grammar- and Graph-Based Parsing, and the Rise of Encoder-Decoder Models and Graph Neural Networks

SSRN, 2023

Authors: Truong, Thanh-Nghia; Nguyen, Cuong Tuan; Zanibbi, Richard; Mouchère, Harold; Nakagawa, Masaki

Affiliations: University Research Administrator Center; Department of Computer Science, Rochester Institute of Technology, NY, United States; Tokyo University of Agriculture and Technology, Tokyo, Japan; FPT University, HCMC Campus, Viet Nam; Document and Pattern Recognition Lab, Rochester Institute of Technology, New York, United States; LS2N - UMR CNRS 6004, University of Nantes, Nantes, France

Machine recognition of handwritten mathematical expressions (HMEs) is an area that has attracted interest owing to steady progress in handwriting recognition and the rapid emergence of pen- and touch-based devices. HME recognition can be considered as an extension of text recognition with a two-dimensional structure of characters and symbols. This survey examines the solutions proposed for recognizing online and offline HMEs over the last decade. A common strategy for recognizing HMEs involves splitting the recognition process into four key tasks: symbol segmentation, symbol classification, spatial relation classification, and structural analysis. Recently, encoder–decoder models using Deep Neural Networks (DNNs) have become popular and can perform all key tasks simultaneously and achieve high performance. Furthermore, evaluation methods and benchmark datasets were used to explore the implicit dependencies among key tasks. Finally, we point out some limitations of the current systems and present a future outlook on the DNN approach and the utilization of context to support HMEs parsing. © 2023, The Authors. All rights reserved.

Keywords: Classification (of information)

LPGA: Line-of-sight parsing with graph-based attention for math formula recognition

15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019

Authors: Mahdavi, Mahshad; Condon, Michael; Davila, Kenny; Zanibbi, Richard

Affiliations: Document and Pattern Recognition Lab, Rochester Institute of Technology, Rochester, NY, United States; Department of CSE, University at Buffalo, Buffalo, NY, United States

ISBN: (print) 9781728128610

We present a model for recognizing typeset math formula images from connected components or symbols. In our approach, connected components are used to construct a line-of-sight (LOS) graph. The graph is used both to reduce the search space for formula structure interpretations, and to guide a classification attention model using separate channels for inputs and their local visual context. For classification, we used visual densities with Random Forests for initial development, and then converted this to a Convolutional Neural Network (CNN) with a second branch to capture context for each input image. Formula structure is extracted as a directed spanning tree from a weighted LOS graph using Edmonds' algorithm. We obtain strong results for formulas without grids or matrices in the InftyCDB-2 dataset (90.89% from components, 93.5% from symbols). Using tools from the CROHME handwritten formula recognition competitions, we were able to compile all symbol and structure recognition errors for analysis. Our data and source code are publicly available. © 2019 IEEE.

Keywords: Decision trees
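
Note on the abstract above: it states that formula structure is extracted as a directed spanning tree from a weighted LOS graph using Edmonds' algorithm. The sketch below is a small recursive Chu-Liu/Edmonds implementation of a maximum spanning arborescence, offered only as an illustration of that step. The function names, node names, and edge weights are hypothetical and are not taken from the authors' released code. It assumes that a higher weight means a more likely relationship and that every node is reachable from the root.

```python
def find_cycle(parent):
    """Return the nodes of one cycle in a child -> parent map, or None."""
    for start in parent:
        path, node = set(), start
        while node in parent and node not in path:
            path.add(node)
            node = parent[node]
        if node in path:  # we walked back onto our own path: a cycle
            cycle, cur = [node], parent[node]
            while cur != node:
                cycle.append(cur)
                cur = parent[cur]
            return cycle
    return None


def max_spanning_arborescence(edges, root):
    """Chu-Liu/Edmonds. edges: {(u, v): weight}. Returns {child: parent}."""
    # Step 1: every non-root node greedily picks its heaviest incoming edge.
    best = {}
    for (u, v), w in edges.items():
        if v == root or u == v:
            continue
        if v not in best or w > edges[(best[v], v)]:
            best[v] = u
    cycle = find_cycle(best)
    if cycle is None:
        return best  # the greedy choice is already a tree
    # Step 2: contract the cycle into a single super node.
    cyc = set(cycle)
    super_node = ("contracted", frozenset(cyc))
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in cyc and v in cyc:
            continue
        if v in cyc:
            # Entering the cycle at v replaces the cycle edge into v.
            key, adj = (u, super_node), w - edges[(best[v], v)]
        elif u in cyc:
            key, adj = (super_node, v), w
        else:
            key, adj = (u, v), w
        if key not in new_edges or adj > new_edges[key]:
            new_edges[key], origin[key] = adj, (u, v)
    # Step 3: solve the smaller problem, then expand the super node.
    contracted = max_spanning_arborescence(new_edges, root)
    result = {}
    for child, par in contracted.items():
        u, v = origin[(par, child)]
        result[v] = u
    for v in cyc:  # keep every cycle edge except the one that was replaced
        if v not in result:
            result[v] = best[v]
    return result


# Hypothetical weighted graph; "r" plays the role of the root symbol.
example_edges = {
    ("r", "a"): 5, ("r", "b"): 1, ("a", "b"): 3,
    ("b", "a"): 6, ("a", "c"): 2, ("b", "c"): 4,
}
tree = max_spanning_arborescence(example_edges, "r")
total_weight = sum(example_edges[(p, c)] for c, p in tree.items())
```

In this example the greedy choice in step 1 produces the cycle a -> b -> a, which is why the contraction step is needed; picking the heaviest incoming edge per node is not enough on its own.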

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

International Conference on Pattern Recognition

Authors: Michele Alberti; Angela Botros; Narayan Schutz; Rolf Ingold; Marcus Liwicki; Mathias Seuret

Affiliations: Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland; V7 Ltd, London, United Kingdom; ARTORG Center for Biomedical Engineering Research, University of Bern, Switzerland; EISLAB Machine Learning, Luleå University of Technology, Sweden; Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

In this work, we introduce a new architectural component to Neural Network (NN), i.e., trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers) images to historical documents (CB55). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases appreciably across all datasets. In addition, we show that the benefit of spectral initialization leads to significantly faster convergence, as opposed to randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.

Keywords: Program processors; Discrete Fourier transforms; Transforms; Documentation; Pattern recognition; Discrete cosine transforms; Convolutional neural networks
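
Note on the abstract above: it describes matrix transformations on feature maps that are initialized spectrally and then trained. The sketch below illustrates only the initialization idea, using an orthonormal DCT-II matrix (one of the transforms named in the keywords) applied to a square feature map as W X W^T. This is an assumption about the form of the transform, not the authors' PyTorch module; the function names are hypothetical, and training (updating the entries of W by gradient descent) is not shown.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of lists."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def spectral_transform(feature_map, weights):
    """Apply W X W^T: transform the rows and the columns of the map."""
    return matmul(matmul(weights, feature_map), transpose(weights))


# Initial weights: in a trainable layer these entries would be the starting
# point for optimization rather than fixed constants.
weights = dct_matrix(4)
constant_map = [[1.0] * 4 for _ in range(4)]
spectrum = spectral_transform(constant_map, weights)
```

A constant feature map has all of its energy in the lowest-frequency coefficient, so only the top-left entry of `spectrum` is non-zero (up to floating-point error). A randomly initialized matrix has no such structure at the start of training.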

ICDAR 2019 CROHME + TFD: Competition on recognition of handwritten mathematical expressions and typeset formula detection

15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019

Authors: Mahdavi, Mahshad; Zanibbi, Richard; Mouchere, Harold; Viard-Gaudin, Christian; Garain, Utpal

Affiliations: Document and Pattern Recognition Lab, Rochester Institute of Technology, Rochester, NY, United States; LS2N-UMR CNRS 6004, University of Nantes, Nantes, France; Computer Vision and Pattern Recognition Unit, Centre for Artif. Intel. and Mach. Learning, Indian Statistical Institute, Kolkata, India

ISBN: (print) 9781728128610

We summarize the tasks, protocol, and outcome for the 6th Competition on Recognition of Handwritten Mathematical Expressions (CROHME), which includes a new formula detection in document images task (+ TFD). For CROHME + TFD 2019, participants chose between two tasks for recognizing handwritten formulas from 1) online stroke data, or 2) images generated from the handwritten strokes. To compare LaTeX strings and the labeled directed trees over strokes (label graphs) used in previous CROHMEs, we convert LaTeX and stroke-based label graphs to label graphs defined over symbols (symbol-level label graphs, or symLG). More than thirty (33) participants registered for the competition, with nineteen (19) teams submitting results. The strongest formula recognition results were produced by the USTC-iFLYTEK research team, for both stroke-based (81%) and image-based (77%) input. For the new typeset formula detection task, the Samsung R&D Institute Ukraine (Team 2) obtained a very strong F-score (93%). System performance has improved since the last CROHME; still, the competition results suggest that recognition of handwritten formulae remains a difficult structural pattern recognition task. © 2019 IEEE.

Keywords: Trees (mathematics)

LPGA: Line-of-Sight Parsing with Graph-Based Attention for Math Formula Recognition

International Conference on Document Analysis and Recognition

Authors: Mahshad Mahdavi; Michael Condon; Kenny Davila; Richard Zanibbi

Affiliations: Document and Pattern Recognition Lab, Rochester Institute of Technology, Rochester, NY, USA; Department of CSE, University at Buffalo, Buffalo, NY, USA

Keywords: Visualization; Feature extraction; Grammar; Image recognition; Vegetation; Image segmentation; Pattern recognition

ICDAR 2019 CROHME + TFD: Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection

International Conference on Document Analysis and Recognition

Authors: Mahshad Mahdavi; Richard Zanibbi; Harold Mouchere; Christian Viard-Gaudin; Utpal Garain

Affiliations: Document and Pattern Recognition Lab, Rochester Institute of Technology, Rochester, NY, USA; University of Nantes, Nantes, France; LS2N - UMR CNRS 6004, University of Nantes, Nantes, France; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

We summarize the tasks, protocol, and outcome for the 6th Competition on Recognition of Handwritten Mathematical Expressions (CROHME), which includes a new formula detection in document images task (+ TFD). For CROHME + TFD 2019, participants chose between two tasks for recognizing handwritten formulas from 1) online stroke data, or 2) images generated from the handwritten strokes. To compare LaTeX strings and the labeled directed trees over strokes (label graphs) used in previous CROHMEs, we convert LaTeX and stroke-based label graphs to label graphs defined over symbols (symbol-level label graphs, or symLG). More than thirty (33) participants registered for the competition, with nineteen (19) teams submitting results. The strongest formula recognition results were produced by the USTC-iFLYTEK research team, for both stroke-based (81%) and image-based (77%) input. For the new typeset formula detection task, the Samsung R&D Institute Ukraine (Team 2) obtained a very strong F-score (93%). System performance has improved since the last CROHME; still, the competition results suggest that recognition of handwritten formulae remains a difficult structural pattern recognition task.

Keywords: Task analysis; Handwriting recognition; Tools; Measurement; Layout; Image recognition

Labeling, Cutting, Grouping: An Efficient Text Line Segmentation Method for Medieval Manuscripts

International Conference on Document Analysis and Recognition

Authors: Michele Alberti; Lars Vögtlin; Vinaychandran Pondenkandath; Mathias Seuret; Rolf Ingold; Marcus Liwicki

Affiliations: Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland; University of Fribourg, Fribourg, Switzerland; Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Machine Learning Group, Luleå University of Technology, Sweden

This paper introduces a new approach to text-line extraction that integrates deep-learning-based pre-classification with state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborated scripts. In this work, we propose a novel method which uses semantic segmentation at pixel level as an intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as a strong denoising pre-processing step before performing text-line extraction. Second, we introduce a novel, simple and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.

Keywords: Semantics; Task analysis; Image segmentation; Degradation; Layout; Image color analysis; Noise reduction

Balancing usability and security in a video CAPTCHA

5th Symposium On Usable Privacy and Security, SOUPS 2009

Authors: Kluever, Kurt Alfred; Zanibbi, Richard

Affiliations: Google Inc., 76 Ninth Ave., New York, NY 10011, United States; Document and Pattern Recognition Lab., Department of Computer Science, Rochester Institute of Technology, Rochester, NY 14623, United States

ISBN: (print) 9781605587363

We present a technique for using content-based video labeling as a CAPTCHA task. Our CAPTCHAs are generated from YouTube videos, which contain labels (tags) supplied by the person that uploaded the video. They are graded using a video's tags, as well as tags from related videos. In a user study involving 184 participants, we were able to increase the human success rate on our video CAPTCHA from roughly 70% to 90%, while keeping the success rate of a tag frequency-based attack fixed at around 13%. Through a different parameterization of the challenge generation and grading algorithms, we were able to reduce the success rate of the same attack to 2%, while still increasing the human success rate from 70% to 75%. The usability and security of our video CAPTCHA appear to be comparable to existing CAPTCHAs, and a majority of participants (60%) indicated that they found the video CAPTCHAs more enjoyable than traditional CAPTCHAs in which distorted text must be transcribed.

Keywords: Grading
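
Note on the abstract above: it says responses are graded against a video's own tags plus tags from related videos, and that a tag frequency-based attack was used to assess security. The sketch below shows one plausible shape for such a grader. The function names, the `max_frequency` threshold, and the rule of discarding very common tags are assumptions introduced here for illustration; the abstract does not specify the actual grading parameters.

```python
def build_accepted_tags(video_tags, related_videos_tags, corpus_tag_counts,
                        corpus_size, max_frequency=0.005):
    """Collect the tags that count as a correct answer for one video.

    video_tags:          tags supplied by the uploader of the challenge video
    related_videos_tags: one list of tags per related video
    corpus_tag_counts:   {tag: number of corpus videos carrying that tag}
    corpus_size:         number of videos in the corpus
    max_frequency:       hypothetical cut-off; tags more common than this are
                         dropped so that guessing popular tags does not pass
    """
    candidates = {tag.strip().lower() for tag in video_tags}
    for tags in related_videos_tags:
        candidates.update(tag.strip().lower() for tag in tags)
    return {tag for tag in candidates
            if corpus_tag_counts.get(tag, 0) / corpus_size <= max_frequency}


def grade_response(submitted_tags, accepted_tags):
    """Pass the challenge if at least one submitted tag is accepted."""
    return any(tag.strip().lower() in accepted_tags for tag in submitted_tags)


# Hypothetical corpus statistics and challenge.
tag_counts = {"music": 50, "cat": 2, "kitten": 1, "piano": 1}
accepted = build_accepted_tags(
    video_tags=["Cat", "music"],
    related_videos_tags=[["kitten", "music"], ["piano"]],
    corpus_tag_counts=tag_counts,
    corpus_size=100,
    max_frequency=0.1,
)
human_passes = grade_response([" Kitten ", "dog", "funny"], accepted)
attack_passes = grade_response(["music", "video", "funny"], accepted)
```

Adding related tags widens the set of correct answers, which helps human users, while the frequency cut-off narrows it again for an attacker who submits only the most popular tags. The two parameters pull in opposite directions, which matches the usability and security trade-off the abstract reports.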
