Most molecular diagram parsers recover chemical structure from raster images (e.g., PNGs). However, many PDFs include commands giving explicit locations and shapes for characters, lines, and polygons. We present a new...
详细信息
In this paper, we present a novel approach to Handwritten Mathematical Expression recognition (HMER) by leveraging graph-based modeling techniques. We introduce a End-to-end model with an Edge-weighted Graph Attention...
详细信息
Machine recognition of handwritten mathematical expressions (HMEs) is an area that has attracted interest owing to steady progress in handwriting recognition and the rapid emergence of pen- and touch-based devices. HM...
详细信息
We present a model for recognizing typeset math formula images from connected components or symbols. In our approach, connected components are used to construct a line-of-sight (LOS) graph. The graph is used both to r...
详细信息
In this work, we introduce a new architectural component to Neural Network (NN), i.e., trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated ...
详细信息
In this work, we introduce a new architectural component to Neural Network (NN), i.e., trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers) images to historical documents (CB55). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases appreciably across all datasets. In addition, we show that the benefit of spectral initialization leads to significantly faster convergence, as opposed to randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.
We summarize the tasks, protocol, and outcome for the 6th Competition on recognition of Handwritten Mathematical Expressions (CROHME), which includes a new formula detection in document images task (+ TFD). For CROHME...
详细信息
We present a model for recognizing typeset math formula images from connected components or symbols. In our approach, connected components are used to construct a line-of-sight (LOS) graph. The graph is used both to r...
详细信息
We present a model for recognizing typeset math formula images from connected components or symbols. In our approach, connected components are used to construct a line-of-sight (LOS) graph. The graph is used both to reduce the search space for formula structure interpretations, and to guide a classification attention model using separate channels for inputs and their local visual context. For classification, we used visual densities with Random Forests for initial development, and then converted this to a Convolutional Neural Network (CNN) with a second branch to capture context for each input image. Formula structure is extracted as a directed spanning tree from a weighted LOS graph using Edmonds' algorithm. We obtain strong results for formulas without grids or matrices in the InftyCDB-2 dataset (90.89% from components, 93.5% from symbols). Using tools from the CROHME handwritten formula recognition competitions, we were able to compile all symbol and structure recognition errors for analysis. Our data and source code are publicly available.
We summarize the tasks, protocol, and outcome for the 6th Competition on recognition of Handwritten Mathematical Expressions (CROHME), which includes a new formula detection in document images task (+ TFD). For CROHME...
详细信息
We summarize the tasks, protocol, and outcome for the 6th Competition on recognition of Handwritten Mathematical Expressions (CROHME), which includes a new formula detection in document images task (+ TFD). For CROHME + TFD 2019, participants chose between two tasks for recognizing handwritten formulas from 1) online stroke data, or 2) images generated from the handwritten strokes. To compare LATEX strings and the labeled directed trees over strokes (label graphs) used in previous CROHMEs, we convert LATEX and stroke-based label graphs to label graphs defined over symbols (symbol-level label graphs, or symLG). More than thirty (33) participants registered for the competition, with nineteen (19) teams submitting results. The strongest formula recognition results were produced by the USTC-iFLYTEK research team, for both stroke-based (81%) and image-based (77%) input. For the new typeset formula detection task, the Samsung R&D Institute Ukraine (Team 2) obtained a very strong F-score (93%). System performance has improved since the last CROHME - still, the competition results suggest that recognition of handwritten formulae remains a difficult structural patternrecognition task.
This paper introduces a new way for text-line extraction by integrating deep-learning based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a s...
详细信息
This paper introduces a new way for text-line extraction by integrating deep-learning based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborated scripts. In this work, we propose a novel method which uses semantic segmentation at pixel level as intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as strong denoising pre-processing step before performing text line extraction. Second, we introduce a novel, simple and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.
We present a technique for using content-based video labeling as a CAPTCHA task. Our CAPTCHAs are generated from YouTube videos, which contain labels (tags) supplied by the person that uploaded the video. They are gra...
详细信息
ISBN:
(纸本)9781605587363
We present a technique for using content-based video labeling as a CAPTCHA task. Our CAPTCHAs are generated from YouTube videos, which contain labels (tags) supplied by the person that uploaded the video. They are graded using a video's tags, as well as tags from related videos. In a user study involving 184 participants, we were able to increase the human success rate on our video CAPTCHA from roughly 70% to 90%, while keeping the success rate of a tag frequency-based attack fixed at around 13%. Through a different parameterization of the challenge generation and grading algorithms, we were able to reduce the success rate of the same attack to 2%, while still increasing the human success rate from 70% to 75%. The usability and security of our video CAPTCHA appears to be comparable to existing CAPTCHAs, and a majority of participants (60%) indicated that they found the video CAPTCHAs more enjoyable than traditional CAPTCHAs in which distorted text must be transcribed.
暂无评论