Multiple types of models are used in handwriting recognition and can be broadly categorized into generative and discriminative models. Gaussian Hidden Markov Models are used successfully in most systems, and discriminative training can be applied to improve them further. Alternatively, Segmental Conditional Random Fields have the advantage of being both discriminative and segmental. The novelty of this work is the investigation of Segmental Conditional Random Fields for handwriting recognition. In addition, Multi-Layer Perceptrons and Long Short-Term Memory Recurrent Neural Networks are compared for generating the observations in this framework. Various types of features are investigated in the segmental models, class-based language model features are proposed to extend the model, and moment-based visual features are extracted at the word level to make it more robust. Experimental results on English handwriting show a relative reduction of 13.7% in word error rate with respect to the baseline system. The proposed system also outperforms Gaussian Hidden Markov Models trained discriminatively with the minimum phone error criterion, with a relative word error rate reduction of 6.9%.
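For context, a segmental CRF scores a word sequence together with a segmentation of the observation sequence; the formulation below is the standard one, into which the feature types named in the abstract (neural-network observations, class-based language model features, moment-based visual features) would plug as feature functions. The notation is generic, not necessarily the paper's exact model:

```latex
p(\mathbf{w} \mid \mathbf{x}) =
  \frac{\sum_{\mathbf{e}} \exp\!\Big( \sum_{i} \boldsymbol{\lambda}^{\top}
        \mathbf{f}(w_{i-1}, w_i, e_i, \mathbf{x}) \Big)}
       {\sum_{\mathbf{w}'} \sum_{\mathbf{e}'} \exp\!\Big( \sum_{i} \boldsymbol{\lambda}^{\top}
        \mathbf{f}(w'_{i-1}, w'_i, e'_i, \mathbf{x}) \Big)}
```

Here each segment e_i spans the frames of x assigned to word w_i, the feature functions f are evaluated over whole segments rather than single frames, and the weights λ are learned discriminatively.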
ISBN: 9781728199665 (digital), 9781728199672 (print)
Automated text recognition is a fundamental problem in Document Image Analysis. Optical models are used for modeling characters, while language models are used for composing sentences. Since scripts and linguistic context differ widely, it is mandatory to specialize the models by training on task-dependent ground truth. However, creating a sufficient amount of ground truth, at least for historical handwritten scripts, requires well-qualified persons to mark and transcribe text lines, which is very time-consuming. On the other hand, in many cases unassigned transcripts are already available at page level from another process chain, or at least transcripts from a similar linguistic context are available. In this work we present two approaches that make use of such transcripts: the first creates training data by automatically assigning page-dependent transcripts to text lines, while the second uses a task-specific language model to generate highly confident training data. Both approaches are successfully applied to a very challenging historical handwritten collection.
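One plausible realization of the first approach is to match a rough recognizer output for each text line against the unassigned page-level transcript lines. The sketch below is an assumption about how such an assignment could work, not the paper's method; the function name, threshold, and greedy strategy are all illustrative:

```python
# Hypothetical sketch: pair detected text lines with page-level transcript
# lines by greedy similarity matching between a rough recognizer hypothesis
# and the candidate transcripts; only confident pairs become training data.
from difflib import SequenceMatcher

def assign_transcripts(recognized_lines, page_transcript_lines, threshold=0.7):
    """recognized_lines: list of (line_id, hypothesis_text) pairs.
    Returns (line_id, transcript) pairs whose similarity clears the threshold."""
    assignments = []
    remaining = list(page_transcript_lines)
    for line_id, hyp in recognized_lines:
        best = max(remaining,
                   key=lambda ref: SequenceMatcher(None, hyp, ref).ratio(),
                   default=None)
        if best is None:
            break
        if SequenceMatcher(None, hyp, best).ratio() >= threshold:
            assignments.append((line_id, best))   # confident match: keep
            remaining.remove(best)                # each transcript used once
    return assignments
```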
In this work, we present a model for document-grounded response generation in dialog that is decomposed into two components according to Bayes' theorem. One component is a traditional ungrounded response generatio...
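The teaser above is cut off, but the decomposition it names is the standard Bayes factorization; assuming the two components are the ungrounded response model and a document likelihood (notation mine, not the paper's):

```latex
p(r \mid c, d) \;=\; \frac{p(d \mid r, c)\, p(r \mid c)}{p(d \mid c)}
\;\propto\; p(d \mid r, c)\, p(r \mid c)
```

where r is the response, c the dialog context, and d the grounding document; p(r | c) is the traditional ungrounded response generation model the abstract mentions.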
In this paper we present the handwriting recognition systems submitted by LIMSI to the HTRtS 2014 contest. The systems for both the restricted and unrestricted tracks consisted of combinations of several optical models. We extracted handcrafted features as well as pixel values with a sliding window. We trained Deep Neural Networks (DNNs) and Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs), which were plugged in as the optical model in Hidden Markov Models (HMMs). We propose a novel method to build language models that can cope with hyphenation in the text. The combination was performed on lattices generated by the different systems. We were the only team participating in both tracks and ranked second in each. The final Word Error Rates were 15.0% and 11.0% for the restricted and unrestricted tracks, respectively. We studied the impact of adding data for optical and language modeling. After the evaluation, we also used the same corpus for the language model as the winning team and obtained comparable results.
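The abstract does not spell out the hyphenation method, but one common way to let an n-gram language model cope with line-end hyphenation is to mark broken word halves as paired tokens. The sketch below is an assumption in that spirit; the marker convention and splitting rule are illustrative, not the paper's scheme:

```python
# Hypothetical sketch: emit LM tokens where a word broken by a line-end
# hyphen becomes the marked pair 'pre-|' '|fix', so the n-gram model can
# assign probability to hyphenation points as well as to whole words.
def tokenize_with_hyphenation(lines):
    tokens = []
    carry = False                              # previous line ended mid-word
    for line in lines:
        words = line.split()
        for i, w in enumerate(words):
            if carry and i == 0:
                tokens.append("|" + w)         # continuation half
                carry = False
            elif i == len(words) - 1 and w.endswith("-"):
                tokens.append(w + "|")         # broken prefix half
                carry = True
            else:
                tokens.append(w)
    return tokens

# tokenize_with_hyphenation(["the hand-", "writing model"])
#   -> ["the", "hand-|", "|writing", "model"]
```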
In recent years, Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) trained with the Connectionist Temporal Classification (CTC) objective have won many international handwriting recognition evaluations. The CTC algorithm is based on a forward-backward procedure, avoiding the need for a segmentation of the input before training. The network outputs are character labels plus a special non-character label. On the other hand, in the hybrid Neural Network / Hidden Markov Model (NN/HMM) framework, networks are trained with framewise criteria to predict state labels. In this paper, we show that CTC training is close to forward-backward training of NN/HMMs and can be extended to more standard HMM topologies. We apply this method to Multi-Layer Perceptrons (MLPs) and investigate the properties of CTC, namely the modeling of characters by single labels and the role of the special label.
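For reference, the standard CTC objective (Graves et al., 2006) marginalizes over all framewise label paths π that collapse, by removing repeats and then blanks, to the target sequence l:

```latex
p(\mathbf{l} \mid \mathbf{x}) \;=\;
  \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{l})} \; \prod_{t=1}^{T} y^{t}_{\pi_t},
\qquad
\mathcal{L}_{\mathrm{CTC}} \;=\; -\ln p(\mathbf{l} \mid \mathbf{x})
```

where y^t_k is the network's softmax output for label k at frame t. The sum is computed with a forward-backward recursion over the blank-expanded label sequence, which is exactly the structural parallel to forward-backward NN/HMM training that the paper exploits.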
ISBN: 9781728192741 (digital), 9781728192758 (print)
In recent years, Handwritten Text Recognition (HTR) has captured a lot of attention among researchers in the computer vision community. Current state-of-the-art approaches for offline HTR are based on Convolutional Recurrent Neural Networks (CRNNs), which also excel at scene text recognition. Unfortunately, deep models such as CRNNs and Recurrent Neural Networks (RNNs) are likely to suffer from vanishing/exploding gradient problems when processing long text images, which are commonly found in scanned documents. Besides, they usually have millions of parameters, which require huge amounts of data and computational resources. Recently, a new class of neural network architecture, called Gated Convolutional Neural Networks (Gated-CNN), has demonstrated potential to complement CRNN methods in modeling. Therefore, in this paper, we present a new architecture for HTR, based on Gated-CNN, with fewer parameters and fewer layers, which is able to outperform the current state-of-the-art architectures for HTR. The experiments validate that the proposed model achieves statistically significant recognition results, surpassing previous HTR systems by an average of 33% over five important handwritten benchmark datasets. Moreover, the proposed model achieves satisfactory recognition rates even with little training data. Finally, its compact architecture requires fewer computational resources, so it can be applied in real-world applications with hardware limitations, such as robots and smartphones.
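To make the gating idea concrete, here is a minimal sketch of a gated convolutional block in the GLU style (Dauphin et al.); the channel count and kernel width are illustrative, not the exact Gated-CNN architecture evaluated in the paper:

```python
# Minimal gated convolution: one conv produces both content and gate,
# and the sigmoid gate decides how much of each position passes through.
import torch
import torch.nn as nn

class GatedConv1d(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size, padding=pad)

    def forward(self, x):                        # x: (batch, channels, width)
        content, gate = self.conv(x).chunk(2, dim=1)
        return content * torch.sigmoid(gate)     # gated activations

# usage: features extracted along the width of a text-line image
x = torch.randn(8, 64, 128)                      # (batch, channels, width)
y = GatedConv1d(64)(x)
print(y.shape)                                   # torch.Size([8, 64, 128])
```

Because the gate replaces recurrence along the text line, gradients flow through a fixed, shallow path per position, which is the property that sidesteps the vanishing/exploding gradient issue on long images.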
In many applications of multi-microphone multi-device processing, the synchronization among different input channels can be affected by the lack of a common clock and isolated drops of samples. In this work, we addres...
ISBN: 9781538646595 (print)
We present a new architecture and a training strategy for an adaptive mixture of experts with applications to domain-robust language modeling. The proposed model is designed to benefit from the scenario where the training data are available in diverse domains, as is the case for YouTube speech recognition. The two core components of our model are an ensemble of parallel long short-term memory (LSTM) expert layers, one for each domain, and another LSTM-based network which generates state-dependent mixture weights for combining the expert LSTM states by linear interpolation. The resulting model is a recurrent adaptive mixture model (RADMM) of domain experts. We train our model on 4.4B words from YouTube speech recognition data and report results on the YouTube speech recognition test set. Compared with a background LSTM model, we obtain up to 12% relative improvement in perplexity and an improvement in word error rate from 12.3% to 12.1% when using lattice rescoring with strong pruning.
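A minimal sketch of the two-component idea follows, assuming state-level linear interpolation with weights produced by a gating LSTM; the class name, layer sizes, and vocabulary are illustrative, not the exact RADMM configuration:

```python
# Parallel LSTM experts plus a gating LSTM whose softmax output mixes the
# expert hidden states by linear interpolation at every time step.
import torch
import torch.nn as nn

class RecurrentMixtureOfExperts(nn.Module):
    def __init__(self, vocab, dim, n_experts):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.experts = nn.ModuleList(nn.LSTM(dim, dim, batch_first=True)
                                     for _ in range(n_experts))
        self.gate = nn.LSTM(dim, dim, batch_first=True)
        self.to_weights = nn.Linear(dim, n_experts)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens):                       # tokens: (batch, time)
        e = self.embed(tokens)
        # expert states stacked along a new last axis: (batch, time, dim, E)
        states = torch.stack([lstm(e)[0] for lstm in self.experts], dim=-1)
        g, _ = self.gate(e)                          # state-dependent gating
        w = torch.softmax(self.to_weights(g), dim=-1)    # (batch, time, E)
        mixed = (states * w.unsqueeze(2)).sum(-1)    # linear interpolation
        return self.out(mixed)                       # next-word logits

model = RecurrentMixtureOfExperts(vocab=1000, dim=32, n_experts=4)
logits = model(torch.randint(0, 1000, (2, 16)))      # shape (2, 16, 1000)
```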