In this paper, we present two improvements to the beam search approach for solving homophonic substitution ciphers presented in Nuhn et al. (2013): An improved rest cost estimation together with an optimized strategy ...
In this work, we tackle the problem of domain adaptation of language and translation models without explicit bilingual in-domain training data. In such a scenario, the only information about the domain can be induced from ...
Text line segmentation is the process by which text lines in a document image are localized and extracted. It is an important step in off-line Handwritten Text Recognition (HTR), given that the input of these systems is the line image of the text to be transcribed. A myriad of solutions to the text line segmentation problem have been proposed in the literature. Although these solutions may differ greatly in how the segmentation is actually performed, they can be classified by the level of precision and detail in the final extracted lines. In this paper we study the influence of, and the actual need for, different levels of precision and detail in segmentation solutions on a real HTR task. We test three text line segmentation techniques whose outputs range from a simple rectangle for each line to a perfectly fitted polygon surrounding the detected lines. Experiments have been carried out on a historical collection, and the results show that good HTR accuracy can be obtained with simple extraction algorithms.
This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the German→English translation task of the ACL 2014 Eighth Workshop on Statistical Machine Translation (WM...
We present a novel toolkit that implements the long short-term memory (LSTM) neural network concept for language modeling. The main goal is to provide software that is easy to use and that allows fast training of...
This paper investigates the application of vector space models (VSMs) to the standard phrase-based machine translation pipeline. VSMs are models based on continuous word representations embedded in a vector space. We ...
This work presents two different translation models using recurrent neural networks. The first one is a word-based approach using word alignments. Second, we present phrase-based translation models that are more consi...
Manual analysis and decryption of enciphered documents is tedious and error-prone work. Often, even after spending large amounts of time on a particular cipher, no decipherment can be found. Automating the decryption ...
This paper describes the RWTH system for large vocabulary Arabic handwriting recognition. The recognizer is based on Hidden Markov Models (HMMs) with state-of-the-art methods for visual/language modeling and decoding. The feature extraction is based on Recurrent Neural Networks (RNNs), which estimate the posterior distribution over the character labels for each observation. Discriminative training using the Minimum Phone Error (MPE) criterion is used to train the HMMs. Recognition is performed with n-gram language models (LMs) trained on in-domain text data. Unsupervised writer adaptation is also performed using Constrained Maximum Likelihood Linear Regression (CMLLR) feature adaptation. The RWTH Arabic handwriting recognition system gave competitive results in previous handwriting recognition competitions. The techniques used improve the performance of the system participating in the OpenHaRT 2013 evaluation.
We present a method for training an off-line handwriting recognition system in an unsupervised manner. For an isolated word recognition task, we are able to bootstrap the system without any annotated data. We then retrain the system iteratively, using the best hypothesis from the previous recognition pass. Our approach relies only on a prior language model and does not depend on an explicit segmentation of words into characters. The resulting system shows promising performance on a standard dataset in comparison to a system trained in a supervised fashion on the same amount of training data.