A contest on Handwritten Text recognition organised in the context of the ICFHR 2014 conference is described. Two tracks with increased freedom on the use of training data were proposed and three research groups parti...
详细信息
A contest on Handwritten Text recognition organised in the context of the ICFHR 2014 conference is described. Two tracks with increased freedom on the use of training data were proposed and three research groups participated in these two tracks. The handwritten images for this contest were drawn from an English data set which is currently being considered in the Tran scriptorium project. The goal of this project is to develop innovative, efficient and cost-effective solutions for the transcription of historical handwritten document images, focusing on four languages: English, Spanish, German and Dutch. For the English language, the so-called "Bentham collection" is being considered in Tran scriptorium. It encompasses a large set of manuscripts written by the renowned English philosopher and reformer Jeremy Bentham (1748-1832). A small subset of this collection has been chosen for the present HTR competition. The selected subset has been written by several hands (Bentham himself and his secretaries) and entails significant variabilities and difficulties regarding the quality of text images and writing styles. Training and test data were provided in the form of carefully segmented line images, along with the corresponding transcripts. The three participants achieved very good results, with transcription word error rates ranging from 15.0% down to 8.6%.
In this paper, we enhance the traditional confusion network system combination approach with an additional model trained by a neural network. This work is motivated by the fact that the commonly used binary system vot...
详细信息
We study the application of active learning techniques to the translation of unbounded data streams via interactive neural machine translation. The main idea is to select, from an unbounded stream of source sentences,...
详细信息
We present a demonstration of a neural interactive-predictive system for tackling multimodal sequence to sequence tasks. The system generates text predictions to different sequence to sequence tasks: machine translati...
详细信息
Sparse models require less memory for storage and enable a faster inference by reducing the necessary number of FLOPs. This is relevant both for time-critical and on-device computations using neural networks. The stab...
详细信息
The integration of language models for neural machine translation has been extensively studied in the past. It has been shown that an external language model, trained on additional target-side monolingual data, can he...
详细信息
Search is a central component of any statistical machine translation system. We describe the search for phrase-based SMT in detail and show its importance for achieving good translation quality. We introduce an explic...
详细信息
This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the translation task of the NAACL 2012 Seventh Workshop on Statistical Machine Translation (WMT 2012). We ...
详细信息
ISBN:
(纸本)9781622765928
This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the translation task of the NAACL 2012 Seventh Workshop on Statistical Machine Translation (WMT 2012). We participated in the evaluation campaign for the French-English and German-English language pairs in both translation directions. Both hierarchical and phrase-based SMT systems are applied. A number of different techniques are evaluated, including an insertion model, different lexical smoothing methods, a discriminative reordering extension for the hierarchical system, reverse translation, and system combination. By application of these methods we achieve considerable improvements over the respective baseline systems.
Data processing is an important step in various natural language processing tasks. As the commonly used datasets in named entity recognition contain only a limited number of samples, it is important to obtain addition...
详细信息
Compared to sentence-level systems, document-level neural machine translation (NMT) models produce a more consistent output across a document and are able to better resolve ambiguities within the input. There are many...
详细信息
暂无评论