Recent studies have demonstrated the power of neural networks for different fields of artificial intelligence. In most fields, such as machine translation or speech recognition, neural networks outperform previously u...
详细信息
Tokenization or segmentation is a wide concept that covers simple processes such as separating punctuation from words, or more sophisticated processes such as applying morphological knowledge. Neural Machine Translati...
详细信息
Deep Neural Networks (DNNs) have achieved state-of-the-art accuracy performance in many tasks. However, recent works have pointed out that the outputs provided by these models are not well-calibrated, seriously limiti...
详细信息
Transcription of historical documents is an interesting task for libraries in order to make available their funds. In the lasts years, the use of Handwritten Text recognition allowed paleographs to speed up the manual...
详细信息
Transcription of historical documents is an interesting task for libraries in order to make available their funds. In the lasts years, the use of Handwritten Text recognition allowed paleographs to speed up the manual transcription process, since they are able to correct on a draft transcription. Another alternative is obtaining the draft transcription by dictating the contents to an Automatic Speech recognition system. When both sources (image and speech) are available, a multimodal combination is possible, and an iterative process can be used in order to refine the final hypothesis. In this work, a multimodal combination based on confusion networks is presented. Results on two different sets of data, with different difficulty level, show that the proposed technique provides similar or better draft transcriptions than a previously proposed approach, allowing for a faster transcription process.
A contest on Handwritten Text recognition organised in the context of the ICFHR 2014 conference is described. Two tracks with increased freedom on the use of training data were proposed and three research groups parti...
详细信息
A contest on Handwritten Text recognition organised in the context of the ICFHR 2014 conference is described. Two tracks with increased freedom on the use of training data were proposed and three research groups participated in these two tracks. The handwritten images for this contest were drawn from an English data set which is currently being considered in the Tran scriptorium project. The goal of this project is to develop innovative, efficient and cost-effective solutions for the transcription of historical handwritten document images, focusing on four languages: English, Spanish, German and Dutch. For the English language, the so-called "Bentham collection" is being considered in Tran scriptorium. It encompasses a large set of manuscripts written by the renowned English philosopher and reformer Jeremy Bentham (1748-1832). A small subset of this collection has been chosen for the present HTR competition. The selected subset has been written by several hands (Bentham himself and his secretaries) and entails significant variabilities and difficulties regarding the quality of text images and writing styles. Training and test data were provided in the form of carefully segmented line images, along with the corresponding transcripts. The three participants achieved very good results, with transcription word error rates ranging from 15.0% down to 8.6%.
We study the application of active learning techniques to the translation of unbounded data streams via interactive neural machine translation. The main idea is to select, from an unbounded stream of source sentences,...
详细信息
We present a demonstration of a neural interactive-predictive system for tackling multimodal sequence to sequence tasks. The system generates text predictions to different sequence to sequence tasks: machine translati...
详细信息
Neural machine translation has meant a revolution of the field. Nevertheless, post-editing the outputs of the system is mandatory for tasks requiring high translation quality. Post-editing offers a unique opportunity ...
详细信息
Despite the advances achieved by neural models in sequence to sequence learning, exploited in a variety of tasks, they still make errors. In many use cases, these are corrected by a human expert in a posterior revisio...
详细信息
The objective of Native language Identification is to determine the native language of the author of a text that he or she wrote in another language. By contrast, language Variety Identification aims at classifying te...
详细信息
The objective of Native language Identification is to determine the native language of the author of a text that he or she wrote in another language. By contrast, language Variety Identification aims at classifying texts representing different varieties of a single language. We postulate that both tasks may be reduced to a single objective, which is to identify the language variety of the text. We design a general approach that combines string kernels and word embeddings, which capture different characteristics of texts. The results of our experiments show that the approach achieves excellent results on both tasks, without any task-specific adaptations.
暂无评论