The term fine-grained visual classification (FGVC) refers to classification tasks where the classes are very similar and the classification model needs to be able to find subtle differences to make the correct predict...
ISBN: 9781622765928 (print)
Training the phrase table by force-aligning (FA) the training data with the reference translation has been shown to improve phrasal translation quality while significantly reducing phrase table size on medium-sized tasks. We apply this procedure to several large-scale tasks, with the primary goal of reducing model sizes without sacrificing translation quality. To deal with the noise in the automatically crawled parallel training data, we introduce on-demand word deletions, insertions, and backoffs to achieve a successful alignment rate of over 99%. We also add heuristics to avoid any increase in OOV rates. We are able to reduce already heavily pruned baseline phrase tables by more than 50% with little to no degradation in quality, and occasionally a slight improvement, without any increase in OOVs. We further introduce two global scaling factors for re-estimation of the phrase table via posterior phrase alignment probabilities and a modified absolute discounting method that can be applied to fractional counts.
Transcription of historical documents is an interesting task for libraries seeking to make their holdings available. In recent years, the use of Handwritten Text Recognition has allowed paleographers to speed up the manual transcription process, since they can work by correcting a draft transcription. An alternative is to obtain the draft transcription by dictating the contents to an Automatic Speech Recognition system. When both sources (image and speech) are available, a multimodal combination is possible, and an iterative process can be used to refine the final hypothesis. In this work, a multimodal combination based on confusion networks is presented. Results on two data sets of different difficulty show that the proposed technique provides similar or better draft transcriptions than a previously proposed approach, allowing for a faster transcription process.
Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aiming for improved robustness of RNN-T ASR to speech disfluencies, with a focus on partial words. For evaluation, we use clean data, data with disfluencies, and a separate dataset with speech affected by stuttering. We show that after including a small amount of data with disfluencies in the training set, the recognition accuracy on the tests with disfluencies and stuttering improves. Increasing the amount of training data with disfluencies gives additional gains without degradation on the clean data. We also show that replacing partial words with a dedicated token helps to get even better accuracy on utterances with disfluencies and stutter. The evaluation of our best model shows 22.5% and 16.4% relative WER reduction on those two evaluation sets.
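The partial-word replacement and the relative WER figures mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes partial words are marked in transcripts with a trailing hyphen (for example `wh-`), and the token name `<partial>` and both function names are hypothetical.

```python
def replace_partial_words(transcript: str, token: str = "<partial>") -> str:
    """Map every partial word to one dedicated token.

    Assumption: a partial word is any token of two or more characters
    that ends in '-'. A lone '-' is left untouched.
    """
    words = transcript.split()
    return " ".join(
        token if len(w) > 1 and w.endswith("-") else w for w in words
    )


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent with respect to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

A relative reduction is measured against the baseline error, so the same percentage can correspond to very different absolute WER changes depending on how hard the test set is.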
Currently, in speech translation, the straightforward approach - cascading a recognition system with a translation system - delivers state-of-the-art results. However, fundamental challenges such as error propagation ...
Neural network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are better modeled by NNLMs than by n-gram LMs, in which cases the improvements are most substantial, and why this is the case. Such an analysis is important for benefiting further from NNLMs used in combination with conventional n-gram models. The analysis is carried out for different types of neural network LMs (feed-forward and recurrent). The results showing for which types of events NNLMs provide better probability estimates are validated on two setups that differ in size and in the degree of data homogeneity.
A contest on Handwritten Text Recognition (HTR) organised in the context of the ICFHR 2014 conference is described. Two tracks with increased freedom on the use of training data were proposed, and three research groups participated in these two tracks. The handwritten images for this contest were drawn from an English data set which is currently being considered in the tranScriptorium project. The goal of this project is to develop innovative, efficient and cost-effective solutions for the transcription of historical handwritten document images, focusing on four languages: English, Spanish, German and Dutch. For the English language, the so-called "Bentham collection" is being considered in tranScriptorium. It encompasses a large set of manuscripts written by the renowned English philosopher and reformer Jeremy Bentham (1748-1832). A small subset of this collection has been chosen for the present HTR competition. The selected subset was written by several hands (Bentham himself and his secretaries) and entails significant variability and difficulty regarding the quality of text images and writing styles. Training and test data were provided in the form of carefully segmented line images, along with the corresponding transcripts. The three participants achieved very good results, with transcription word error rates ranging from 15.0% down to 8.6%.
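The word error rate reported for the contest is, in its standard definition, the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below shows that standard computation only. The contest's exact scoring protocol (tokenisation, case and punctuation handling) is not stated here, so whitespace tokenisation is an assumption and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # cost 0 when the words match
            deletion = prev[j] + 1                 # reference word missing
            insertion = curr[j - 1] + 1            # extra hypothesis word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 100% when the hypothesis is much longer than the reference.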
This work investigates an alternative model for neural machine translation (NMT) and proposes a novel architecture, where we employ a multi-dimensional long short-term memory (MDLSTM) for translation modeling. In the ...
Sparse models require less memory for storage and enable a faster inference by reducing the necessary number of FLOPs. This is relevant both for time-critical and on-device computations using neural networks. The stab...
ISBN: 9780769549993 (print)
In this paper we describe a novel HMM-based system for off-line handwriting recognition. We adapt successful techniques from the domains of large vocabulary speech recognition and image object recognition: moment-based image normalization, writer adaptation, discriminative feature extraction and training, and open-vocabulary recognition. We evaluate those methods and examine their cumulative effect on the recognition performance. The final system outperforms current state-of-the-art approaches on two standard evaluation corpora for English and French handwriting.