The term fine-grained visual classification (FGVC) refers to classification tasks where the classes are very similar and the classification model needs to be able to find subtle differences to make the correct predict...
ISBN (print): 9781622765928
Training the phrase table by force-aligning (FA) the training data with the reference translation has been shown to improve phrasal translation quality while significantly reducing the phrase table size on medium-sized tasks. We apply this procedure to several large-scale tasks, with the primary goal of reducing model sizes without sacrificing translation quality. To deal with the noise in the automatically crawled parallel training data, we introduce on-demand word deletions, insertions, and backoffs to achieve an over 99% successful alignment rate. We also add heuristics to avoid any increase in OOV rates. We are able to reduce already heavily pruned baseline phrase tables by more than 50% with little to no degradation in quality, and occasionally a slight improvement, without any increase in OOVs. We further introduce two global scaling factors for re-estimation of the phrase table via posterior phrase alignment probabilities, and a modified absolute discounting method that can be applied to fractional counts.
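As a rough illustration of the last point, the sketch below applies a fixed absolute discount to fractional phrase counts; the paper's modified discounting and its global scaling factors are not detailed in the abstract, so the function and the discount value here are assumptions.

```python
from collections import defaultdict

# Illustrative sketch only: absolute discounting over fractional phrase counts.
# A fixed discount is subtracted from each count (floored at zero), which frees
# probability mass for smoothing; the paper's exact re-estimation may differ.
def discounted_phrase_probs(frac_counts, discount=0.3):
    """frac_counts: dict mapping (src_phrase, tgt_phrase) -> fractional count."""
    src_totals = defaultdict(float)
    for (src, _tgt), c in frac_counts.items():
        src_totals[src] += c
    probs = {}
    for (src, tgt), c in frac_counts.items():
        probs[(src, tgt)] = max(c - discount, 0.0) / src_totals[src]
    return probs

counts = {("das haus", "the house"): 1.7, ("das haus", "the building"): 0.4}
print(discounted_phrase_probs(counts))
```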
We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context free grammars (SCFGs): chart-parsing, ngram language model integ...
Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aimed at improving the robustness of RNN-T ASR to speech disfluencies, with a focus on partial words. For evaluation we use clean data, data with disfluencies, and a separate dataset with speech affected by stuttering. We show that including a small amount of data with disfluencies in the training set improves recognition accuracy on the test sets with disfluencies and stuttering. Increasing the amount of training data with disfluencies gives additional gains without degradation on the clean data. We also show that replacing partial words with a dedicated token yields even better accuracy on utterances with disfluencies and stuttering. The evaluation of our best model shows 22.5% and 16.4% relative WER reductions on those two evaluation sets.
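A minimal sketch of the partial-word substitution described above, assuming partial words are marked with a trailing hyphen in the transcripts; the actual marking convention and the token used in the paper are not given in the abstract.

```python
import re

# Hypothetical preprocessing step: map partial words (marked "wan-") to a
# dedicated token before training the RNN-T on the transcripts.
PARTIAL_TOKEN = "<partial>"

def replace_partial_words(transcript: str) -> str:
    return re.sub(r"\b\w+-(?=\s|$)", PARTIAL_TOKEN, transcript)

print(replace_partial_words("i wan- want to go"))  # -> "i <partial> want to go"
```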
ISBN (print): 9781479903573
In this paper, we present a unified search strategy for open-vocabulary handwriting recognition using weighted finite-state transducers. In addition to a standard word-level language model, we introduce a separate n-gram character-level language model for out-of-vocabulary word detection and recognition. The probabilities assigned by these two models are combined into one Bayes decision rule. We evaluate the proposed method on the IAM database of English handwriting. An improvement from a 22.2% word error rate to 17.3% is achieved compared to the closed-vocabulary scenario and the best published result.
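A plausible form of the combined decision rule, sketched under the assumption that out-of-vocabulary words are scored by the character-level model through an unknown-word class; the exact factorization and any scaling exponents used in the paper are not given in the abstract.

```latex
\hat{w}_1^N = \operatorname*{argmax}_{N,\,w_1^N}
  \Bigl\{ p(x_1^T \mid w_1^N) \prod_{n=1}^{N}
  \begin{cases}
    p_{\mathrm{word}}(w_n \mid w_1^{n-1}) & \text{if } w_n \in \mathcal{V} \\
    p_{\mathrm{word}}(\langle\mathrm{unk}\rangle \mid w_1^{n-1}) \,
      p_{\mathrm{char}}(w_n) & \text{otherwise}
  \end{cases} \Bigr\}
```

Here \(\mathcal{V}\) denotes the recognition vocabulary and \(p_{\mathrm{char}}\) scores the character sequence of a hypothesized out-of-vocabulary word.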
ISBN (print): 9781457705380
Recognizing Broadcast Conversational (BC) speech data is a difficult task, which can be regarded as one of the major challenges beyond the recognition of Broadcast News (BN). This paper presents the automatic speech recognition systems developed by RWTH for English, French, and German, which attained the best word error rates for English and German, and competitive results on the French task, in the 2010 Quaero evaluation for BC and BN data. At the same time, the RWTH German system used the least amount of training data among all participants. Large reductions in word error rate were obtained by incorporating the new Bottleneck Multilayer Perceptron (MLP) features for all three languages. Additional improvements were obtained for the German system by applying a new language modeling technique that decomposes words into sublexical components.
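To illustrate the idea of sublexical decomposition, the sketch below greedily splits a compound into in-vocabulary parts and marks the split point so that full words can be restored after recognition; the decomposition actually used for the German system is not described in the abstract, so this is only an assumed variant.

```python
# Hypothetical compound splitter for LM training: break a word into two
# in-vocabulary parts, marking the first part with "+" so the surface word
# can be rejoined after decoding.
def decompose(word, lexicon, min_len=4):
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            return [head + "+", tail]
    return [word]

lexicon = {"wort", "fehler", "rate"}
print(decompose("fehlerrate", lexicon))  # -> ['fehler+', 'rate']
```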
ISBN (print): 9780769549993
In this paper we describe a novel HMM-based system for off-line handwriting recognition. We adapt successful techniques from the domains of large-vocabulary speech recognition and image object recognition: moment-based image normalization, writer adaptation, discriminative feature extraction and training, and open-vocabulary recognition. We evaluate these methods and examine their cumulative effect on recognition performance. The final system outperforms current state-of-the-art approaches on two standard evaluation corpora for English and French handwriting.
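A rough sketch of moment-based image normalization as commonly used in handwriting preprocessing: the image is re-centered on its intensity centroid and rescaled according to its second-order moments; the exact normalization variant used in the paper may differ in detail.

```python
import numpy as np

# Sketch: normalize position by the first-order moments (centroid) and size by
# the second-order moments, resampling a window of +/- 2 standard deviations
# around the centroid to a fixed output size (nearest-neighbour sampling).
def moment_normalize(img, out_h=32, out_w=128):
    ys, xs = np.nonzero(img)
    w = img[ys, xs].astype(float)
    cy, cx = np.average(ys, weights=w), np.average(xs, weights=w)
    sy = np.sqrt(np.average((ys - cy) ** 2, weights=w)) + 1e-6
    sx = np.sqrt(np.average((xs - cx) ** 2, weights=w)) + 1e-6
    out = np.zeros((out_h, out_w), dtype=img.dtype)
    for oy in range(out_h):
        for ox in range(out_w):
            iy = int(round(cy + (oy - out_h / 2) * (4 * sy / out_h)))
            ix = int(round(cx + (ox - out_w / 2) * (4 * sx / out_w)))
            if 0 <= iy < img.shape[0] and 0 <= ix < img.shape[1]:
                out[oy, ox] = img[iy, ix]
    return out

img = np.zeros((60, 200)); img[20:40, 50:150] = 1.0
print(moment_normalize(img).shape)  # -> (32, 128)
```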
Currently, in speech translation, the straightforward approach - cascading a recognition system with a translation system - delivers state-of-the-art results. However, fundamental challenges such as error propagation ...
This work investigates an alternative model for neural machine translation (NMT) and proposes a novel architecture, where we employ a multi-dimensional long short-term memory (MDLSTM) for translation modeling. In the ...
Sparse models require less memory for storage and enable a faster inference by reducing the necessary number of FLOPs. This is relevant both for time-critical and on-device computations using neural networks. The stab...