In this paper, the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2011 are presented. W...
In this paper we apply lightly-supervised training to a hierarchical phrase-based statistical machine translation system. We employ bitexts that have been built by automatically translating large amounts of monolingua...
In this paper we study several advanced techniques and models for Arabic-to-English statistical machine translation. We examine how the challenges imposed by this particular language pair and translation direction can...
In this paper, we dissect the influence of several target-side dependency-based extensions to hierarchical machine translation, including a dependency language model (LM). We pursue a non-restrictive approach that doe...
In this work, we present novel warping algorithms for full 2D pixel-grid deformations for face recognition. Due to high variation in face appearance, face recognition is considered a very difficult task, especially if...
Polish is a synthetic language with a high morpheme-per-word ratio. Its high degree of inflection leads to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities, which poses a challenge for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Here, the use of morpheme- and syllable-based units is investigated for building sub-lexical LMs. A new type of sub-lexical unit is proposed that combines morphemic or syllabic units with their corresponding pronunciations; the resulting set of grapheme-phoneme pairs, called graphones, is used for building LMs. A relative reduction of 3.5% in Word Error Rate (WER) is obtained with respect to a traditional system based on full words.
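A minimal sketch of the graphone idea described above, assuming a toy morphemic segmentation with aligned pronunciations. The Polish examples, the segmentation, and the unsmoothed bigram estimates are illustrative assumptions, not the paper's actual data or estimation method:

# Build graphone units (grapheme-phoneme pairs) as tokens for a sub-lexical LM.
from collections import defaultdict

# Hypothetical morphemic segmentation with aligned pronunciations:
# each word is a list of (grapheme_unit, phoneme_string) pairs.
corpus = [
    [("dom", "d o m"), ("ach", "a x")],   # "domach"
    [("dom", "d o m"), ("u", "u")],       # "domu"
    [("kot", "k o t"), ("ami", "a m i")], # "kotami"
]

def to_graphones(segmented_word):
    # Fuse each sub-lexical unit with its pronunciation into one LM token.
    return ["%s:%s" % (g, p.replace(" ", "_")) for g, p in segmented_word]

# Collect bigram counts over graphone tokens, with word-boundary markers.
bigrams = defaultdict(int)
unigrams = defaultdict(int)
for word in corpus:
    tokens = ["<w>"] + to_graphones(word) + ["</w>"]
    for t in tokens:
        unigrams[t] += 1
    for a, b in zip(tokens, tokens[1:]):
        bigrams[(a, b)] += 1

# Maximum-likelihood bigram probability (no smoothing, for illustration only).
def p_bigram(a, b):
    return bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0

print(to_graphones(corpus[0]))       # ['dom:d_o_m', 'ach:a_x']
print(p_bigram("<w>", "dom:d_o_m"))  # 2/3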
ISBN: (Print) 9781457705380
In this paper, we propose a new method for computing and applying language model look-ahead in a dynamic network decoder, exploiting the sparseness of backing-off n-gram language models. Only partial (sparse) look-ahead tables are computed, with a size that depends on the number of words that have an n-gram score in the language model for a specific context, rather than a constant, vocabulary-dependent size. Since high-order backing-off language models are inherently sparse, this mechanism reduces the runtime and memory effort of computing the look-ahead tables by orders of magnitude. A modified decoding algorithm is required to apply these sparse LM look-ahead tables efficiently. We show that sparse LM look-ahead is much more efficient than the classical method, and that full n-gram look-ahead becomes favorable over lower-order look-ahead even when many distinct LM contexts appear during decoding.
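A minimal sketch of the sparse look-ahead mechanism, assuming a toy backing-off bigram model. The probabilities, backoff weights, and vocabulary are illustrative assumptions; a real decoder would additionally precompute and cache the lower-order table and store maxima per tree node rather than recomputing them per word:

# Toy backing-off bigram LM: P(w|h) = p_bigram[h][w] if present,
# else backoff[h] * p_unigram[w].
p_unigram = {"a": 0.5, "b": 0.3, "c": 0.2}
p_bigram = {"a": {"b": 0.6}, "b": {"c": 0.7}}
backoff = {"a": 0.4, "b": 0.3}

def sparse_lookahead_table(h):
    # The table's size depends on the number of explicit n-grams for
    # context h, not on the vocabulary size.
    return dict(p_bigram.get(h, {})), backoff.get(h, 1.0)

def lookahead_score(h, subtree_words):
    # Best LM probability over all words reachable below a tree node:
    # explicit entries come from the sparse table, everything else
    # falls through to the scaled unigram score.
    sparse, bow = sparse_lookahead_table(h)
    best = 0.0
    for w in subtree_words:
        p = sparse[w] if w in sparse else bow * p_unigram[w]
        best = max(best, p)
    return best

print(lookahead_score("a", {"b", "c"}))  # max(0.6, 0.4*0.2) = 0.6
print(lookahead_score("b", {"a", "b"}))  # max(0.3*0.5, 0.3*0.3) = 0.15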
ISBN: (Print) 9781457705380
Recognizing Broadcast Conversational (BC) speech data is a difficult task, which can be regarded as one of the major challenges beyond the recognition of Broadcast News (BN). This paper presents the automatic speech recognition systems developed by RWTH for the English, French, and German languages, which attained the best word error rates for English and German, and competitive results for the French task, in the 2010 Quaero evaluation on BC and BN data. At the same time, the RWTH German system used the least amount of training data among all participants. Large reductions in word error rate were obtained by incorporating the new Bottleneck Multilayer Perceptron (MLP) features for all three languages. Additional improvements were obtained for the German system by applying a new language modeling technique that decomposes words into sublexical components.
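A minimal sketch of where Bottleneck MLP features come from: the activations of a deliberately narrow hidden layer of a phone-classification MLP are read out as acoustic features instead of the final posteriors. The layer sizes, the tanh nonlinearity, and the random (untrained) weights below are illustrative assumptions only:

import numpy as np

rng = np.random.default_rng(0)
# input, hidden, bottleneck, hidden, phone targets (sizes are assumptions)
layer_dims = [39, 1000, 42, 1000, 45]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_dims[:-1], layer_dims[1:])]

def bottleneck_features(x, bottleneck_layer=2):
    # Forward pass; stop at the narrow layer and return its activations.
    h = x
    for i, w in enumerate(weights, start=1):
        h = np.tanh(h @ w)
        if i == bottleneck_layer:
            return h
    return h

frame = rng.standard_normal(39)  # e.g. one MFCC frame with derivatives
feat = bottleneck_features(frame)
print(feat.shape)                # (42,) -> used alongside standard features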
Log-linear models are a promising approach for speech recognition. Typically, log-linear models are trained according to a strictly convex criterion, so optimization algorithms are guaranteed to converge to the unique global optimum of the objective function from any initialization. For large-scale applications, however, such considerations in the limit of infinite iterations are not sufficient. We show that log-linear training can be a highly ill-conditioned optimization problem, resulting in extremely slow convergence. However, the optimization problem can be preconditioned by feature transformations. Making use of our convergence analysis, we improve our log-linear speech recognition system and achieve a strong reduction of its training time. In addition, we validate our analysis on a continuous handwriting recognition task.
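A minimal sketch of the conditioning effect, assuming a toy logistic-regression (log-linear) problem with two nearly collinear features: plain gradient descent stalls on the raw features but converges quickly once the features are whitened to unit covariance. The data, learning rate, and stopping criterion are illustrative assumptions, not the paper's setup:

import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.standard_normal(n)
X = np.column_stack([x1, x1 + 0.01 * rng.standard_normal(n)])  # nearly collinear
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X[:, 0] - X[:, 1])))).astype(float)

def train(X, lr, iters=20000, tol=1e-5):
    # Gradient descent on the strictly convex negative log-likelihood;
    # returns the number of iterations until the gradient norm drops
    # below tol (or the cap, if convergence is too slow).
    w = np.zeros(X.shape[1])
    for it in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        g = X.T @ (p - y) / len(y)
        if np.linalg.norm(g) < tol:
            return it
        w -= lr * g
    return iters

# Preconditioning feature transformation: whiten to unit covariance.
cov = np.cov(X, rowvar=False)
L = np.linalg.cholesky(np.linalg.inv(cov))
Xw = (X - X.mean(axis=0)) @ L

print("raw features:     ", train(X, lr=0.5), "iterations")   # typically hits the cap
print("whitened features:", train(Xw, lr=0.5), "iterations")  # converges in a few dozen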
ISBN: (Print) 9781457705380
The use of statically compiled search networks for ASR systems using huge vocabularies and complex language models often becomes challenging in terms of memory requirements. Dynamic network decoders introduce additional computations in favor of significantly lower memory consumption. In this paper, we investigate the properties of two well-known search strategies for dynamic network decoding, namely history-conditioned tree search and WFST-based search using dynamic transducer composition. We analyze the impact of the differences in search graph representation, search space structure, and language model look-ahead techniques. Experiments on an LVCSR task illustrate the influence of the compared properties.
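A minimal sketch of the core idea behind WFST-based dynamic network decoding: composing, say, a lexicon transducer with a grammar on the fly, so that composed states are expanded only when the search actually reaches them instead of compiling the full static network up front. The two toy epsilon-free transducers and their dict encoding are illustrative assumptions:

# Each FST: {state: [(input_label, output_label, next_state)]}; epsilon-free.
lexicon = {0: [("a", "A", 1), ("b", "B", 1)], 1: [("c", "C", 0)]}
grammar = {0: [("A", "A", 1)], 1: [("B", "B", 0), ("C", "C", 0)]}

_cache = {}

def expand(state):
    # Lazily compute the outgoing arcs of a composed state (s1, s2);
    # memoize so each composed state is built at most once.
    if state in _cache:
        return _cache[state]
    s1, s2 = state
    arcs = []
    for ilab, mid, ns1 in lexicon.get(s1, []):
        for ilab2, olab, ns2 in grammar.get(s2, []):
            if mid == ilab2:  # match the lexicon output to the grammar input
                arcs.append((ilab, olab, (ns1, ns2)))
    _cache[state] = arcs
    return arcs

# Only states actually visited by the search are ever built:
start = (0, 0)
print(expand(start))    # [('a', 'A', (1, 1))]
print(expand((1, 1)))   # [('c', 'C', (0, 0))]
print(len(_cache), "composed states expanded so far")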