German is a highly inflected language with a large number of words derived from the same root. It makes use of a high degree of word compounding, leading to high out-of-vocabulary (OOV) rates and language model (LM) p...
German is a highly inflectional language, where a large number of words can be generated from the same root. It makes liberal use of compounding, leading to high out-of-vocabulary (OOV) rates and poor language model...
In this paper, the statistical machine translation (SMT) systems developed by RWTH Aachen University for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2011 are presented. W...
In this paper we apply lightly-supervised training to a hierarchical phrase-based statistical machine translation system. We employ bitexts that have been built by automatically translating large amounts of monolingua...
In this paper we study several advanced techniques and models for Arabic-to-English statistical machine translation. We examine how the challenges imposed by this particular language pair and translation direction can...
In this paper, we dissect the influence of several target-side dependency-based extensions to hierarchical machine translation, including a dependency language model (LM). We pursue a non-restrictive approach that doe...
In this work, we present novel warping algorithms for full 2D pixel-grid deformations for face recognition. Due to high variation in face appearance, face recognition is considered a very difficult task, especially if...
Polish is a synthetic language with a high morpheme-per-word ratio. It makes use of a high degree of inflection, leading to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. This poses a challenge for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Here, the use of morpheme- and syllable-based units is investigated for building sub-lexical LMs. A different type of sub-lexical unit is proposed, based on combining morphemic or syllabic units with their corresponding pronunciations. Thereby, a set of grapheme-phoneme pairs called graphones is used for building LMs. A relative reduction of 3.5% in Word Error Rate (WER) is obtained with respect to a traditional system based on full words.
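The idea of a graphone unit can be illustrated with a minimal sketch: each sub-lexical token is a joint grapheme-phoneme pair, and an LM is then estimated over these pairs instead of full words. The word fragments, pronunciations, and corpus below are invented toy data for illustration, not the paper's actual Polish lexicon or segmentation.

```python
# Minimal sketch of "graphone" (grapheme-phoneme pair) sub-lexical units.
# Lexicon entries and the toy corpus are hypothetical examples.
from collections import Counter

# Each word is segmented into units; each unit is a (grapheme, phoneme) pair.
lexicon = {
    "domach": [("dom", "d o m"), ("ach", "a x")],
    "domek":  [("dom", "d o m"), ("ek", "e k")],
}

def to_graphones(word):
    """Map a word to its sequence of graphone tokens, e.g. 'dom/dom'."""
    return ["{}/{}".format(g, p.replace(" ", "")) for g, p in lexicon[word]]

# Estimate a unigram LM over graphone units from a toy "corpus".
corpus = ["domach", "domek", "domek"]
counts = Counter(tok for word in corpus for tok in to_graphones(word))
total = sum(counts.values())

def unigram_prob(tok):
    return counts[tok] / total

print(unigram_prob("dom/dom"))  # → 0.5: the shared stem unit dominates
```

Because inflected forms share stem units, the sub-lexical vocabulary stays small and previously unseen word forms can still be covered by known units, which is what drives the OOV reduction.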
Log-linear models are a promising approach for speech recognition. Typically, log-linear models are trained according to a strictly convex criterion. Optimization algorithms are guaranteed to converge to the unique global optimum of the objective function from any initialization. For large-scale applications, considerations in the limit of infinite iterations are not sufficient. We show that log-linear training can be a highly ill-conditioned optimization problem, resulting in extremely slow convergence. Conversely, the optimization problem can be preconditioned by feature transformations. Making use of our convergence analysis, we improve our log-linear speech recognition system and achieve a strong reduction of its training time. In addition, we validate our analysis on a continuous handwriting recognition task.
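The ill-conditioning argument can be made concrete with a small numeric sketch: when features live on very different scales, the Hessian of a log-linear objective (proportional to a weighted feature Gram matrix) has a huge condition number, and gradient-based training crawls; rescaling the features acts as a preconditioner. The data, scales, and the use of the unweighted Gram matrix as a Hessian proxy are assumptions for illustration, not the paper's ASR setup.

```python
# Sketch: feature scaling as a preconditioner for log-linear training.
# Toy 2-D data; the second feature is on a vastly larger scale.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[:, 1] *= 1000.0  # mismatched feature scale -> ill-conditioned problem

def condition_number(feats):
    # Hessian of a log-linear objective is proportional to a weighted
    # Gram matrix; use the unweighted Gram matrix as a simple proxy.
    gram = feats.T @ feats / len(feats)
    return np.linalg.cond(gram)

before = condition_number(X)

# Precondition via a feature transformation: zero mean, unit variance.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
after = condition_number(Xs)

print(before > 1e4, after < 10.0)  # conditioning improves by orders of magnitude
```

Since first-order convergence rates degrade with the condition number of the Hessian, shrinking it from roughly 10^6 to near 1 translates directly into far fewer training iterations, matching the training-time reduction the abstract reports.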