Search Results - Inner Mongolia University Library

Workshop on Statistical Machine Translation

Authors: Joern Wuebker, Mei-Yuh Hwang, Chris Quirk (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany; Microsoft Corporation, Redmond, WA, USA)

ISBN: 9781622765928 (print)

Training the phrase table by force-aligning (FA) the training data with the reference translation has been shown to improve phrasal translation quality while significantly reducing the phrase table size on medium-sized tasks. We apply this procedure to several large-scale tasks, with the primary goal of reducing model sizes without sacrificing translation quality. To deal with the noise in the automatically crawled parallel training data, we introduce on-demand word deletions, insertions, and backoffs to achieve over 99% successful alignment rate. We also add heuristics to avoid any increase in OOV rates. We are able to reduce already heavily pruned baseline phrase tables by more than 50% with little to no degradation in quality and occasionally a slight improvement, without any increase in OOVs. We further introduce two global scaling factors for re-estimation of the phrase table via posterior phrase alignment probabilities and a modified absolute discounting method that can be applied to fractional counts.

Keywords: reduced mass, Model trains, Heuristics, Tables
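The abstract does not spell out how its modified absolute discounting handles fractional counts. As an illustration only, the Python sketch below applies plain absolute discounting to fractional counts (for example, expected counts from posterior phrase alignment probabilities) and interpolates with a lower-order backoff distribution. Each count contributes at most min(c, D) to the freed mass, so a count smaller than the discount cannot give away more than it has. This is not the paper's modification; the function name `discounted_probs`, the discount value, the backoff distribution, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def discounted_probs(counts, backoff, discount=0.5):
    """Absolute discounting for p(target | source) from possibly fractional counts.

    Hypothetical sketch, not the method from the paper.
    counts:   dict (source, target) -> fractional count
    backoff:  dict target -> lower-order probability; must cover every seen
              target and sum to 1
    discount: hypothetical constant D subtracted from every seen count
    """
    totals = defaultdict(float)
    for (src, _), c in counts.items():
        totals[src] += c

    probs = {}
    for src, total in totals.items():
        seen = {tgt: c for (s, tgt), c in counts.items() if s == src}
        # Clamp each contribution at min(c, D): fractional counts below D
        # give up only their own mass.
        freed = sum(min(c, discount) for c in seen.values())
        for tgt, p_lower in backoff.items():
            discounted = max(seen.get(tgt, 0.0) - discount, 0.0)
            # Redistribute the freed mass according to the backoff distribution.
            probs[(src, tgt)] = (discounted + freed * p_lower) / total
    return probs


# Hypothetical toy data: expected (fractional) counts for one source phrase.
counts = {("haus", "house"): 2.5, ("haus", "home"): 0.3}
backoff = {"house": 0.5, "home": 0.3, "building": 0.2}
probs = discounted_probs(counts, backoff)
print(probs)
```

Because max(c - D, 0) + min(c, D) = c for every count, the discounted mass plus the freed mass equals the total count, so each conditional distribution still sums to one; the unseen target "building" receives a share of the freed mass.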


Performance analysis of Neural Networks in combination with n-gram language models

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Ilya Oparin, Martin Sundermeyer, Hermann Ney, Jean-Luc Gauvain (LIMSI CNRS, Spoken Language Processing Group, France; Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Germany)

Neural Network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are better modeled by NNLMs as compared to n-gram LMs, in what cases improvements are most substantial and why this is the case. Such an analysis is important to take further benefit from NNLMs used in combination with conventional n-gram models. The analysis is carried out for different types of neural network (feed-forward and recurrent) LMs. The results showing for which types of events NNLMs provide better probability estimates are validated on two setups that differ in size and in the degree of data homogeneity.

Keywords: Artificial neural networks, History, Analytical models, Training data, Vocabulary, Interpolation


The RWTH Aachen Machine Translation System for WMT 2012

Workshop on Statistical Machine Translation

Authors: Matthias Huck, Stephan Peitz, Markus Freitag, Malte Nuhn, Hermann Ney (Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, D-52056 Aachen, Germany)

ISBN: 9781622765928 (print)

This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the translation task of the NAACL 2012 Seventh Workshop on Statistical Machine Translation (WMT 2012). We participated in the evaluation campaign for the French-English and German-English language pairs in both translation directions. Both hierarchical and phrase-based SMT systems are applied. A number of different techniques are evaluated, including an insertion model, different lexical smoothing methods, a discriminative reordering extension for the hierarchical system, reverse translation, and system combination. By application of these methods we achieve considerable improvements over the respective baseline systems.

Keywords: machine translation system, machine translation, Surface mount technology, Hierarchical, application methods, Translations, Translation, Translation Process, smoothing methods, Hierarchical systems


Skin-color based videos categorization

International Journal of Computer Science Issues, 2012, Vol. 9, Issue 1, No. 3, pp. 473-477

Authors: Khan, Rehanullah; Maqsood, Asad; Khan, Zeeshan; Ishaq, Muhammad; Arif, Arsalan (Sarhad University of Science and Information Technology, Peshawar, Pakistan; RWTH Aachen, Human Language Technology and Pattern Recognition, Peshawar, Pakistan; UET Mardan, Peshawar, Pakistan)

On dedicated websites, people can upload videos and share them with the rest of the world. Currently, these videos are categorized manually with the help of the user community. In this paper, we propose a combination of color spaces with the Bayesian network approach for robust detection of skin color, followed by automated video categorization. Experimental results show that our method can achieve satisfactory performance for categorizing videos based on skin color. © 2012 International Journal of Computer Science Issues.

Keywords: Bayesian networks


Mobile music modeling, analysis and recognition

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Pavel Golik, Boulos Harb, Ananya Misra, Michael Riley, Alex Rudnick, Eugene Weinstein (Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany; Google Inc., New York, NY, USA; School of Informatics and Computing, Indiana University, Bloomington, IN, USA)

We present an analysis of music modeling and recognition techniques in the context of mobile music matching, substantially improving on the techniques presented in [1]. We accomplish this by adapting the features specifically to this task, and by introducing new modeling techniques that enable using a corpus of noisy and channel-distorted data to improve mobile music recognition quality. We report the results of an extensive empirical investigation of the system's robustness under realistic channel effects and distortions. We show an improvement of recognition accuracy by explicit duration modeling of music phonemes and by integrating the expected noise environment into the training process. Finally, we propose the use of frame-to-phoneme alignment for high-level structure analysis of polyphonic music.

Keywords: Training, Accuracy, Hidden Markov models, Music, Speech recognition, USA Councils


A convergence analysis of log-linear training and its application to speech recognition

2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011

Authors: Wiesler, S.; Schlüter, R.; Ney, H. (Human Language Technology and Pattern Recognition, RWTH Aachen University of Technology, 52056 Aachen, Germany)

ISBN: 9781467303675 (print)

Log-linear models are a promising approach for speech recognition. Typically, log-linear models are trained according to a strictly convex criterion. Optimization algorithms are guaranteed to converge to the unique global optimum of the objective function from any initialization. For large-scale applications, however, considerations in the limit of infinite iterations are not sufficient. We show that log-linear training can be a highly ill-conditioned optimization problem, resulting in extremely slow convergence. On the other hand, the optimization problem can be preconditioned by feature transformations. Making use of our convergence analysis, we improve our log-linear speech recognition system and achieve a strong reduction of its training time. In addition, we validate our analysis on a continuous handwriting recognition task. © 2011 IEEE.

Keywords: Speech recognition
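The ill-conditioning argument can be illustrated on binary logistic regression, the simplest log-linear model. At w = 0 its Hessian is 0.25 times the second-moment matrix (1/N) Σ x xᵀ, so that matrix's condition number (largest over smallest eigenvalue) indicates how slowly plain gradient descent converges near the starting point. The Python sketch below compares this condition number before and after standardizing each feature, which is one simple feature transformation acting as a preconditioner; the transformations used in the paper may differ. The toy data, the helper names, and the omission of a bias term are assumptions made for the illustration.

```python
import math


def second_moment(rows):
    """Entries of (1/N) * sum of x x^T for 2-dimensional feature vectors."""
    n = len(rows)
    a = sum(x1 * x1 for x1, _ in rows) / n
    b = sum(x1 * x2 for x1, x2 in rows) / n
    d = sum(x2 * x2 for _, x2 in rows) / n
    return a, b, d  # symmetric matrix [[a, b], [b, d]]


def condition_number(a, b, d):
    """Ratio of largest to smallest eigenvalue of [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    return (mean + radius) / (mean - radius)


def standardize(rows):
    """Shift each feature to zero mean and scale it to unit variance."""
    n = len(rows)
    stats = []
    for col in zip(*rows):
        mu = sum(col) / n
        sigma = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        stats.append((mu, sigma))
    return [tuple((v - mu) / sigma for v, (mu, sigma) in zip(row, stats)) for row in rows]


# Hypothetical toy data: the second feature lives on a scale ~1000x larger.
raw = [(0.5, 1200.0), (-1.0, 800.0), (1.5, -900.0), (-1.0, -1100.0)]

kappa_raw = condition_number(*second_moment(raw))
kappa_std = condition_number(*second_moment(standardize(raw)))
print(f"condition number raw: {kappa_raw:.3g}, standardized: {kappa_std:.3g}")
```

On this toy data the raw features give a condition number on the order of 10^5 to 10^6, while the standardized ones give roughly 1.2. Since gradient descent on a quadratic needs on the order of κ iterations per constant-factor error reduction, a rescaling like this can turn an extremely slow optimization into a fast one.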


Lexicon Models for Hierarchical Phrase-Based Machine Translation

8th International Workshop on Spoken Language Translation, IWSLT 2011

Authors: Huck, Matthias; Mansour, Saab; Wiesler, Simon; Ney, Hermann (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

In this paper, we investigate lexicon models for hierarchical phrase-based statistical machine translation. We study five types of lexicon models: a model which is extracted from word-aligned training data and, given the word alignment matrix, relies on pure relative frequencies [1]; the IBM model 1 lexicon [2]; a regularized version of IBM model 1; a triplet lexicon model variant [3]; and a discriminatively trained word lexicon model [4]. We explore source-to-target models with phrase-level as well as sentence-level scoring and target-to-source models with scoring on phrase level only. For the first two types of lexicon models, we compare several scoring variants. All models are used during search, i.e. they are incorporated directly into the log-linear model combination of the decoder. Phrase table smoothing with triplet lexicon models and with discriminative word lexicons are novel contributions. We also propose a new regularization technique for IBM model 1 by means of the Kullback-Leibler divergence with the empirical unigram distribution as regularization term. Experiments are carried out on the large-scale NIST Chinese→English translation task and on the English→French and Arabic→English IWSLT TED tasks. For Chinese→English and English→French, we obtain the best results by using the discriminative word lexicon to smooth our phrase tables. © IWSLT 2011. All rights reserved.

Keywords: Computer aided language translation
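For orientation, the IBM model 1 lexicon mentioned in this abstract is usually estimated with expectation maximization. The Python sketch below shows plain, unregularized IBM model 1 EM on a three-sentence toy corpus; it does not implement the paper's Kullback-Leibler regularization, and it omits the NULL source word that real implementations usually add. The function name `ibm1`, the iteration count, and the corpus are hypothetical.

```python
from collections import defaultdict


def ibm1(corpus, iterations=10):
    """Plain IBM model 1 trained with EM (no regularization, no NULL word).

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns t with t[(f, e)] = p(target word e | source word f).
    """
    tgt_vocab = {e for _, tgt in corpus for e in tgt}
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected pair counts c(f, e)
        total = defaultdict(float)  # expected source counts c(f)
        for src, tgt in corpus:
            for e in tgt:
                # E-step: posterior probability that each source word generated e.
                norm = sum(t[(f, e)] for f in src)
                for f in src:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[f] += delta
        # M-step: relative frequencies of the expected counts.
        t = defaultdict(float, {(f, e): c / total[f] for (f, e), c in count.items()})
    return t


# Hypothetical toy corpus.
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm1(corpus)
print(round(t[("Haus", "house")], 3), round(t[("das", "house")], 3))
```

Co-occurrence statistics alone push p(house | Haus) above p(house | das), because "das" is also needed to explain "the" in two sentence pairs.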


The RWTH 2010 Quaero ASR evaluation system for English, French, and German

36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011

Authors: Sundermeyer, M.; Nussbaum-Thom, M.; Wiesler, S.; Plahl, C.; El-Desoky Mousa, A.; Hahn, S.; Nolden, D.; Schlüter, R.; Ney, H. (Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Germany)

ISBN: 9781457705397 (print)

Keywords: Speech recognition


Modeling Punctuation Prediction as Machine Translation

8th International Workshop on Spoken Language Translation, IWSLT 2011

Authors: Peitz, Stephan; Freitag, Markus; Mauser, Arne; Ney, Hermann (Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany)

Punctuation prediction is an important task in spoken language translation, since the output of speech recognition systems does not typically contain punctuation marks. In this paper we analyze different methods for punctuation prediction and show improvements in the quality of the final translation output. In our experiments we compare the different approaches and show improvements of up to 0.8 BLEU points on the IWSLT 2011 English-French Speech Translation of Talks task by using a translation system that translates unpunctuated into punctuated text, instead of a language-model-based punctuation prediction method. Furthermore, a system combination of the hypotheses from all our approaches yields an additional improvement of 0.4 BLEU points. © IWSLT 2011. All rights reserved.

Keywords: Forecasting


Combining Translation and Language Model Scoring for Domain-Specific Data Filtering

8th International Workshop on Spoken Language Translation, IWSLT 2011

Authors: Mansour, Saab; Wuebker, Joern; Ney, Hermann (Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany)

The increasing popularity of statistical machine translation (SMT) systems is introducing new domains of translation that need to be tackled. As many resources are already available, domain adaptation methods can be applied to utilize these resources in the most beneficial way for the new domain. We explore adaptation via filtering, using cross-entropy scores to discard irrelevant sentences. We focus on filtering for two important components of an SMT system, namely the language model (LM) and the translation model (TM). Previous work has already applied LM cross-entropy based scoring for filtering. We argue that LM cross-entropy might be appropriate for LM filtering, but less so for TM filtering. We develop a novel filtering approach based on combined TM and LM cross-entropy scores. We experiment with two large-scale translation tasks, the Arabic-to-English and English-to-French IWSLT 2011 TED Talks MT tasks. For LM filtering, we achieve strong perplexity improvements which carry over to the translation quality with improvements of up to +0.4% BLEU. For TM filtering, the combined method achieves small but consistent improvements over the standalone methods. As a side effect of adaptation via filtering, the vocabulary size and phrase table size of the full SMT system are reduced by a factor of at least 2, while improvements of up to +0.6% BLEU are observed. © IWSLT 2011. All rights reserved.

Keywords: Computer aided language translation
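The LM cross-entropy scoring that this abstract builds on can be sketched with the cross-entropy difference criterion of Moore and Lewis (2010): a general-domain sentence is ranked by H_in(s) - H_out(s), its per-word cross-entropy under an in-domain LM minus that under a general-domain LM, and the lowest-scoring sentences are kept. The Python sketch below uses add-one smoothed unigram LMs for brevity and trains the general-domain LM on the whole general pool. It does not implement the paper's combined TM and LM score; the helper names, the smoothing, and the toy sentences are assumptions.

```python
import math
from collections import Counter


def unigram_lm(sentences, vocab):
    """Add-one smoothed unigram LM over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def cross_entropy(lm, sentence):
    """Per-word cross-entropy (bits) of a sentence under a unigram LM."""
    words = sentence.split()
    return -sum(math.log2(lm[w]) for w in words) / len(words)


def filter_by_ce_difference(in_domain, general, keep):
    """Keep the `keep` general-domain sentences that look most in-domain."""
    vocab = {w for s in in_domain + general for w in s.split()}
    lm_in = unigram_lm(in_domain, vocab)
    lm_out = unigram_lm(general, vocab)
    # Low H_in - H_out: likely under the in-domain LM, unlikely under the general LM.
    ranked = sorted(general, key=lambda s: cross_entropy(lm_in, s) - cross_entropy(lm_out, s))
    return ranked[:keep]


# Hypothetical toy data: a talk-style in-domain set and a mixed general pool.
in_domain = ["we talk about ideas", "ideas worth spreading", "thank you for the talk"]
general = [
    "the stock market fell",
    "a talk about ideas",
    "parliament passed the budget",
    "thank you so much",
]
selected = filter_by_ce_difference(in_domain, general, keep=2)
print(selected)
```

Subtracting the general-domain cross-entropy keeps the score from simply favoring short or generally frequent sentences: a sentence ranks high only if the in-domain LM likes it more than the general LM does.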
