Document-level context for neural machine translation (NMT) is crucial for improving translation consistency and cohesion, the translation of ambiguous inputs, and several other linguistic phenomena. Many work...
The encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introductio...
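As background for the encoder-decoder idea mentioned above, here is a minimal sketch (invented for illustration, not taken from the paper) of a sequence-to-sequence model in PyTorch: the encoder compresses the source sentence into a hidden state from which the decoder generates the target sequence. All layer and vocabulary sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder summarizes the source sequence
    into a hidden state; the decoder predicts target tokens conditioned on
    that state (no attention, for brevity)."""
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))    # summarize source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                      # logits per target position

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(1000, (2, 7)), torch.randint(1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```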
Prominently used in support vector machines and logistic regression, kernel functions (kernels) can implicitly map data points into high-dimensional spaces, making it easier to learn complex decision boundaries. In ...
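To make the kernel idea concrete, the following sketch (a standard illustration, not from the paper) shows an RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2) separating a toy dataset that no linear boundary can separate in the input space; the dataset and the gamma value are invented.

```python
import numpy as np
from sklearn.svm import SVC

# Toy dataset: points on an inner disc vs. an outer ring
# (not linearly separable in the original 2D space).
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0.0, 1.0, 100),   # class 0: inner disc
                        rng.uniform(2.0, 3.0, 100)])  # class 1: outer ring
X = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
y = np.concatenate([np.zeros(100), np.ones(100)])

# The RBF kernel implicitly maps the points into a high-dimensional space
# where a linear separator exists, without computing the mapping explicitly.
clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))  # close to 1.0
```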
This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by aug...
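The paper's own extraction method is cut off above; as a point of reference, a common baseline for extracting word alignments from multi-head attention is to average the cross-attention weights over heads and pick, for each target position, the source position with the highest weight. The tensor shapes below are assumptions.

```python
import numpy as np

def extract_alignment(attn, heads_axis=0):
    """Baseline word alignment from attention: attn has shape
    (heads, target_len, source_len); average over heads, then take
    the argmax source position for every target position."""
    avg = attn.mean(axis=heads_axis)      # (target_len, source_len)
    return avg.argmax(axis=-1)            # one source index per target word

# Toy cross-attention weights: 2 heads, 3 target and 4 source positions.
attn = np.random.default_rng(1).random((2, 3, 4))
attn /= attn.sum(axis=-1, keepdims=True)  # normalize each row to sum to 1
print(extract_alignment(attn))            # e.g. array([2, 0, 3])
```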
Egyptian Arabic (EA) is a colloquial variety of Arabic. It is a low-resource, morphologically rich language that causes problems in Large Vocabulary Continuous Speech Recognition (LVCSR). Building LMs on morpheme level...
Neural network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are better modeled by NNLMs than by n-gram LMs, in which cases the improvements are most substantial, and why this is the case. Such an analysis is important for deriving further benefit from NNLMs used in combination with conventional n-gram models. The analysis is carried out for different types of neural network LMs (feed-forward and recurrent). The results showing for which types of events NNLMs provide better probability estimates are validated on two setups that differ in size and in degree of data homogeneity.
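Since the stated motivation is using NNLMs in combination with n-gram LMs, a standard combination scheme (a general technique, not necessarily the exact setup of this paper) is linear interpolation of the two probability estimates; the interpolation weight below is an assumed hyperparameter, normally tuned on held-out data.

```python
import math

def interpolated_logprob(p_nnlm, p_ngram, lam=0.5):
    """Linear interpolation of two LM probabilities for the same word:
    p(w | h) = lam * p_nnlm(w | h) + (1 - lam) * p_ngram(w | h).
    lam is typically chosen to minimize perplexity on held-out data."""
    return math.log(lam * p_nnlm + (1.0 - lam) * p_ngram)

# Example: the NNLM assigns 0.02 and the n-gram model 0.005 to the next word.
print(interpolated_logprob(0.02, 0.005, lam=0.7))
```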
This paper proposes improved context-dependent modeling for Arabic handwriting recognition. Since the number of parameters in context-dependent models is huge, CART trees are used for state tying. This work is based on a new set of questions for the CART tree construction, derived from a "lossy mapping" categorization of the Arabic shapes. The system used is a combination of Hidden Markov Models and Recurrent Neural Networks following the hybrid approach. A comparison is made between a neural network trained on the baseline labels and another trained on the CART tree labels. The experimental results show that using the CART labels for neural network training is beneficial. The lossy-mapping-based CART tree performed better than the baseline system. An absolute improvement of 2.9% in Word Error Rate is achieved on the test set of the Open Hart database.
ISBN:
(Print) 9781509041183
In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers with a particular focus on the efficient training of recurrent neural network topologies on multiple GPUs. The source of the software package is public and freely available for academic research purposes, and it can be used as a framework or as a standalone tool supporting flexible configuration. The software allows training of state-of-the-art deep bidirectional long short-term memory (LSTM) models on both one-dimensional data like speech and two-dimensional data like handwritten text, and it was used to develop successful submission systems in several evaluation campaigns.
ISBN:
(Print) 9781479903573
Log-linear models find a wide range of applications in pattern recognition. The training of log-linear models is a convex optimization problem. In this work, we compare the performance of stochastic and batch optimization algorithms. Stochastic algorithms are fast on large data sets but cannot be parallelized well. In our experiments on a broadcast conversations recognition task, stochastic methods yield competitive results after only a short training period, but when enough computational resources are spent on parallelization, batch algorithms are competitive with stochastic algorithms. We obtained slight improvements by using a stochastic second-order algorithm. Our best log-linear model outperforms the maximum likelihood trained Gaussian mixture model baseline while being ten times smaller.
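To make the stochastic-versus-batch comparison concrete, here is a minimal sketch (invented for illustration, not the paper's system) of stochastic gradient training for a log-linear model, i.e., multinomial logistic regression, whose negative log-likelihood is convex:

```python
import numpy as np

def sgd_loglinear(X, y, num_classes, lr=0.1, epochs=10, seed=0):
    """Stochastic gradient descent on the convex negative log-likelihood
    of a log-linear model p(c | x) = softmax(W x)_c, one example per update."""
    rng = np.random.default_rng(seed)
    W = np.zeros((num_classes, X.shape[1]))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            scores = W @ X[i]
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()
            probs[y[i]] -= 1.0                 # gradient of -log p(y_i | x_i)
            W -= lr * np.outer(probs, X[i])
    return W

# Toy two-class problem in two dimensions.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([0, 0, 1, 1])
W = sgd_loglinear(X, y, num_classes=2)
print((W @ X.T).argmax(axis=0))  # predicted classes: [0 0 1 1]
```

A batch method would instead average this gradient over the whole training set before each update, which is what makes it easy to parallelize across machines.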
ISBN:
(Print) 9781457705380
Conditional Random Fields (CRFs) have proven to perform well on natural language processing tasks like name transliteration, concept tagging, or grapheme-to-phoneme (g2p) conversion. The aim of this paper is to propose extensions to state-of-the-art CRF systems for these tasks. Since the number of features can grow rapidly, a method for feature selection is very helpful for boosting performance. A combination of L1 and L2 regularization (elastic net) has been adopted and implemented within the Rprop optimization algorithm. Usually, dependencies on the target side are limited to bigram dependencies, since the computational complexity grows exponentially with the history length. We present a modified CRF decoding in which a conventional language model on the target side is integrated into the CRF search process, so that larger contexts can be taken into account. Besides these two main parts, the previously published margin extension to the CRF training criterion has been adopted.
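The elastic net mentioned above adds a weighted combination of L1 and L2 penalties to the CRF training criterion; the L1 part drives many feature weights to exactly zero, which is what makes it useful for feature selection. A minimal sketch of the regularized objective, with assumed notation and penalty weights:

```python
import numpy as np

def elastic_net_objective(neg_log_lik, w, l1=0.01, l2=0.001):
    """Training criterion with elastic-net regularization:
    F(w) = NLL(w) + l1 * ||w||_1 + (l2 / 2) * ||w||_2^2.
    The L1 term zeroes out many feature weights (feature selection);
    the L2 term keeps the remaining weights small."""
    return neg_log_lik + l1 * np.abs(w).sum() + 0.5 * l2 * np.dot(w, w)

# Example with a toy weight vector and an assumed data-term value.
w = np.array([0.5, 0.0, -1.2])
print(elastic_net_objective(neg_log_lik=42.0, w=w))
```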