检索结果-内蒙古大学图书馆

Improving language Model Integration for Neural Machine Translation

学校读者我要写书评

暂无评论

arXiv 2023年

作者： Herold, Christian Gao, Yingbo Zeineldeen, Mohammad Ney, Hermann Human Language Technology and Pattern Recognition Group Computer Science Department RWTH Aachen University AachenD-52056 Germany

The integration of language models for neural machine translation has been extensively studied in the past. It has been shown that an external language model, trained on additional target-side monolingual data, can help improve translation quality. However, there has always been the assumption that the translation model also learns an implicit target-side language model during training, which interferes with the external language model at decoding time. Recently, some works on automatic speech recognition have demonstrated that, if the implicit language model is neutralized in decoding, further improvements can be gained when integrating an external language model. In this work, we transfer this concept to the task of machine translation and compare with the most prominent way of including additional monolingual data - namely back-translation. We find that accounting for the implicit language model significantly boosts the performance of language model fusion, although this approach is still outperformed by back-translation. Copyright © 2023, The Authors. All rights reserved.

关键词： Neural machine translation

On Search Strategies for Document-Level Neural Machine Translation

学校读者我要写书评

暂无评论

arXiv 2023年

作者： Herold, Christian Ney, Hermann Human Language Technology and Pattern Recognition Group Computer Science Department RWTH Aachen University AachenD-52056 Germany

Compared to sentence-level systems, document-level neural machine translation (NMT) models produce a more consistent output across a document and are able to better resolve ambiguities within the input. There are many works on document-level NMT, mostly focusing on modifying the model architecture or training strategy to better accommodate the additional context-input. On the other hand, in most works, the question on how to perform search with the trained model is scarcely discussed, sometimes not mentioned at all. In this work, we aim to answer the question how to best utilize a context-aware translation model in decoding. We start with the most popular document-level NMT approach and compare different decoding schemes, some from the literature and others proposed by us. In the comparison, we are using both, standard automatic metrics, as well as specific linguistic phenomena on three standard document-level translation benchmarks. We find that most commonly used decoding strategies perform similar to each other and that higher quality context information has the potential to further improve the translation. Copyright © 2023, The Authors. All rights reserved.

关键词： Neural machine translation

Document-Level language Models for Machine Translation

学校读者我要写书评

暂无评论

arXiv 2023年

作者： Petrick, Frithjof Herold, Christian Petrushkov, Pavel Khadivi, Shahram Ney, Hermann eBay Inc. Aachen Germany Human Language Technology and Pattern Recognition Group RWTH Aachen University Aachen Germany

Despite the known limitations, most machine translation systems today still operate on the sentence-level. One reason for this is, that most parallel training data is only sentence-level aligned, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining any existing sentence-level translation model with a document-level language model. We improve existing approaches by leveraging recent advancements in model combination. Additionally, we propose novel weighting techniques that make the system combination more flexible and significantly reduce computational overhead. In a comprehensive evaluation on four diverse translation tasks, we show that our extensions improve document-targeted scores substantially and are also computationally more efficient. However, we also find that in most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system. Finally, we explore language model fusion in the light of recent advancements in large language models. Our findings suggest that there might be strong potential in utilizing large language models via model combination. Copyright © 2023, The Authors. All rights reserved.

关键词： Machine translation

Improving Long Context Document-Level Machine Translation

学校读者我要写书评

暂无评论

arXiv 2023年

作者： Herold, Christian Ney, Hermann Human Language Technology and Pattern Recognition Group Computer Science Department RWTH Aachen University AachenD-52056 Germany

Document-level context for neural machine translation (NMT) is crucial to improve the translation consistency and cohesion, the translation of ambiguous inputs, as well as several other linguistic phenomena. Many works have been published on the topic of document-level NMT, but most restrict the system to only local context, typically including just the one or two preceding sentences as additional information. This might be enough to resolve some ambiguous inputs, but it is probably not sufficient to capture some document-level information like the topic or style of a conversation. When increasing the context size beyond just the local context, there are two challenges: (i) the memory usage increases exponentially (ii) the translation performance starts to degrade. We argue that the widely-used attention mechanism is responsible for both issues. Therefore, we propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption. For evaluation, we utilize targeted test sets in combination with novel evaluation techniques to analyze the translations in regards to specific discourse-related phenomena. We find that our approach is a good compromise between sentence-level NMT vs attending to the full context, especially in low resource scenarios. Copyright © 2023, The Authors. All rights reserved.

关键词： Neural machine translation

Robust Knowledge Distillation from RNN-T Models with Noisy Training Labels Using Full-Sum Loss

学校读者我要写书评

暂无评论

Robust Knowledge Distillation from RNN-T Models with Noisy T...

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

作者： Mohammad Zeineldeen Kartik Audhkhasi Murali Karthick Baskar Bhuvana Ramabhadran Computer Science Department Human Language Technology and Pattern Recognition RWTH Aachen University Aachen Germany Google LLC New York

This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft distillation between RNNT architectures having different posterior distributions is challenging. In addition, bad teachers having high word-error-rate (WER) reduce the efficacy of KD. We investigate how to effectively distill knowledge from variable quality ASR teachers, which has not been studied before to the best of our knowledge. We show that a sequence-level KD, full-sum distillation, outperforms other distillation methods for RNN-T models, especially for bad teachers. We also propose a variant of full-sum distillation that distills the sequence discriminative knowledge of the teacher leading to further improvement in WER. We conduct experiments on public datasets namely SpeechStew and LibriSpeech, and on in-house production data.

关键词： Knowledge engineering Training Recurrent neural networks Transducers Production Signal processing Acoustics

Enhancing and Adversarial: Improve ASR with Speaker Labels

学校读者我要写书评

暂无评论

Enhancing and Adversarial: Improve ASR with Speaker Labels

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

作者： Wei Zhou Haotian Wu Jingjing Xu Mohammad Zeineldeen Christoph Lüscher Ralf Schlüter Hermann Ney Computer Science Department Human Language Technology and Pattern Recognition RWTH Aachen University Aachen Germany AppTek GmbH Aachen Germany

ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort. Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training. We also explore their combination for further improvement, achieving the same performance as i-vectors plus adversarial training. Our best speaker-based MTL achieves 7% relative improvement on the Switchboard Hub5’00 set. We also investigate the effect of such speaker-based MTL w.r.t. cleaner dataset and weaker ASR NN.

关键词： Training Artificial neural networks Switches Signal processing Multitasking Acoustics Speech processing

Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

学校读者我要写书评

暂无评论

Lattice-Free Sequence Discriminative Training for Phoneme-Ba...

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

作者： Zijian Yang Wei Zhou Ralf Schlüter Hermann Ney Computer Science Department Human Language Technology and Pattern Recognition RWTH Aachen University Aachen Germany AppTek GmbH Aachen Germany

Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are used for the final posterior output of the phoneme-based neural transducer with a limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypotheses generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to the N-best-list based minimum Bayes risk objectives, lattice-free methods gain 40% - 70% relative training time speedup with a small degradation in performance.

关键词： Training Degradation Transducers Error analysis Signal processing Decoding Speech processing

AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies 30

学校读者我要写书评

暂无评论

AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Vis...

Joint 30th International Conference on Computational Linguistics and 14th International Conference on language Resources and Evaluation, LREC-COLING 2024

作者： Acosta-Triana, José M. Gimeno-Gómez, David Martínez-Hinarejos, Carlos D. ValgrAI - Valencian Graduate School and Research Network of Artificial Intelligence Camino de Vera s/n 3Q Building València46022 Spain Pattern Recognition and Human Language Technologies Research Center Universitat Politècnica de València Camino de Vera s/n València46022 Spain

ISBN: (纸本)9782493814104

More than 7,000 known languages are spoken around the world. However, due to the lack of annotated resources, only a small fraction of them are currently covered by speech technologies. Albeit self-supervised speech representations, recent massive speech corpora collections, as well as the organization of challenges, have alleviated this inequality, most studies are mainly benchmarked on English. This situation is aggravated when tasks involving both acoustic and visual speech modalities are addressed. In order to promote research on low-resource languages for audio-visual speech technologies, we present AnnoTheia, a semi-automatic annotation toolkit that detects when a person speaks on the scene and the corresponding transcription. In addition, to show the complete process of preparing AnnoTheia for a language of interest, we also describe the adaptation of a pre-trained model for active speaker detection to Spanish, using a database not initially conceived for this type of task. The AnnoTheia toolkit, tutorials, and pre-trained models are available at https://***/joactr/AnnoTheia/. © 2024 ELRA language Resource Association: CC BY-NC 4.0.

关键词： Speech recognition

Efficient Utilization of Large Pre-Trained Models for Low Resource ASR

学校读者我要写书评

暂无评论

Efficient Utilization of Large Pre-Trained Models for Low Re...

Acoustics, Speech, and Signal Processing Workshops (ICASSPW), IEEE International Conference on

作者： Peter Vieting Christoph Lüscher Julian Dierkes Ralf Schlüter Hermann Ney Computer Science Department Human Language Technology and Pattern Recognition Group RWTH Aachen University Aachen Germany AppTek GmbH Aachen Germany

Unsupervised representation learning has recently helped automatic speech recognition (ASR) to tackle tasks with limited labeled data. Following this, hardware limitations and applications give rise to the question how to take advantage of large pre-trained models efficiently and reduce their complexity. In this work, we study a challenging low resource conversational telephony speech corpus from the medical domain in Vietnamese and German. We show the benefits of using unsupervised techniques beyond simple fine-tuning of large pre-trained models, discuss how to adapt them to a practical telephony task including bandwidth transfer and investigate different data conditions for pre-training and fine-tuning. We outperform the project baselines by 22% relative using pre-training techniques. Further gains of 29% can be achieved by refinements of architecture and training and 6% by adding 0.8 h of in-domain adaptation data.

关键词：