Search results - Inner Mongolia University Library

A comparative study on end-to-end speech to text translation

arXiv, 2019

Authors: Bahar, Parnia; Bieschke, Tobias; Ney, Hermann. Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, 52074 Aachen, Germany; AppTek GmbH, 52062 Aachen

Recent advances in deep learning show that end-to-end speech-to-text translation models are a promising approach to direct speech translation. In this work, we provide an overview of different end-to-end architectures, as well as the usage of an auxiliary connectionist temporal classification (CTC) loss for better convergence. We also investigate pre-training variants, such as initializing different components of a model using pretrained models, and their impact on the final performance, which gives boosts of up to 4% in BLEU and 5% in TER. Our experiments are performed on 270h IWSLT TED-talks En→De, and 100h LibriSpeech Audio-books En→Fr. We also show improvements over the current end-to-end state-of-the-art systems on both tasks. Copyright © 2019, The Authors. All rights reserved.

Keywords: Deep learning

The RWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering Task

3rd Conference on Machine Translation, WMT 2018 at the Conference on Empirical Methods in Natural Language Processing, EMNLP 2018

Authors: Rossenbach, Nick; Rosendahl, Jan; Kim, Yunsu; Graça, Miguel; Gokrani, Aman; Ney, Hermann. Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, D-52056 Aachen, Germany

ISBN: (print) 9781948087810

This paper describes the submission of RWTH Aachen University for the De→En parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These sentence pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy-removing heuristic. Our best-performing corpus filtering system relies on recurrent neural language models and translation models based on the transformer architecture. A model trained on 10M randomly sampled tokens reaches a performance of 9.2% BLEU on newstest2018. Using our filtering and ranking techniques, we achieve 34.8% BLEU. © 2018 Association for Computational Linguistics

Keywords: Heuristic methods

On the Alignment Problem in Multi-Head Attention-Based Neural Machine Translation

3rd Conference on Machine Translation, WMT 2018 at the Conference on Empirical Methods in Natural Language Processing, EMNLP 2018

Authors: Alkhouli, Tamer; Bretschner, Gabriel; Ney, Hermann. Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, D-52056 Aachen, Germany

ISBN: (print) 9781948087810

This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by adding an alignment head to the multi-head source-to-target attention component. This is used to compute sharper attention weights. We describe how to use the alignment head to achieve competitive performance. To study the effect of adding the alignment head, we simulate a dictionary-guided translation task, where the user wants to guide translation using pre-defined dictionary entries. Using the proposed approach, we achieve up to 3.8% BLEU improvement when using the dictionary, in comparison to 2.4% BLEU in the baseline case. We also propose alignment pruning to speed up decoding in alignment-based neural machine translation (ANMT), which speeds up translation by a factor of 1.8 without loss in translation performance. We carry out experiments on the shared WMT 2016 English→Romanian news task and the BOLT Chinese→English discussion forum task. © 2018 Association for Computational Linguistics.

Keywords: Alignment

LSTM language models for LVCSR in first-pass decoding and lattice-rescoring

arXiv, 2019

Authors: Beck, Eugen; Zhou, Wei; Schlüter, Ralf; Ney, Hermann. Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, 52074 Aachen, Germany; AppTek GmbH, 52062 Aachen, Germany

LSTM-based language models are an important part of modern LVCSR systems as they significantly improve performance over traditional backoff language models. Incorporating them efficiently into decoding has been notoriously difficult. In this paper we present an approach based on a combination of one-pass decoding and lattice rescoring. We perform decoding with the LSTM-LM in the first pass but recombine hypotheses that share the last two words; afterwards we rescore the resulting lattice. We run our systems on GPGPU-equipped machines and are able to produce competitive results on the Hub5'00 and Librispeech evaluation corpora with a runtime better than real-time. In addition, we briefly investigate the possibility of carrying out the full sum over all state sequences belonging to a given word hypothesis during decoding without recombination. Copyright © 2019, The Authors. All rights reserved.

Keywords: Decoding
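
The abstract of this record mentions recombining hypotheses that share the last two words. A minimal Python sketch of that idea follows; it is not the authors' decoder. The `Hypothesis` type, the `recombine` function, and the example scores are hypothetical, and a real first-pass decoder would also take time frames and search states into account and keep the recombined paths in the lattice.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (word sequence, log score; higher is better).
Hypothesis = Tuple[Tuple[str, ...], float]


def recombine(hyps: List[Hypothesis], context: int = 2) -> List[Hypothesis]:
    """Keep only the best-scoring hypothesis for each trailing word context."""
    best: Dict[Tuple[str, ...], Hypothesis] = {}
    for words, score in hyps:
        key = words[-context:]  # the last `context` words identify the group
        if key not in best or score > best[key][1]:
            best[key] = (words, score)
    # Return survivors ordered from best to worst score.
    return sorted(best.values(), key=lambda h: h[1], reverse=True)


# Made-up example scores, for illustration only.
hyps = [
    (("i", "saw", "the", "cat"), -4.1),
    (("we", "saw", "the", "cat"), -3.7),
    (("i", "saw", "a", "cat"), -5.0),
]
survivors = recombine(hyps)
for words, score in survivors:
    print(" ".join(words), score)
```

The first two hypotheses end in the same two words, so only the better-scoring one survives; the third has a different two-word context and is kept.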

The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018

3rd Conference on Machine Translation, WMT 2018 at the Conference on Empirical Methods in Natural Language Processing, EMNLP 2018

Authors: Schamper, Julian; Rosendahl, Jan; Bahar, Parnia; Kim, Yunsu; Nix, Arne; Ney, Hermann. Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, D-52056 Aachen, Germany

ISBN: (print) 9781948087810

This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is on the German→English task, where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important hyperparameters. In total we improve by 6.8% BLEU over our last year's submission and by 4.8% BLEU over the winning system of the 2017 German→English task. In the English→Turkish task, we show a 3.6% BLEU improvement over last year's winning system. We further report results on the Chinese→English task, where we improve by 2.2% BLEU on average over our baseline systems but stay behind the 2018 winning systems. © 2018 Association for Computational Linguistics

Keywords: Neural machine translation

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation

arXiv, 2019

Authors: Lüscher, Christoph; Beck, Eugen; Irie, Kazuki; Kitza, Markus; Michel, Wilfried; Zeyer, Albert; Schlüter, Ralf; Ney, Hermann. Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, 52074 Aachen, Germany; AppTek GmbH, 52062 Aachen, Germany

We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN/HMM architecture compared to an attention-based encoder-decoder design for the LibriSpeech task. Detailed descriptions of the system development, including model design, pretraining schemes, training schedules, and optimization approaches, are provided for both system architectures. Both hybrid DNN/HMM and attention-based systems employ bi-directional LSTMs for acoustic modeling/encoding. For language modeling, we employ both LSTM and Transformer based architectures. All our systems are built using RWTH's open-source toolkits RASR and RETURNN. To the best knowledge of the authors, the results obtained when training on the full LibriSpeech training set are the best published currently, both for the hybrid DNN/HMM and the attention-based systems. Our single hybrid system even outperforms previous results obtained from combining eight single systems. Our comparison shows that on the LibriSpeech 960h task, the hybrid DNN/HMM system outperforms the attention-based system by 15% relative on the clean and 40% relative on the other test sets in terms of word error rate. Moreover, experiments on a reduced 100h subset of the LibriSpeech training corpus show an even more pronounced margin between the hybrid DNN/HMM and attention-based architectures. Copyright © 2019, The Authors. All rights reserved.

Keywords: Hybrid systems
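
The abstract of this record reports gains as "relative" reductions in word error rate. As a reading aid, the short Python sketch below shows how such a figure is usually computed. The function name `relative_improvement` and the WER values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction of `system_wer` over `baseline_wer`, in percent."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical WERs (in percent), chosen only to illustrate the arithmetic.
print(relative_improvement(5.0, 4.25))   # 0.75 absolute out of 5.0
print(relative_improvement(10.0, 6.0))   # 4.0 absolute out of 10.0
```

A relative figure divides the absolute difference by the baseline error rate, so the same absolute gain looks larger when the baseline error is small.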

Generating synthetic audio data for attention-based speech recognition systems

arXiv, 2019

Authors: Rossenbach, Nick; Zeyer, Albert; Schlüter, Ralf; Ney, Hermann. Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, 52074 Aachen, Germany; AppTek GmbH, 52062 Aachen, Germany

Recent advances in text-to-speech (TTS) have led to the development of flexible multi-speaker end-to-end TTS systems. We extend state-of-the-art attention-based automatic speech recognition (ASR) systems with synthetic audio generated by a TTS system trained only on the ASR corpus itself. ASR and TTS systems are built separately to show that text-only data can be used to enhance existing end-to-end ASR systems without the necessity of parameter or architecture changes. We compare our method with language model integration of the same text data and with simple data augmentation methods like SpecAugment and show that performance improvements are mostly independent. We achieve improvements of up to 33% relative in word error rate (WER) over a strong baseline with data augmentation in a low-resource environment (LibriSpeech-100h), closing the gap to a comparable oracle experiment by more than 50%. We also show improvements of up to 5% relative WER over our most recent ASR baseline on LibriSpeech-960h. Copyright © 2019, The Authors. All rights reserved.

Keywords: Speech synthesis

Investigation on estimation of sentence probability by combining forward, backward and Bi-directional LSTM-RNNs

19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018

Authors: Irie, Kazuki; Lei, Zhihong; Deng, Liuhui; Schlüter, Ralf; Ney, Hermann. Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, D-52056 Aachen, Germany

A combination of forward and backward long short-term memory (LSTM) recurrent neural network (RNN) language models is a popular model combination approach to improve the estimation of the sequence probability in second-pass N-best list rescoring in automatic speech recognition (ASR). In this work, we push this idea further by proposing a combination of three models: a forward LSTM language model, a backward LSTM language model and a bi-directional LSTM based gap completion model. We derive such a combination method from a forward-backward decomposition of the sequence probability. We carry out experiments on the Switchboard speech recognition task. While we empirically find that such a combination gives slight improvements in perplexity over the combination of forward and backward models, we finally show that a combination of the same number of forward models gives the best perplexity and word error rate (WER) overall. © 2018 International Speech Communication Association. All rights reserved.

Keywords: Long short-term memory

On using SpecAugment for end-to-end speech translation

arXiv, 2019

Authors: Bahar, Parnia; Zeyer, Albert; Schlüter, Ralf; Ney, Hermann. Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, 52062 Aachen, Germany; AppTek, 52062 Aachen, Germany

This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost method applied directly to the audio input features, and it consists of masking blocks of frequency channels and/or time steps. We apply SpecAugment on end-to-end speech translation tasks and achieve up to +2.2% BLEU on LibriSpeech Audiobooks En→Fr and +1.2% on IWSLT TED-talks En→De by alleviating overfitting to some extent. We also examine the effectiveness of the method in a variety of data scenarios and show that it leads to significant improvements irrespective of the amount of training data. Copyright © 2019, The Authors. All rights reserved.

Keywords:
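
The abstract of this record describes SpecAugment as masking blocks of frequency channels and/or time steps in the input features. The Python sketch below illustrates only that masking idea on a small feature matrix stored as nested lists. The function name `spec_augment`, the mask widths, the use of one mask per axis, and the choice of zero as mask value are assumptions for illustration, not the settings used in the paper.

```python
import random
from typing import List, Optional


def spec_augment(
    features: List[List[float]],
    max_time_width: int = 2,
    max_freq_width: int = 2,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return a copy of `features` (frames x channels) with one time mask
    and one frequency mask applied. Masked entries are set to 0.0."""
    rng = rng or random.Random()
    num_frames = len(features)
    num_channels = len(features[0])
    masked = [row[:] for row in features]  # leave the input untouched

    # Time mask: zero out a block of consecutive frames.
    t_width = rng.randint(0, min(max_time_width, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for t in range(t_start, t_start + t_width):
        masked[t] = [0.0] * num_channels

    # Frequency mask: zero out a block of consecutive channels in every frame.
    f_width = rng.randint(0, min(max_freq_width, num_channels))
    f_start = rng.randint(0, num_channels - f_width)
    for row in masked:
        for f in range(f_start, f_start + f_width):
            row[f] = 0.0
    return masked


# Toy input: 6 frames with 4 channels each, all ones.
features = [[1.0] * 4 for _ in range(6)]
augmented = spec_augment(features, rng=random.Random(0))
for row in augmented:
    print(row)
```

Because the mask positions and widths are drawn at random on each call, the model sees a differently corrupted version of the same utterance in every epoch.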