Unsupervised learning of cross-lingual word embedding offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods...
Document-level context has received much attention for compensating neural machine translation (NMT) of isolated sentences. However, recent advances in document-level NMT focus on sophisticated integration of the c...
The task of fine-grained visual classification (FGVC) deals with classification problems that display a small inter-class variance such as distinguishing between different bird species or car models. State-of-the-art ...
Transfer learning or multilingual models are essential for low-resource neural machine translation (NMT), but their applicability is limited to cognate languages that share vocabularies. This paper shows effective t...
This work systematically analyzes the smoothing effect of vocabulary reduction for phrase translation models. We extensively compare various word-level vocabularies to show that the performance of smoothing is not sig...
Sequence discriminative training criteria have long been a standard tool in automatic speech recognition for improving the performance of acoustic models over their maximum likelihood / cross entropy trained counterpa...
ISBN: 9781728103068 (digital), 9781728103075 (print)
We present competitive results using a Transformer encoder-decoder-attention model for end-to-end speech recognition, needing less training time than a similarly performing LSTM model. We observe that Transformer training is in general more stable than LSTM training, although it also seems to overfit more and thus shows more problems with generalization. We also find that two initial LSTM layers in the Transformer encoder provide a much better positional encoding. Data augmentation, a variant of SpecAugment, helps to improve both models: the Transformer by 33% and the LSTM by 15% relative. We analyze several pretraining and scheduling schemes, which are crucial for both the Transformer and the LSTM models. We improve our LSTM model by additional convolutional layers. We perform our experiments on LibriSpeech 1000h, Switchboard 300h and TED-LIUM-v2 200h, and we show state-of-the-art performance on TED-LIUM-v2 for attention-based end-to-end models. For comparability, we deliberately limit the training on LibriSpeech to 12.5 epochs of the training data to keep the results of practical interest, although we show that longer training still yields further improvements. We publish all the code and setups to run our experiments.
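The abstract does not give implementation details, but the idea of letting two initial LSTM layers supply positional information to a Transformer encoder can be illustrated with a minimal PyTorch sketch. All layer sizes, the class name, and the bidirectional setup below are assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch: two initial LSTM layers act as the positional front-end
# of a Transformer encoder; their recurrence carries order information, so
# no sinusoidal positional encoding is added. Sizes are illustrative only.
import torch
import torch.nn as nn


class LSTMTransformerEncoder(nn.Module):
    def __init__(self, feat_dim=80, d_model=512, nhead=8, num_layers=12):
        super().__init__()
        # Two (bidirectional) LSTM layers in front of the self-attention stack.
        self.lstm = nn.LSTM(feat_dim, d_model // 2, num_layers=2,
                            batch_first=True, bidirectional=True)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=2048,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, features):             # features: (batch, time, feat_dim)
        x, _ = self.lstm(features)           # (batch, time, d_model)
        return self.transformer(x)           # contextualized encoder states


# Example: encode 3 utterances of 200 frames with 80-dim features.
enc = LSTMTransformerEncoder()
out = enc(torch.randn(3, 200, 80))
print(out.shape)                             # torch.Size([3, 200, 512])
```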
ISBN: 9781728103068 (digital), 9781728103075 (print)
While recurrent neural networks can motivate cross-sentence language modeling and its application to automatic speech recognition (ASR), corresponding modifications of the training method for that purpose are rarely discussed. In fact, even more generally, the impact of the training-sequence construction strategy in language modeling on different evaluation conditions is typically ignored. In this work, we revisit this basic but fundamental question. We train language models based on long short-term memory recurrent neural networks and Transformers using various types of training sequences and study their robustness with respect to different evaluation modes. Our experiments on the 300h Switchboard and Quaero English datasets show that models trained with back-propagation over sequences consisting of concatenations of multiple sentences, with state carry-over across sequences, outperform those trained with sentence-level training, both in terms of perplexity and word error rates for cross-utterance ASR.
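As a rough sketch of the training-sequence construction described here (several sentences concatenated into one back-propagation sequence, with the LSTM state carried across consecutive sequences), the PyTorch fragment below may help; the token stream, chunk length, model sizes, and optimizer are assumptions for the example, not the paper's recipe.

```python
# Illustrative sketch: LSTM language model trained on chunks of a token stream
# built by concatenating sentences, with the hidden state carried (detached)
# across consecutive chunks. All sizes and data handling are assumptions.
import torch
import torch.nn as nn

vocab_size, chunk_len = 1000, 64
corpus = torch.randint(0, vocab_size, (10_000,))       # sentences already
                                                        # concatenated into one
                                                        # token stream

model = nn.ModuleDict({
    "emb": nn.Embedding(vocab_size, 256),
    "lstm": nn.LSTM(256, 512, num_layers=2, batch_first=True),
    "out": nn.Linear(512, vocab_size),
})
opt = torch.optim.SGD(model.parameters(), lr=1.0)
loss_fn = nn.CrossEntropyLoss()

state = None                                            # carried across chunks
for start in range(0, corpus.numel() - chunk_len - 1, chunk_len):
    x = corpus[start:start + chunk_len].unsqueeze(0)          # (1, T) inputs
    y = corpus[start + 1:start + chunk_len + 1].unsqueeze(0)  # next tokens
    h, state = model["lstm"](model["emb"](x), state)
    state = tuple(s.detach() for s in state)            # carry the state over,
                                                        # but stop gradients at
                                                        # the chunk boundary
    loss = loss_fn(model["out"](h).transpose(1, 2), y)  # (1, V, T) vs. (1, T)
    opt.zero_grad()
    loss.backward()
    opt.step()
```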
ISBN: 9781728103068 (digital), 9781728103075 (print)
Recent advances in deep learning show that end-to-end speech-to-text translation models are a promising approach for direct speech translation. In this work, we provide an overview of different end-to-end architectures, as well as the usage of an auxiliary connectionist temporal classification (CTC) loss for better convergence. We also investigate pre-training variants, such as initializing different components of a model using pretrained models, and their impact on the final performance, which gives boosts of up to 4% in BLEU and 5% in TER. Our experiments are performed on the 270h IWSLT TED-talks En→De and 100h LibriSpeech audiobooks En→Fr tasks. We also show improvements over the current end-to-end state-of-the-art systems on both tasks.
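To make the auxiliary CTC loss concrete, here is a minimal sketch of interpolating the attention decoder's cross-entropy with a CTC loss computed on the encoder outputs. The interpolation weight, the function name, and all shapes below are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch: auxiliary CTC loss on top of an attention-based
# encoder-decoder, combined with the decoder cross-entropy via a fixed
# interpolation weight. Weight, shapes, and padding handling are assumptions.
import torch
import torch.nn as nn

ctc_weight = 0.3                                   # assumed interpolation weight
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
ce_loss = nn.CrossEntropyLoss(ignore_index=-100)


def joint_loss(enc_logits, dec_logits, targets, enc_lens, tgt_lens):
    """enc_logits: (T_enc, B, V) frame-level logits from a linear CTC head
       dec_logits: (B, T_dec, V) logits from the attention decoder
       targets:    (B, T_dec) target token ids, padded with -100 for CE
       enc_lens, tgt_lens: 1-D int tensors with unpadded lengths"""
    ctc_targets = targets.clamp(min=0)             # padding positions are
                                                   # ignored via tgt_lens
    l_ctc = ctc_loss(enc_logits.log_softmax(-1), ctc_targets,
                     enc_lens, tgt_lens)
    l_att = ce_loss(dec_logits.transpose(1, 2), targets)
    return (1.0 - ctc_weight) * l_att + ctc_weight * l_ctc
```

In a sketch like this, the CTC term mainly pushes the encoder towards a monotonic alignment early in training, while the attention objective stays the primary loss; at inference only the attention decoder would be used.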
We present a demonstration of a neural interactive-predictive system for tackling multimodal sequence-to-sequence tasks. The system generates text predictions for different sequence-to-sequence tasks: machine translati...