ISBN (digital): 9781728165530
ISBN (print): 9781728165547
The task of fine-grained visual classification (FGVC) deals with classification problems that display a small inter-class variance such as distinguishing between different bird species or car models. State-of-the-art approaches typically tackle this problem by integrating an elaborate attention mechanism or (part-) localization method into a standard convolutional neural network (CNN). This work likewise aims to enhance the performance of a backbone CNN such as ResNet by including three efficient and lightweight components specifically designed for FGVC. This is achieved by using global k-max pooling, a discriminative embedding layer trained by optimizing class means, and an efficient localization module that estimates bounding boxes using only class labels for training. The resulting model achieves state-of-the-art recognition accuracies on multiple FGVC benchmark datasets.
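As an illustration of the pooling component, the following is a minimal PyTorch sketch of global k-max pooling, assuming the common variant that averages the k strongest activations per channel; the value of k and the feature-map shape are illustrative and not taken from the paper.

import torch

def global_k_max_pool(features: torch.Tensor, k: int = 4) -> torch.Tensor:
    # features: (batch, channels, height, width) activations from a backbone CNN
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)   # flatten the spatial dimensions
    topk, _ = flat.topk(k, dim=-1)      # k strongest responses per channel
    return topk.mean(dim=-1)            # (batch, channels) image descriptor

# Example: pool a ResNet-like feature map into a per-image descriptor.
feats = torch.randn(2, 2048, 14, 14)
print(global_k_max_pool(feats, k=4).shape)  # torch.Size([2, 2048])

With k = 1 this reduces to global max pooling and with k = h * w to global average pooling, which is why it can serve as a lightweight drop-in replacement for the standard pooling layer.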
The task of fine-grained visual categorization is related to both general object recognition and specialized tasks such as face recognition. Hence, we propose to combine two methods popular for general object recognition and face recognition to build a new model-free system for fine-grained visual categorization. Specifically, we use Local Naive-Bayes Nearest Neighbor as a pre-selection method and 2D-Warping as a refinement step. For the latter, we explore different ways to use the alignments computed by a 2D-Warping algorithm for classification. We demonstrate the performance of our approach on the CUB200-2011 database and show that our approach outperforms the recognition accuracy of current state-of-the-art methods.
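To make the pre-selection step concrete, below is a simplified brute-force sketch of Local Naive-Bayes Nearest Neighbor scoring in Python; the update rule follows the published LNBNN idea, while the data layout, the value of k, and the use of exact (rather than approximate) nearest-neighbour search are assumptions made only for illustration.

import numpy as np

def lnbnn_ranking(query_descriptors, class_descriptors, k=10):
    # query_descriptors: (n, d) local descriptors of the test image
    # class_descriptors: dict class_id -> (m_c, d) training descriptors
    classes = list(class_descriptors.keys())
    totals = {c: 0.0 for c in classes}
    for q in query_descriptors:
        dists, labels = [], []
        for c in classes:
            d = np.linalg.norm(class_descriptors[c] - q, axis=1)
            dists.append(d)
            labels.append(np.full(len(d), c))
        dists, labels = np.concatenate(dists), np.concatenate(labels)
        order = np.argsort(dists)[: k + 1]
        background = dists[order[-1]]              # distance to the (k+1)-th neighbour
        near_d, near_c = dists[order[:-1]], labels[order[:-1]]
        for c in np.unique(near_c):
            totals[c] += near_d[near_c == c].min() - background  # negative = evidence for c
    # pre-selection: the classes with the lowest totals are passed on to 2D-Warping
    return sorted(classes, key=lambda c: totals[c])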
ISBN (print): 9781622765928
In statistical machine translation, word lattices are used to represent the ambiguities in the preprocessing of the source sentence, such as word segmentation for Chinese or morphological analysis for German. Several approaches have been proposed to define the probability of different paths through the lattice with external tools like word segmenters, or by applying indicator features. We introduce a novel lattice design, which explicitly distinguishes between different preprocessing alternatives for the source sentence. It allows us to make use of specific features for each preprocessing type and to lexicalize the choice of lattice path directly in the phrase translation model. We argue that forced alignment training can be used to learn lattice path and phrase translation model simultaneously. On the news-commentary portion of the German→English WMT 2011 task we can show moderate improvements of up to 0.6% Bleu over a state-of-the-art baseline system.
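The sketch below illustrates, under assumed data structures, how such a lattice can make the preprocessing alternative explicit on every edge so that the path choice is scored with type-specific features in a log-linear model; the edge layout, feature names, and weights are invented for illustration and do not reproduce the paper's actual design.

from collections import namedtuple

Edge = namedtuple("Edge", "start end token preproc features")

# Two alternative analyses of a German compound: kept as one surface token,
# or split by the morphological analyser.
edges = [
    Edge(0, 2, "Aktienkurse", preproc="surface",  features={"surface_path": 1.0}),
    Edge(0, 1, "Aktien",      preproc="compound", features={"compound_path": 1.0}),
    Edge(1, 2, "Kurse",       preproc="compound", features={"compound_path": 1.0}),
]

def path_score(path, weights):
    # log-linear combination of the indicator features collected along a lattice path
    return sum(weights.get(f, 0.0) * v for e in path for f, v in e.features.items())

weights = {"surface_path": -0.2, "compound_path": 0.4}  # would be tuned, e.g. by MERT
print(path_score([edges[0]], weights))            # score of the surface path
print(path_score([edges[1], edges[2]], weights))  # score of the compound-split path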
ISBN (digital): 9781509066315
ISBN (print): 9781509066322
In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) which enables an integration of attention sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden variables that establishes a mapping from output labels to input frames. Inspired by the direct HMM model, we assume a decomposition of the label sequence posterior into emission and transition probabilities using zero-order assumption and incorporate both Transformer and LSTM attention models into it. The method keeps the explicit alignment as part of the stochastic model and combines the ease of the end-to-end training of the attention model as well as an efficient and simple beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising results in WER with more focused alignments in comparison to the attention models.
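As a toy illustration of the zero-order factorisation, the sketch below marginalises, for each output label, over the input frame it is aligned to, combining an emission term p(label | frame, history) with a transition/alignment term p(frame | history); the random arrays merely stand in for the outputs of a Transformer or LSTM attention model, and all shapes are illustrative.

import numpy as np

def sequence_log_prob(emission, alignment, labels):
    # emission:  (N, T, V) p(label | aligned frame, label history) per output position
    # alignment: (N, T)    p(aligned frame | label history) per output position
    # labels:    (N,)      reference label indices
    log_p = 0.0
    for n, a_n in enumerate(labels):
        # zero-order assumption: the alignment of position n does not depend on
        # the alignments chosen for previous positions, so each position is
        # marginalised independently given the label history
        log_p += np.log(np.sum(alignment[n] * emission[n, :, a_n]))
    return log_p

N, T, V = 3, 5, 10
emission = np.random.dirichlet(np.ones(V), size=(N, T))    # (N, T, V)
alignment = np.random.dirichlet(np.ones(T), size=N)        # (N, T)
print(sequence_log_prob(emission, alignment, np.array([1, 4, 7])))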
We propose a state-of-the-art system for recognizing real-world handwritten images exhibiting a huge degree of noise and a high out-of-vocabulary rate. We describe methods for successful image denoising, line removal, deskewing, deslanting, and text line segmentation. We demonstrate how to use an HMM-based recognition system to obtain competitive results, and how to further improve it using LSTM neural networks in the tandem approach. The final system outperforms other approaches on a new dataset for English and French handwriting. The presented framework scales well across other standard datasets.
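The tandem step can be pictured with the following hedged sketch: per-frame posteriors from the neural network (a random stand-in here for the LSTM outputs) are log-transformed, decorrelated with PCA, and appended to the original sliding-window features before the HMM is trained on the combined vectors; all dimensions are illustrative.

import numpy as np

def tandem_features(raw_feats, nn_posteriors, keep_dims=20):
    # raw_feats:     (T, D) sliding-window features of a text line image
    # nn_posteriors: (T, C) per-frame character posteriors from the LSTM
    log_post = np.log(nn_posteriors + 1e-10)       # log domain suits the GMMs better
    centered = log_post - log_post.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)   # PCA decorrelation
    reduced = centered @ vt[:keep_dims].T
    return np.hstack([raw_feats, reduced])         # (T, D + keep_dims)

T, D, C = 200, 30, 80
raw = np.random.randn(T, D)
posteriors = np.random.dirichlet(np.ones(C), size=T)
print(tandem_features(raw, posteriors).shape)      # (200, 50)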
In this work we analyze the contribution of preprocessing steps for Latin handwriting recognition. A preprocessing pipeline based on geometric heuristics and image statistics is used. This pipeline is applied to French and English handwriting recognition in an HMM-based framework. Results show that preprocessing improves recognition performance for the two tasks. The Maximum Likelihood (ML)-trained HMM system reaches a competitive WER of 16.7% and outperforms many sophisticated systems for the French handwriting recognition task. The results for English handwriting are comparable to other ML-trained HMM recognizers. Using MLP preprocessing, a WER of 35.3% is achieved.
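As an example of the kind of geometric heuristic such a pipeline uses, the sketch below corrects slant by trying a range of shear angles and keeping the one that makes vertical strokes most upright (maximal variance of the column ink histogram); the criterion and the angle range are common heuristics assumed here, not necessarily the exact ones used in the paper.

import numpy as np
from scipy.ndimage import affine_transform

def deslant(line_img, angles_deg=range(-45, 46, 5)):
    # line_img: 2-D grey-value array with background 0 and ink > 0
    best_angle, best_score = 0.0, -np.inf
    for a in angles_deg:
        shear = np.tan(np.radians(a))
        # shear parallel to the baseline: source column = column + shear * row
        sheared = affine_transform(line_img, [[1.0, 0.0], [shear, 1.0]], order=0)
        score = np.var(sheared.sum(axis=0))   # peaky column histogram = upright strokes
        if score > best_score:
            best_angle, best_score = a, score
    shear = np.tan(np.radians(best_angle))
    return affine_transform(line_img, [[1.0, 0.0], [shear, 1.0]], order=0)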
In this paper, we describe a source-side reordering method based on syntactic chunks for phrase-based statistical machine translation. First, we shallow parse the source language sentences. Then, reordering rules are ...
ISBN (print): 9782951740891
This paper describes the evaluation methodology followed to measure the impact of using a machine learning algorithm to automatically segment intralingual subtitles. The segmentation quality, productivity, and self-reported post-editing effort achieved with this approach are shown to improve on those obtained with the character-counting technique that is currently the main method employed for automatic subtitle segmentation. The corpus used to train and test the proposed automated segmentation method is also described and shared with the community, in order to foster further research in this area.
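For comparison, the character-counting baseline mentioned above can be sketched as a greedy packer that fills each subtitle line up to a maximum number of characters; the 42-character limit is a common subtitling convention used here only as an illustrative default, not a value reported in the paper.

def segment_by_characters(words, max_chars=42):
    # greedily pack words into subtitle lines of at most max_chars characters
    lines, current = [], ""
    for w in words:
        candidate = (current + " " + w).strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            lines.append(current)
            current = w
    if current:
        lines.append(current)
    return lines

print(segment_by_characters(
    "this is a transcript that has to be split into readable subtitle lines".split()))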
Handwritten text is generally captured through two main modalities: off-line and on-line. Each modality has advantages and disadvantages, but it seems clear that smart approaches to handwritten text recognition (HTR) ...