ISBN (digital): 9781728103068
ISBN (print): 9781728103075
While recurrent neural networks can motivate cross-sentence language modeling and its application to automatic speech recognition (ASR), corresponding modifications of the training method for that end are rarely discussed. In fact, even more generally, the impact of the training sequence construction strategy in language modeling under different evaluation conditions is typically ignored. In this work, we revisit this basic but fundamental question. We train language models based on long short-term memory recurrent neural networks and Transformers using various types of training sequences and study their robustness with respect to different evaluation modes. Our experiments on the 300h Switchboard and Quaero English datasets show that models trained with back-propagation over sequences consisting of concatenations of multiple sentences, with state carry-over across sequences, outperform those trained at the sentence level, both in terms of perplexity and word error rate for cross-utterance ASR.
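The state carry-over training described above can be made concrete with a short sketch. The following is a minimal PyTorch illustration, not the paper's implementation: training sequences are slices of a stream of concatenated sentences, back-propagation is truncated at the sequence boundary, and the LSTM state is detached and carried into the next sequence. Model sizes, the optimizer, and the data handling are assumptions for illustration only.

```python
# Minimal sketch (not the paper's code) of LSTM LM training over
# concatenated-sentence sequences with state carry-over.
import torch
import torch.nn as nn

class LstmLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        out, state = self.lstm(self.embed(tokens), state)
        return self.proj(out), state

def train_with_carry_over(model, stream, seq_len=256, lr=1e-3):
    # `stream` is a 1-D LongTensor holding consecutive sentences
    # concatenated (e.g. joined by sentence-boundary tokens).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    state = None
    for start in range(0, stream.numel() - seq_len - 1, seq_len):
        x = stream[start:start + seq_len].unsqueeze(0)
        y = stream[start + 1:start + seq_len + 1].unsqueeze(0)
        logits, state = model(x, state)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Detach: no gradient flows across sequence boundaries, but the
        # context itself is preserved across them.
        state = tuple(s.detach() for s in state)
```

Evaluating such a model sentence by sentence, without carrying the state, reproduces the mismatch between training and evaluation modes that the abstract refers to.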
ISBN (digital): 9781728103068
ISBN (print): 9781728103075
Recent advances in deep learning show that end-to-end speech-to-text translation models are a promising approach for direct speech translation. In this work, we provide an overview of different end-to-end architectures, as well as the use of an auxiliary connectionist temporal classification (CTC) loss for better convergence. We also investigate pre-training variants, such as initializing different components of a model with pretrained models, and their impact on final performance, which yields gains of up to 4% in BLEU and 5% in TER. Our experiments are performed on 270h of IWSLT TED-talks En→De and 100h of LibriSpeech audio-books En→Fr. We also show improvements over the current end-to-end state-of-the-art systems on both tasks.
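As a rough illustration of the auxiliary CTC loss mentioned above, the sketch below combines the attention decoder's cross-entropy with a CTC term computed on encoder outputs. The interpolation weight `ctc_weight`, the special-token indices, and all tensor shapes are assumptions, not values from the paper.

```python
# Hedged sketch: joint loss = (1 - w) * cross-entropy + w * CTC.
import torch.nn as nn

BLANK_ID = 0  # assumed CTC blank index
PAD_ID = 1    # assumed target padding index

ctc_loss = nn.CTCLoss(blank=BLANK_ID, zero_infinity=True)
ce_loss = nn.CrossEntropyLoss(ignore_index=PAD_ID)

def joint_loss(enc_logits, enc_lens, dec_logits, targets, target_lens,
               ctc_weight=0.3):
    # Decoder branch: standard label cross-entropy.
    ce = ce_loss(dec_logits.reshape(-1, dec_logits.size(-1)),
                 targets.reshape(-1))
    # Encoder branch: CTC over per-frame distributions; nn.CTCLoss
    # expects log-probabilities of shape (T, N, C).
    log_probs = enc_logits.log_softmax(-1).transpose(0, 1)
    ctc = ctc_loss(log_probs, targets, enc_lens, target_lens)
    return (1.0 - ctc_weight) * ce + ctc_weight * ctc
```

The CTC branch forces the encoder toward monotonic frame-to-label alignments early in training, which is the convergence benefit the abstract alludes to.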
The preprocessing pipelines in Natural Language Processing usually involve a step of removing sentences consisting of illegal characters. The definition of illegal characters and the specific removal strategy depend on...
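Since the abstract is cut off before the removal strategy is defined, the following is only a generic sketch of one common approach: whitelisting a character range and dropping any sentence containing characters outside it. The character set chosen here is purely an assumption.

```python
# Illustrative filter: keep a sentence only if every character is in a
# whitelisted range (printable ASCII plus common accented Latin letters).
import re

LEGAL = re.compile(r"^[\x20-\x7E\u00C0-\u017F]+$")

def filter_corpus(sentences):
    return [s for s in sentences if LEGAL.match(s)]

print(filter_corpus(["Hello world.", "bad \x00 byte", "Café déjà vu"]))
# -> ['Hello world.', 'Café déjà vu']
```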
Prominently used in support vector machines and logistic regression, kernel functions (kernels) can implicitly map data points into high-dimensional spaces and make it easier to learn complex decision boundaries. In ...
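The implicit high-dimensional mapping can be demonstrated in a few lines: an RBF kernel k(x, x') = exp(-γ‖x − x'‖²) corresponds to an inner product in an infinite-dimensional feature space, yet only pairwise kernel values are ever computed. The dataset and γ below are illustrative choices, not from the paper.

```python
# Sketch of the kernel trick with a precomputed RBF Gram matrix.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

def rbf(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2), computed pairwise.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# A linear separator in the implicit feature space yields a nonlinear
# boundary (here: concentric circles) in the input space.
clf = SVC(kernel="precomputed").fit(rbf(X, X), y)
print("train accuracy:", clf.score(rbf(X, X), y))
```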
Accessibility to historical documents is mostly limited to scholars. This is due to the language barrier inherent in human language and the linguistic properties of these documents. Given a historical document, modern...
The goal of this paper is to deal with a data scarcity scenario where deep learning techniques tend to fail. We compare the use of two well-established techniques, Restricted Boltzmann Machines and Variational Auto-enc...
Flying ad-hoc networks (FANETs) have many applications in military, industrial and agricultural areas. Due to specific features of FANETs, such as high-speed nodes, low density of nodes in the network, and rapid chang...
Significant performance degradation of automatic speech recognition (ASR) systems is observed when the audio signal contains cross-talk. One of the recently proposed approaches to solve the problem of multi-speaker AS...
This paper describes the unsupervised neural machine translation (NMT) systems of the RWTH Aachen University developed for the English ↔ German news translation task of the EMNLP 2018 Third Conference on Machine Trans...
Embedding and projection matrices are commonly used in neural language models (NLM) as well as in other sequence processing networks that operate on large vocabularies. We examine such matrices in fine-tuned language ...
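For readers unfamiliar with the terminology: the embedding matrix maps token indices to vectors at the input, and the projection matrix maps hidden states back to vocabulary logits at the output. A minimal sketch follows, including the common weight-tying variant; it is a generic illustration, not the setup examined in the paper.

```python
# Generic NLM skeleton showing the two vocabulary-sized matrices.
import torch.nn as nn

class TinyNLM(nn.Module):
    def __init__(self, vocab_size=50000, dim=512, tie_weights=True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # input embedding
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.proj = nn.Linear(dim, vocab_size)      # output projection
        if tie_weights:
            # Both weights are (vocab_size, dim); tying shares one
            # matrix for input and output, halving these parameters.
            self.proj.weight = self.embed.weight

    def forward(self, tokens):
        out, _ = self.rnn(self.embed(tokens))
        return self.proj(out)
```

For large vocabularies these two matrices dominate the parameter count, which is why they are a natural object of study when fine-tuning.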