Search Results - Inner Mongolia University Library

uniblock: Scoring and Filtering Corpus with Unicode Block Information

arXiv, 2019

Authors: Gao, Yingbo; Wang, Weiyue; Ney, Hermann. Affiliation: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen D-52056, Germany

The preprocessing pipelines in Natural Language Processing usually involve a step of removing sentences consisting of illegal characters. The definition of illegal characters and the specific removal strategy depend on the task, language, domain, etc., which often leads to tiresome and repetitive scripting of rules. In this paper, we introduce a simple statistical method, uniblock, to overcome this problem. For each sentence, uniblock generates a fixed-size feature vector using Unicode block information of the characters. A Gaussian mixture model is then estimated on some clean corpus using variational inference. The learned model can then be used to score sentences and filter a corpus. We present experimental results on Sentiment Analysis, Language Modeling and Machine Translation, and show the simplicity and effectiveness of our method. Copyright © 2019, The Authors. All rights reserved.

Keywords: Gaussian distribution
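The feature-extraction step mentioned in the uniblock abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the feature is the relative frequency of characters per Unicode block (the abstract does not state the exact feature), the block table is a small illustrative subset of the real Unicode block list, and `block_features` is a hypothetical name. The Gaussian mixture estimation step is omitted.

```python
from typing import List

# Illustrative subset of Unicode blocks: (first code point, last code point, name).
# A real system would load the full block list; this subset is an assumption.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def block_features(sentence: str) -> List[float]:
    """Return a fixed-size vector of per-block character frequencies.

    The last slot collects characters from any block not listed in BLOCKS.
    """
    counts = [0] * (len(BLOCKS) + 1)
    for ch in sentence:
        cp = ord(ch)
        for i, (start, end, _name) in enumerate(BLOCKS):
            if start <= cp <= end:
                counts[i] += 1
                break
        else:
            # No listed block matched this character.
            counts[-1] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

Every sentence maps to a vector of the same length regardless of how many characters it has, which is what allows a single density model to be fitted on a clean corpus and then used to score new sentences.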

Exploring Kernel functions in the softmax layer for contextual word classification

arXiv, 2019

Authors: Gao, Yingbo; Herold, Christian; Wang, Weiyue; Ney, Hermann. Affiliation: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen D-52056, Germany

Prominently used in support vector machines and logistic regressions, kernel functions (kernels) can implicitly map data points into high dimensional spaces and make it easier to learn complex decision boundaries. In this work, by replacing the inner product function in the softmax layer, we explore the use of kernels for contextual word classification. In order to compare the individual kernels, experiments are conducted on standard language modeling and machine translation tasks. We observe a wide range of performances across different kernel settings. Extending the results, we look at the gradient properties, investigate various mixture strategies and examine the disambiguation abilities. Copyright © 2019, The Authors. All rights reserved.

Keywords: Support vector machines
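The idea of replacing the inner product in the softmax layer, as described in the abstract above, can be sketched as follows. This is a hedged illustration only: the abstract does not list the kernels studied, so the linear and Gaussian (RBF) kernels here are common textbook choices, and using the kernel value directly as the pre-softmax score is an assumption. The names `kernel_softmax`, `linear_kernel` and `rbf_kernel` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def linear_kernel(h: Vector, w: Vector) -> float:
    """Plain inner product, i.e. the standard softmax-layer score."""
    return sum(a * b for a, b in zip(h, w))


def rbf_kernel(h: Vector, w: Vector, gamma: float = 1.0) -> float:
    """Gaussian kernel exp(-gamma * ||h - w||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(h, w)))


def kernel_softmax(
    h: Vector,
    word_vectors: Sequence[Vector],
    kernel: Callable[[Vector, Vector], float],
) -> List[float]:
    """Softmax over kernel scores between context h and each word vector."""
    scores = [kernel(h, w) for w in word_vectors]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]
```

Passing `linear_kernel` recovers the usual softmax layer, so different kernels can be compared by changing a single argument.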

Interactive-predictive neural multimodal systems

arXiv, 2019

Authors: Peris, Álvaro; Casacuberta, Francisco. Affiliation: Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, València, Spain

Despite the advances achieved by neural models in sequence-to-sequence learning, exploited in a variety of tasks, they still make errors. In many use cases, these are corrected by a human expert in a posterior revision process. The interactive-predictive framework aims to minimize the human effort spent on this process by considering partial corrections for iteratively refining the hypothesis. In this work, we generalize the interactive-predictive approach, typically applied in the machine translation field, to tackle other multimodal problems, namely image and video captioning. We study the application of this framework to multimodal neural sequence-to-sequence models. We show that, following this framework, we approximately halve the effort spent on correcting the outputs generated by the automatic systems. Moreover, we deploy our systems in a publicly accessible demonstration, which allows a better understanding of the behavior of the interactive-predictive framework. Copyright © 2019, The Authors. All rights reserved.

Keywords: Deep learning
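The iterative refinement loop described in the abstract above can be illustrated with a simulated user. This is a rough sketch under stated assumptions: it uses a prefix-based protocol in which the user fixes the first wrong token and the system regenerates the rest, the user is simulated by comparing against a reference, and `predict_suffix` is a hypothetical stand-in for the neural model. The abstract does not specify the exact protocol or effort metric, so the correction count here is only one possible measure.

```python
from typing import Callable, List

Tokens = List[str]


def simulate_session(
    reference: Tokens,
    predict_suffix: Callable[[Tokens], Tokens],
) -> int:
    """Count the corrections a simulated user makes until the output matches.

    predict_suffix(prefix) must return the system's continuation of the
    validated prefix (hypothetical model interface).
    """
    prefix: Tokens = []
    corrections = 0
    while True:
        hypothesis = prefix + predict_suffix(prefix)
        if hypothesis == reference:
            return corrections
        # Find the first position where the hypothesis departs from the reference.
        i = len(prefix)
        while (
            i < len(hypothesis)
            and i < len(reference)
            and hypothesis[i] == reference[i]
        ):
            i += 1
        corrections += 1
        if i >= len(reference):
            # Hypothesis is the reference plus extra tokens: one truncation fixes it.
            return corrections
        # The user supplies the correct token at position i; the prefix grows.
        prefix = reference[: i + 1]
```

For example, with a toy model that always proposes the remaining tokens of a fixed sentence, a single wrong token costs one correction rather than a full retyping of the output.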

The Vicomtech-PRHLT Speech Transcription Systems for the IberSPEECH-RTVE 2018 Speech to Text Transcription Challenge

4th International Conference on Advances in Speech and Language Technologies for Iberian Languages, IberSPEECH 2018

Authors: Arzelus, Haritz; Álvarez, Aitor; Bernath, Conrad; García, Eneritz; Granell, Emilio; Martínez-Hinarejos, Carlos D. Affiliations: Vicomtech, Human Speech and Language Technology Group, Spain; Pattern Recognition and Human Language Technologies Research Center, Universitat Politècnica de València, Spain

This paper describes our joint submission to the IberSPEECH-RTVE Speech to Text Transcription Challenge 2018, which calls for automatic speech transcription systems to be evaluated on realistic TV shows. With the aim of building and evaluating systems, RTVE licensed around 569 hours of different TV programs, which were processed, re-aligned and revised in order to discard segments with imperfect transcriptions. This task reduced the corpus to 136 hours that we considered to be nearly perfectly aligned audio and that we employed as in-domain data to train acoustic models. A total of 6 systems were built and presented to the evaluation challenge, three systems per condition. These recognition engines are different versions, evolutions and configurations of two main architectures. The first architecture includes a hybrid LSTM-HMM acoustic model, where bidirectional LSTMs were trained to provide posterior probabilities for the HMM states. The language model corresponds to modified Kneser-Ney smoothed 3-gram and 9-gram models used for decoding and re-scoring of the lattices respectively. The second architecture includes an End-To-End based recognition system, which combines 2D convolutional neural networks as spectral feature extractors from spectrograms with bidirectional Gated Recurrent Units as RNN acoustic models. A modified Kneser-Ney smoothed 5-gram model was also integrated to re-score the E2E hypotheses. All the systems' outputs were then punctuated using bidirectional RNN models with an attention mechanism and capitalized through recasing techniques. © 4th International Conference, IberSPEECH 2018.

Keywords: Speech recognition

Exploring E2E speech recognition systems for new languages

4th International Conference on Advances in Speech and Language Technologies for Iberian Languages, IberSPEECH 2018

Authors: Bernath, Conrad; Álvarez, Aitor; Arzelus, Haritz; Martínez-Hinarejos, Carlos-D. Affiliations: Human Speech and Language Technology Group, Vicomtech, Spain; Pattern Recognition and Human Language Technologies Research Center, Universitat Politècnica de València, Spain

Over the last few years, advances in both machine learning algorithms and computer hardware have led to significant improvements in speech recognition technology, mainly through the use of Deep Learning paradigms. As has been amply demonstrated in different studies, Deep Neural Networks (DNNs) have already outperformed traditional Gaussian Mixture Models (GMMs) at acoustic modeling in combination with Hidden Markov Models (HMMs). More recently, new attempts have focused on building end-to-end (E2E) speech recognition architectures, especially in languages with many resources like English and Chinese, with the aim of surpassing the performance of LSTM-HMM and more conventional systems. The aim of this work is first to present the different techniques that have been applied to enhance state-of-the-art E2E systems for American English using publicly available datasets. Secondly, we describe the construction of E2E systems for Spanish and Basque, and explain the strategies applied to overcome the problem of the limited availability of training data, especially for Basque as a low-resource language. In the evaluation phase, the three E2E systems are also compared with LSTM-HMM based recognition engines built and tested with the same datasets. © 4th International Conference, IberSPEECH 2018.

Keywords: Hidden Markov models

Generative models for deep learning with very scarce data

arXiv, 2019

Authors: Maroñas, Juan; Paredes, Roberto; Ramos, Daniel. Affiliations: Pattern Recognition and Human Language Technology, Universitat Politecnica de Valencia, Valencia, Spain; AUDIAS, Universidad Autonoma de Madrid, Madrid, Spain

The goal of this paper is to deal with a data scarcity scenario where deep learning techniques tend to fail. We compare the use of two well-established techniques, Restricted Boltzmann Machines and Variational Auto-encoders, as generative models in order to enlarge the training set in a classification framework. Essentially, we rely on Markov Chain Monte Carlo (MCMC) algorithms for generating new samples. We show that generalization can be improved with this methodology compared to other state-of-the-art techniques, e.g. semi-supervised learning with ladder networks. Furthermore, we show that the RBM is better than the VAE at generating new samples for training a classifier with good generalization capabilities. Copyright © 2019, The Authors. All rights reserved.

Keywords: Supervised learning

Are automatic metrics robust and reliable in specific machine translation tasks?

21st Annual Conference of the European Association for Machine Translation, EAMT 2018

Authors: Chinea-Rios, Mara; Peris, Álvaro; Casacuberta, Francisco. Affiliation: Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, València, Spain

ISBN: (print) 9788409019014

We present a comparison of automatic metrics against human evaluations of translation quality in several scenarios which were unexplored up to now. Our experimentation was conducted on translation hypotheses that were problematic for the automatic metrics, as the results greatly diverged from one metric to another. We also compared three different translation technologies. Our evaluation shows that in most cases, the metrics capture the human criteria. However, we face failures of the automatic metrics when applied to some domains and systems. Interestingly, we find that automatic metrics applied to the neural machine translation hypotheses provide the most reliable results. Finally, we provide some advice when dealing with these problematic domains. © 2018 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.

Keywords: Neural machine translation

Analysis of deep clustering as preprocessing for automatic speech recognition of sparsely overlapping speech

arXiv, 2019

Authors: Menne, Tobias; Sklyar, Ilya; Schlüter, Ralf; Ney, Hermann. Affiliations: Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany

Significant performance degradation of automatic speech recognition (ASR) systems is observed when the audio signal contains cross-talk. One of the recently proposed approaches to solve the problem of multi-speaker ASR is the deep clustering (DPCL) approach. Combining DPCL with a state-of-the-art hybrid acoustic model, we obtain a word error rate (WER) of 16.5 % on the commonly used wsj0-2mix dataset, which is the best performance reported thus far to the best of our knowledge. The wsj0-2mix dataset contains simulated cross-talk where the speech of multiple speakers overlaps for almost the entire utterance. In a more realistic ASR scenario the audio signal contains significant portions of single-speaker speech and only part of the signal contains speech of multiple competing speakers. This paper investigates obstacles of applying DPCL as a preprocessing method for ASR in such a scenario of sparsely overlapping speech. To this end we present a data simulation approach, closely related to the wsj0-2mix dataset, generating sparsely overlapping speech datasets of arbitrary overlap ratio. The analysis of applying DPCL to sparsely overlapping speech is an important interim step between the fully overlapping datasets like wsj0-2mix and more realistic ASR datasets, such as CHiME-5 or AMI. Copyright © 2019, The Authors. All rights reserved.

Keywords: Speech
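The notion of generating mixtures with an arbitrary overlap ratio, mentioned in the abstract above, can be illustrated with a toy example. This is only a sketch under assumptions: the overlap ratio is taken here to be the number of overlapped samples divided by the total mixture length, which may differ from the paper's definition, and the signals are plain lists of floats. The names `overlap_ratio` and `mix_with_overlap` are hypothetical, and both signals are assumed to be non-empty.

```python
from typing import List, Tuple


def overlap_ratio(len_a: int, len_b: int, offset: int) -> float:
    """Overlapped samples divided by total mixture length (assumed definition).

    Signal A starts at 0; signal B starts at `offset`.
    """
    end_b = offset + len_b
    overlap = max(0, min(len_a, end_b) - offset)
    total = max(len_a, end_b)
    return overlap / total


def mix_with_overlap(
    a: List[float], b: List[float], target_ratio: float
) -> Tuple[List[float], int]:
    """Delay b so the overlap ratio is as close as possible to target_ratio.

    Returns the summed mixture and the chosen start offset of b.
    """
    # Try every start offset from full overlap (0) to no overlap (len(a)).
    offset = min(
        range(len(a) + 1),
        key=lambda d: abs(overlap_ratio(len(a), len(b), d) - target_ratio),
    )
    mixture = [0.0] * max(len(a), offset + len(b))
    for i, sample in enumerate(a):
        mixture[i] += sample
    for j, sample in enumerate(b):
        mixture[offset + j] += sample
    return mixture, offset
```

A target ratio of 1.0 with equal-length signals reproduces the fully overlapping case, while lower targets shift the second speaker later so that only part of the mixture contains competing speech.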

Active learning for interactive neural machine translation of data streams

22nd Conference on Computational Natural Language Learning, CoNLL 2018

Authors: Peris, Álvaro; Casacuberta, Francisco. Affiliation: Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, València, Spain

ISBN: (print) 9781948087728

We study the application of active learning techniques to the translation of unbounded data streams via interactive neural machine translation. The main idea is to select, from an unbounded stream of source sentences, those worth supervising by a human agent. The user will interactively translate those samples. Once validated, these data are useful for adapting the neural machine translation model. We propose two novel methods for selecting the samples to be validated. We exploit the information from the attention mechanism of a neural machine translation system. Our experiments show that including active learning techniques in this pipeline reduces the effort required during the process, while increasing the quality of the translation system. It also makes it possible to balance the human effort required to achieve a certain translation quality. Moreover, our neural system outperforms classical approaches by a large margin. © 2018 Association for Computational Linguistics.

Keywords: Neural machine translation