Search results - Inner Mongolia University Library

Investigation of large-margin softmax in neural language modeling

arXiv, 2020

Authors: Huo, Jingjing; Gao, Yingbo; Wang, Weiyue; Schlüter, Ralf; Ney, Hermann. Affiliations: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany

To encourage intra-class compactness and inter-class separability among trainable feature vectors, large-margin softmax methods have been developed and widely applied in the face recognition community. Introducing a large margin into the softmax is reported to have good properties such as enhanced discriminative power, less overfitting and well-defined geometric intuitions. Nowadays, language modeling is commonly approached with neural networks using softmax and cross entropy. In this work, we investigate whether introducing large margins into neural language models improves perplexity and, consequently, word error rate in automatic speech recognition. Specifically, we first implement and test various types of conventional margins following previous work in face recognition. To account for the distribution of natural language data, we then compare different strategies for word vector norm-scaling. After that, we apply the best norm-scaling setup in combination with various margins and conduct neural language model rescoring experiments in automatic speech recognition. We find that although perplexity is slightly deteriorated, neural language models with large-margin softmax can yield a word error rate similar to that of the standard softmax baseline. Finally, expected margins are analyzed through visualization of word vectors, showing that syntactic and semantic relationships are also preserved. Copyright © 2020, The Authors. All rights reserved.

Keywords: Speech recognition
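
A minimal sketch of one conventional margin type from face recognition, the additive cosine margin (CosFace/AM-softmax style), applied to a language model output layer. It assumes that the context vector and every output word vector are L2-normalized and the cosines multiplied by a fixed `scale`; the function names and the values `scale=10.0` and `margin=0.2` are illustrative assumptions, not the authors' settings or their norm-scaling strategy.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def additive_margin_loss(hidden, output_vectors, target, scale=10.0, margin=0.2):
    """Cross entropy over scaled cosine logits, with `margin` subtracted
    from the target word's cosine only (additive cosine margin)."""
    cosines = [cosine(hidden, w) for w in output_vectors]
    logits = [scale * (c - margin) if j == target else scale * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp over the (toy) vocabulary.
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[target]


if __name__ == "__main__":
    h = [1.0, 0.5]                                   # toy context vector
    vocab = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # toy output word vectors
    print(additive_margin_loss(h, vocab, target=0, margin=0.0))
    print(additive_margin_loss(h, vocab, target=0, margin=0.2))
```

Because the margin is subtracted only from the target logit, the same inputs give a higher loss than with `margin=0.0`, so training has to push the target word's cosine further above its competitors.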

Two-way neural machine translation: A proof of concept for bidirectional translation modeling using a two-dimensional grid

arXiv, 2020

Authors: Bahar, Parnia; Brix, Christopher; Ney, Hermann. Affiliations: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany

Neural translation models have proven effective at capturing sufficient information from a source sentence and generating a high-quality target sentence. However, it is not easy to achieve the best results for bidirectional translation, i.e., both source-to-target and target-to-source translation, using a single model. Apart from a few pioneering attempts such as multilingual systems, bidirectional translation approaches require training two individual models. This paper proposes a single end-to-end bidirectional translation model using a two-dimensional grid, where left-to-right decoding generates the source-to-target output and bottom-to-top decoding generates the target-to-source output. Instead of training two models independently, our approach encourages a single network to jointly learn to translate in both directions. Experiments on the WMT 2018 German↔English and Turkish↔English translation tasks show that the proposed model is capable of generating good translation quality and shows sufficient potential to guide further research. © 2020, CC-BY.

Keywords: Decoding

Sample drop detection for asynchronous devices distributed in space

European Signal Processing Conference (EUSIPCO)

Authors: Tina Raissi; Santiago Pascual; Maurizio Omologo. Affiliations: Human Language Technology and Pattern Recognition, RWTH Aachen University, Aachen, Germany; Universitat Politècnica de Catalunya, Barcelona, Spain (Santiago Pascual is currently at Dolby Laboratories, Barcelona, Spain); Center for Information and Communication Technology (ICT), Fondazione Bruno Kessler (FBK), Trento, Italy

ISBN (electronic): 9789082797053

ISBN (print): 9781728150017

In many applications of multi-microphone, multi-device processing, the synchronization among input channels can be affected by the lack of a common clock and by isolated drops of samples. In this work, we address sample drop detection in a conversational speech scenario recorded by a set of microphones distributed in space. The goal is to design a neural model that, given a short time-domain window, detects whether one or more devices have been subjected to a sample drop event. The candidate time windows are selected, by means of a preprocessing step, from a set of long time intervals that may include a sample drop. This preprocessing step applies normalized cross-correlation between signals acquired by different devices. The neural network architecture relies on a CNN-LSTM encoder followed by multi-head attention. The experiments are conducted using both artificial and real data. Our proposed approach obtained an F1 score of 88% on an evaluation set extracted from the CHiME-5 corpus. Comparable performance was found in a larger set of experiments conducted on multi-channel artificial scenes.

Keywords: Performance evaluation; Array signal processing; Speech recognition; Synchronization; Time-domain analysis; Task analysis; Microphones
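
The preprocessing step in this abstract relies on normalized cross-correlation (NCC) between devices. A minimal sketch, assuming two single-channel signals given as Python lists: estimate the NCC-maximizing lag near the start and near the end of a long interval, and flag the interval when the lag changes, since a dropped sample shifts the remainder of one signal by one sample. The function names and the flagging rule are hypothetical illustrations; in the paper, the final decision on each selected window is made by the CNN-LSTM encoder with multi-head attention.

```python
import math
import random


def ncc(x, y):
    """Normalized cross-correlation of two equal-length sequences (zero lag)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                    sum((b - mean_y) ** 2 for b in y))
    return num / den if den > 0 else 0.0


def best_lag(ref, other, start, length, max_lag):
    """Shift of `other` (in samples) that best matches ref[start:start + length]."""
    window = ref[start:start + length]
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        s = start + lag
        if s < 0 or s + length > len(other):
            continue  # shifted window would fall outside `other`
        score = ncc(window, other[s:s + length])
        if score > best_score:
            best, best_score = lag, score
    return best


def lag_changed(ref, other, length, max_lag):
    """Hypothetical screening rule: flag the interval if the best lag at
    its start differs from the best lag at its end."""
    first = best_lag(ref, other, max_lag, length, max_lag)
    last = best_lag(ref, other, len(ref) - length - max_lag, length, max_lag)
    return first != last


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(400)]  # synthetic noise signal
    dropped = ref[:200] + ref[201:]  # one sample lost at index 200
    print(lag_changed(ref, dropped, 50, 5))  # True
    print(lag_changed(ref, ref, 50, 5))      # False
```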

Tight integrated end-to-end training for cascaded speech translation

arXiv, 2020

Authors: Bahar, Parnia; Bieschke, Tobias; Schlüter, Ralf; Ney, Hermann. Affiliations: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany

A cascaded speech translation model relies on a discrete, non-differentiable transcription, which provides a supervision signal from the source side and helps the transformation between source speech and target text. Such modeling suffers from error propagation between the ASR and MT models. Direct speech translation is an alternative that avoids error propagation; however, its performance often lags behind that of the cascade system. To use an intermediate representation while preserving end-to-end trainability, previous studies have proposed two-stage models that pass the hidden vectors of the recognizer into the decoder of the MT model, ignoring the MT encoder. This work explores the feasibility of collapsing all cascade components into a single end-to-end trainable model by optimizing all parameters of the ASR and MT models jointly, without ignoring any learned parameters. It is a tightly integrated method that passes renormalized source word posterior distributions as a soft decision instead of one-hot vectors, which enables backpropagation. It therefore provides both transcriptions and translations and achieves strong consistency between them. Our experiments on four tasks with different data scenarios show that the model outperforms cascade models by up to 1.8% BLEU and 2.0% TER and is superior to direct models. © 2020, CC-BY.

Keywords: Backpropagation
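
A minimal sketch of the soft decision described in this abstract, assuming the renormalization raises each posterior to an exponent `gamma` and renormalizes; the exponent form and all names here are illustrative assumptions, not the authors' code. The MT encoder input then becomes the posterior-weighted average of word embeddings instead of a one-hot lookup, and because these operations are differentiable (unlike an argmax), gradients from the MT loss can flow back into the ASR model.

```python
def renormalize(posterior, gamma=1.0):
    """Raise each probability to `gamma` and renormalize.
    gamma > 1 sharpens the distribution, gamma < 1 flattens it."""
    powered = [p ** gamma for p in posterior]
    total = sum(powered)
    return [p / total for p in powered]


def soft_embedding(posterior, embeddings):
    """Posterior-weighted average of word embeddings (the soft decision).
    A one-hot posterior reduces to an ordinary embedding lookup."""
    dim = len(embeddings[0])
    return [sum(q * e[d] for q, e in zip(posterior, embeddings))
            for d in range(dim)]


if __name__ == "__main__":
    table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 3-word embedding table
    asr_posterior = [0.6, 0.3, 0.1]               # toy ASR word posterior
    q = renormalize(asr_posterior, gamma=2.0)
    print(q)
    print(soft_embedding(q, table))
```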

Improved robustness to disfluencies in RNN-transducer based speech recognition

arXiv, 2020

Authors: Mendelev, Valentin; Raissi, Tina; Camporese, Guglielmo; Giollo, Manuel. Affiliations: Amazon Alexa, United States; Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany; Department of Mathematics "Tullio Levi-Civita", University of Padova, Italy

Automatic speech recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aimed at improving the robustness of RNN-T ASR to speech disfluencies, with a focus on partial words. For evaluation, we use clean data, data with disfluencies and a separate dataset with speech affected by stuttering. We show that including a small amount of data with disfluencies in the training set improves recognition accuracy on the test sets with disfluencies and stuttering. Increasing the amount of training data with disfluencies gives additional gains without degradation on the clean data. We also show that replacing partial words with a dedicated token further improves accuracy on utterances with disfluencies and stuttering. Our best model achieves 22.5% and 16.4% relative WER reduction on those two evaluation sets. Copyright © 2020, The Authors. All rights reserved.

Keywords: Speech
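
A minimal sketch of the transcript preparation step that replaces partial words with a dedicated token. It assumes that partial words are transcribed with a trailing hyphen (for example `wa-`) and uses the token name `<partial>`; both the marking convention and the token name are hypothetical, not taken from the paper.

```python
import re

# Hypothetical token name; the paper's actual symbol is not given here.
PARTIAL_TOKEN = "<partial>"
# Assumed transcription convention: a partial word ends with a hyphen, e.g. "wa-".
PARTIAL_WORD = re.compile(r"^\w+-$")


def replace_partial_words(transcript, token=PARTIAL_TOKEN):
    """Replace every hyphen-final word fragment with a single dedicated token.
    Hyphenated compounds such as "well-known" are left unchanged."""
    return " ".join(token if PARTIAL_WORD.match(w) else w
                    for w in transcript.split())


if __name__ == "__main__":
    print(replace_partial_words("i wa- want a well-known th- thing"))
```

Mapping all fragments to one token keeps the output vocabulary small, so the model does not have to spell out arbitrary word beginnings.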

Two Semi-Supervised Training Approaches for Automated Text Recognition

International Workshop on Frontiers in Handwriting Recognition

Authors: Gundram Leifert; Roger Labahn; Joan Andreu Sánchez. Affiliations: Computational Intelligence Technology Lab, University of Rostock, Rostock, Germany; Pattern Recognition and Human Language Technologies Center, Universitat Politécnica de València, València, Spain

ISBN (electronic): 9781728199665

ISBN (print): 9781728199672

Automated text recognition is a fundamental problem in Document Image Analysis. Optical models are used to model characters, while language models are used to compose sentences. Since scripts and linguistic contexts differ widely, the models must be specialized by training on task-dependent ground truth. However, creating a sufficient amount of ground truth, at least for historical handwritten scripts, requires well-qualified persons to mark and transcribe text lines, which is very time-consuming. On the other hand, in many cases unassigned transcripts are already available at page level from another process chain, or at least transcripts from a similar linguistic context are available. In this work, we present two approaches that make use of such transcripts: the first creates training data by automatically assigning page-dependent transcripts to text lines, whereas the second uses a task-specific language model to generate highly confident training data. Both approaches are successfully applied to a very challenging historical handwritten collection.

Keywords: Training; Training data; Text recognition; Layout; Task analysis; Production; Linguistics
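
A simplified sketch of the first approach, under stated assumptions: each text line already has a recognition hypothesis from an existing optical model, each hypothesis is matched to the page-level transcript line with the lowest character error rate (CER), and pairs above a hypothetical `max_cer` threshold are discarded as unreliable. This is an illustration of the general idea, not the authors' assignment procedure; all names and the threshold are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (single-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def cer(hyp, ref):
    """Character error rate of `hyp` against reference `ref`."""
    return edit_distance(hyp, ref) / max(len(ref), 1)


def assign_lines(hypotheses, page_lines, max_cer=0.3):
    """Pair each recognized line index with its closest page-transcript line,
    keeping only pairs whose CER does not exceed `max_cer`."""
    pairs = []
    for idx, hyp in enumerate(hypotheses):
        best = min(page_lines, key=lambda ref: cer(hyp, ref))
        if cer(hyp, best) <= max_cer:
            pairs.append((idx, best))
    return pairs


if __name__ == "__main__":
    hyps = ["Tbe quick brwn fox", "zzzz"]  # noisy HTR output per line
    page = ["The quick brown fox", "jumps over the lazy dog"]
    print(assign_lines(hyps, page))
```

The accepted pairs keep the clean page transcript as the line label, while poorly matching lines are left out rather than added as noisy training data.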

Unsupervised training for large vocabulary translation using sparse lexicon and word classes

arXiv, 2019

Authors: Kim, Yunsu; Schamper, Julian; Ney, Hermann. Affiliation: Human Language Technology and Pattern Recognition Group, RWTH Aachen University

We address, for the first time, unsupervised training for a translation task with hundreds of thousands of vocabulary words. We scale up the expectation-maximization (EM) algorithm to learn a large translation table without any parallel text or seed lexicon. First, we solve the memory bottleneck and enforce sparsity with a simple thresholding scheme for the lexicon. Second, we initialize the lexicon training with word classes, which efficiently boosts performance. Our methods produced promising results on two large-scale unsupervised translation tasks. Copyright © 2019, The Authors. All rights reserved.

Keywords: Maximum principle
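
A minimal sketch of the thresholding idea in this abstract: after an EM iteration, lexicon entries below a threshold are dropped and each row is renormalized, so the table stores only surviving entries and stays sparse. The nested-dictionary layout, the `threshold` value and the fallback that keeps the single best entry of an otherwise empty row are assumptions for illustration, not the authors' exact scheme.

```python
def prune_lexicon(lexicon, threshold):
    """Sparsify a translation lexicon.

    `lexicon` maps each source word to a dict {target word: probability}.
    Entries below `threshold` are removed and every row is renormalized
    so it remains a probability distribution.
    """
    pruned = {}
    for src, row in lexicon.items():
        kept = {tgt: p for tgt, p in row.items() if p >= threshold}
        if not kept:  # assumed fallback: keep the single most probable entry
            best = max(row, key=row.get)
            kept = {best: row[best]}
        total = sum(kept.values())
        pruned[src] = {tgt: p / total for tgt, p in kept.items()}
    return pruned


if __name__ == "__main__":
    lex = {"haus": {"house": 0.7, "home": 0.25, "the": 0.05}}  # toy row
    print(prune_lexicon(lex, threshold=0.1))
```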

HTR-Flor: A Deep Learning System for Offline Handwritten Text Recognition

Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI)

Authors: Arthur Flor de Sousa Neto; Byron Leite Dantas Bezerra; Alejandro Héctor Toselli; Estanislau Baptista Lima. Affiliations: Escola Politécnica de Pernambuco, Universidade de Pernambuco, Recife, Brazil; Pattern Recognition and Human Language Technology, Universitat Politècnica de València, València, Spain

ISBN (electronic): 9781728192741

ISBN (print): 9781728192758

In recent years, Handwritten Text Recognition (HTR) has attracted a lot of attention among researchers in the computer vision community. Current state-of-the-art approaches for offline HTR are based on Convolutional Recurrent Neural Networks (CRNNs), which excel at scene text recognition. Unfortunately, deep models such as CRNNs and Recurrent Neural Networks (RNNs) are likely to suffer from vanishing/exploding gradient problems when processing long text images, which are common in scanned documents. Besides, they usually have millions of parameters, which require huge amounts of data and computational resources. Recently, a new class of neural network architecture, called Gated Convolutional Neural Networks (Gated-CNN), has demonstrated potential to complement CRNN methods in modeling. Therefore, in this paper, we present a new architecture for HTR, based on Gated-CNN, with fewer parameters and fewer layers, which is able to outperform the current state-of-the-art architectures for HTR. The experiments validate that the proposed model achieves statistically significant recognition results, surpassing previous HTR systems by an average of 33% over five important handwritten benchmark datasets. Moreover, the proposed model achieves satisfactory recognition rates even with little training data. Finally, its compact architecture requires fewer computational resources, making it suitable for real-world applications with hardware limitations, such as robots and smartphones.

Keywords: Logic gates; Convolution; Computer architecture; Text recognition; Hidden Markov models; Optical imaging; Computational modeling
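
A minimal sketch of a gated convolution in one dimension, assuming the GLU-style gate used in gated convolutional networks: one convolution produces features, a second produces a gate that passes through a sigmoid, and the two are multiplied elementwise. HTR-Flor's actual layers operate on 2D images and may use a different gating variant; the function names are illustrative.

```python
import math


def conv1d_valid(x, kernel, bias=0.0):
    """'Valid' 1D convolution, implemented as cross-correlation
    (no kernel flip), as in common deep learning frameworks."""
    k = len(kernel)
    return [sum(kernel[i] * x[t + i] for i in range(k)) + bias
            for t in range(len(x) - k + 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_conv1d(x, w_feat, w_gate, b_feat=0.0, b_gate=0.0):
    """Features from one convolution, scaled elementwise by a sigmoid gate
    computed from a second convolution over the same input."""
    feat = conv1d_valid(x, w_feat, b_feat)
    gate = conv1d_valid(x, w_gate, b_gate)
    return [f * sigmoid(g) for f, g in zip(feat, gate)]


if __name__ == "__main__":
    # A zero gate kernel gives sigmoid(0) = 0.5, halving every feature.
    print(gated_conv1d([1, 2, 3, 4], [1, 1], [0, 0]))
```

The gate lets the layer learn, per position, how much of each feature to pass on, which is one way such models control information flow without recurrence.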

Integrated system for the efficient maintenance of urban pavements

Carreteras

Carreteras, 2021, Issue 235, Vol. 4, pp. 8-18

Authors: Pellicer Armiñana, Eugenio; Paredes Palacios, Roberto; Felipo Sanjuán, Jesús; Sánchez-Robles Bello, Juan. Affiliations: Grupo de Gestión del Proceso Proyecto-Construcción, Universitat Politécnica de Valencia, Spain; Centro de Investigación Pattern Recognition and Human Language Technology, Universitat Politécnica de Valencia, Spain; Pavasal Empresa Constructora S.A., Spain; Proyectos y Obras. CPS infraestructuras Movilidad y Medio Ambiente S.L., Spain

The integrated system for the efficient maintenance of urban pavements is an innovation project resulting from a collaboration between a public university and private enterprises. The system automates the auscultation of urban pavements by capturing images with conventional cameras and then processing them with deep learning techniques. In addition, it incorporates optimization and predictive decision making that includes not only traditional technical criteria (such as the state of the pavement and weather conditions) but also environmental (CO2 emissions) and social criteria (proximity to critical infrastructure). The application is based on a user-friendly visual interface that enables the automatic identification and quantification of pavement deterioration, the prediction of the future state of the network, and the establishment of an efficient, proactive maintenance and rehabilitation program for urban pavements in the medium to long term. © 2021 Asociación Española de la Carretera. All rights reserved.

Keywords: Interface states