Search results - Inner Mongolia University Library

A comparative large scale study of MLP features for Mandarin ASR

Authors: Valente, Fabio; Doss, Mathew Magimai; Plahl, Christian; Ravuri, Suman; Wang, Wen

Affiliations: IDIAP Research Institute, CH-1920 Martigny, Switzerland; Human Language Technology and Pattern Recognition, RWTH Aachen University, Germany; International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704, United States; Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, United States

MLP-based front-ends have shown significant complementary properties to conventional spectral features. As part of the DARPA GALE program, different MLP features were developed for Mandarin ASR. In this paper, all the proposed front-ends are compared in a systematic manner, and we extensively investigate the scalability of these features in terms of the amount of training data (from 100 hours to 1600 hours) and system complexity (maximum likelihood training, SAT, lattice-level combination, and discriminative training). Results on 5 hours of evaluation data from the GALE project reveal that the MLP features consistently produce relative improvements in the range of 15%-23% at the different steps of a multipass system when compared to conventional short-term spectral features such as MFCC and PLP. The largest improvement is obtained using a hierarchical MLP approach. © 2010 ISCA.

Keywords: Maximum likelihood

NMT-Keras: A very flexible toolkit with a focus on interactive NMT and online learning

arXiv, 2018

Authors: Peris, Álvaro; Casacuberta, Francisco

Affiliation: Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain

We present NMT-Keras, a flexible toolkit for training deep learning models, which puts a particular emphasis on the development of advanced applications of neural machine translation systems, such as interactive-predictive translation protocols and long-term adaptation of the translation system via continuous learning. NMT-Keras is based on an extended version of the popular Keras library, and it runs on Theano and TensorFlow. State-of-the-art neural machine translation models are deployed and used following the high-level framework provided by Keras. Given its high modularity and flexibility, it has also been extended to tackle different problems, such as image and video captioning, sentence classification and visual question answering. Copyright © 2018, The Authors. All rights reserved.

Keywords: Neural machine translation

Fusion of Visual and Textual Features for Table Header Detection in Handwritten Text Images

International Conference on Computational Science and Computational Intelligence (CSCI)

Authors: Addisson Salazar; Jose Ramón Prieto; Enrique Vidal; Gonzalo Safont; Luis Vergara

Affiliations: Institute of Telecommunications and Multimedia Applications (iTEAM), Universitat Politècnica de València, Valencia, Spain; Pattern Recognition and Human Language Technology (PRHLT), Universitat Politè

This paper introduces a new procedure to improve table header detection in handwritten text images by fusing the posterior probabilities provided by two baseline classifiers. Each classifier considers a different modality, namely visual or textual features. Both baseline classifiers implement convolutional neural networks, specifically adopting the U-Net architecture. Four fusion methods are considered: the mean; linear discriminant analysis and random forest as meta-classifiers; and a recently developed method called alpha integration. The testing dataset consisted of 89 page images drawn from the Passau dataset. The improved performance provided by the fusion methods in the specific experiments is interesting considering the complexity of the challenging problem approached. In terms of area under the receiver operating characteristic curve, the best results were obtained by alpha integration. This method incorporates least mean square parameter optimization. The improvement is relevant in the context of the targeted problem.
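The simplest of the four fusion methods named in the abstract, the mean, can be sketched as follows. This is only an illustration under stated assumptions: the function name `mean_fusion`, the 0.5 decision threshold, and the posterior values are hypothetical and do not come from the paper.

```python
def mean_fusion(p_visual, p_textual):
    """Average the per-sample posteriors of two classifiers (mean fusion)."""
    if len(p_visual) != len(p_textual):
        raise ValueError("both classifiers must score the same samples")
    return [(v + t) / 2.0 for v, t in zip(p_visual, p_textual)]


# Hypothetical posteriors P(header | sample) from each modality (made-up values).
p_visual = [0.75, 0.25, 0.5]
p_textual = [0.5, 0.25, 0.25]

fused = mean_fusion(p_visual, p_textual)

# Assumed decision rule: label a sample as "header" when the fused posterior
# reaches 0.5. The paper's actual operating point is not stated in the abstract.
is_header = [p >= 0.5 for p in fused]
```

The meta-classifier variants (linear discriminant analysis, random forest) and alpha integration would replace the averaging step with a learned combination of the same two posteriors.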

Two demonstrations of the machine translation applications to historical documents

arXiv, 2021

Authors: Domingo, Miguel; Casacuberta, Francisco

Affiliation: Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain

We present our demonstration of two machine translation applications to historical documents. The first application consists of generating a new version of a historical document, written in the modern version of its original language. The second application is limited to a document's orthography. It adapts the document's spelling to modern standards in order to achieve orthographic consistency and to account for the lack of spelling conventions. We followed an interactive, adaptive framework that allows the user to introduce corrections to the system's hypothesis. The system reacts to these corrections by generating a new hypothesis that takes them into account. Once the user is satisfied with the system's hypothesis and validates it, the system adapts its model following an online learning strategy. This system is implemented following a client-server architecture. We developed a website which communicates with the neural models. All code is open-source and publicly available. © 2021, CC BY-SA.

Keywords: History

Modernizing Historical Documents: a User Study

arXiv, 2019

Authors: Domingo, Miguel; Casacuberta, Francisco

Affiliation: Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain

Accessibility to historical documents is mostly limited to scholars. This is due to the language barrier inherent in human language and the linguistic properties of these documents. Given a historical document, modernization aims to generate a new version of it, written in the modern version of the document’s language. Its goal is to tackle the language barrier, decreasing the comprehension difficulty and making historical documents accessible to a broader audience. In this work, we proposed a new neural machine translation approach that profits from modern documents to enrich its systems. We tested this approach with both automatic and human evaluation, and conducted a user study. Results showed that modernization is successfully reaching its goal, although it still has room for improvement. Copyright © 2019, The Authors. All rights reserved.

Keywords: History

Online learning for effort reduction in interactive neural machine translation

arXiv, 2018

Authors: Peris, Álvaro; Casacuberta, Francisco

Affiliation: Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain

Neural machine translation systems require large amounts of training data and resources. Even with this, the quality of the translations may be insufficient for some users or domains. In such cases, the output of the system must be revised by a human agent. This can be done in a post-editing stage or following an interactive machine translation protocol. We explore the incremental update of neural machine translation systems during the post-editing or interactive translation processes. Such modifications aim to incorporate the new knowledge, from the edited sentences, into the translation system. Updates to the model are performed on-the-fly, as sentences are corrected, via online learning techniques. In addition, we implement a novel interactive, adaptive system, able to react to single-character interactions. This system greatly reduces the human effort required for obtaining high-quality translations. In order to stress our proposals, we conduct exhaustive experiments varying the amount and type of data available for training. Results show that online learning effectively achieves the objective of reducing the human effort required during the post-editing or the interactive machine translation stages. Moreover, these adaptive systems also perform well in scenarios with scarce resources. We show that a neural machine translation system can be rapidly adapted to a specific domain, exclusively by means of online learning techniques. Copyright © 2018, The Authors. All rights reserved.
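The idea of updating a model on the fly, one corrected sample at a time, can be illustrated with a toy example. This sketch deliberately swaps the neural translation model for a two-weight linear regressor trained with plain stochastic gradient descent on squared error. The function name `online_sgd_step`, the learning rate, and the data stream are all hypothetical and are not taken from the paper.

```python
def online_sgd_step(weights, features, target, learning_rate=0.1):
    """One online update: a single SGD step on 0.5 * (prediction - target)**2."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    # The gradient of the loss w.r.t. each weight is error * feature.
    return [w - learning_rate * error * x for w, x in zip(weights, features)]


# Hypothetical stream of (features, corrected target) pairs. In the interactive
# setting each pair would stand for one sentence validated by the user.
stream = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 0.0], 1.0)]

weights = [0.0, 0.0]
for features, target in stream:
    # The model is updated immediately after each sample, never in a batch.
    weights = online_sgd_step(weights, features, target)
```

The point of the sketch is the control flow: the model changes between consecutive samples, so a correction made on one sample already influences the prediction for the next.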

Keywords: Deep learning

Improved robustness to disfluencies in RNN-transducer based speech recognition

arXiv, 2020

Authors: Mendelev, Valentin; Raissi, Tina; Camporese, Guglielmo; Giollo, Manuel

Affiliations: Amazon Alexa, United States; Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany; Department of Mathematics "Tullio Levi-Civita", University of Padova, Italy

Automatic speech recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aimed at improving the robustness of RNN-T ASR to speech disfluencies, with a focus on partial words. For evaluation, we use clean data, data with disfluencies, and a separate dataset with speech affected by stuttering. We show that after including a small amount of data with disfluencies in the training set, the recognition accuracy on the tests with disfluencies and stuttering improves. Increasing the amount of training data with disfluencies gives additional gains without degradation on the clean data. We also show that replacing partial words with a dedicated token helps to achieve even better accuracy on utterances with disfluencies and stuttering. The evaluation of our best model shows 22.5% and 16.4% relative WER reduction on those two evaluation sets. Copyright © 2020, The Authors. All rights reserved.
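Two notions from the abstract, replacing partial words with a dedicated token and relative WER reduction, can be sketched as below. The abstract does not say how partial words are marked or what the token looks like, so the trailing-hyphen convention and the `<partial>` token are assumptions, and the WER values in the usage lines are made up for illustration.

```python
PARTIAL_TOKEN = "<partial>"  # hypothetical token name, not taken from the paper


def replace_partial_words(transcript, token=PARTIAL_TOKEN):
    """Replace words marked as partial (assumed: trailing hyphen) with a token."""
    words = transcript.split()
    return " ".join(
        token if len(w) > 1 and w.endswith("-") else w for w in words
    )


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Made-up transcript and WER values, for illustration only.
cleaned = replace_partial_words("i wa- want to go")
reduction = relative_wer_reduction(10.0, 8.0)
```

A relative reduction is a fraction of the baseline error rate, so it should not be read as an absolute drop in WER percentage points.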

Keywords: Speech

Visual Modeling and Feature Adaptation in Sign Language Recognition

Informationstechnische Gesellschaft-Fachtagung Sprachkommunikation, Aachen, Germany

Authors: Philippe Dreuw; Hermann Ney

Affiliation: Human Language Technology and Pattern Recognition, RWTH Aachen University

We propose a tracking adaptation to recover from early tracking errors in sign language recognition by optimizing the obtained tracking paths w.r.t. the hypothesized word sequences of an automatic sign language recognition system. Hand or head tracking is usually only optimized according to a tracking criterion. As a consequence, methods which depend on accurate detection and tracking of body parts lead to recognition errors in gesture and sign language processing. Similar to speaker-dependent feature adaptation methods in automatic speech recognition, we propose an automatic visual alignment of signers for vision-based sign language recognition. Furthermore, the generation of additional virtual training samples is proposed to mitigate the lack-of-data problem in sign language processing, which often leads to "one-shot" trained models. Most state-of-the-art systems are speaker dependent, and consider tracking as a preprocessing feature extraction part. Experiments on a publicly available benchmark database show that the proposed methods strongly improve the recognition accuracy of the system.

Keywords: Levee Adaptation model

Tailored Design of Audio-Visual Speech Recognition Models using Branchformers

arXiv, 2024

Authors: Gimeno-Gómez, David; Martínez-Hinarejos, Carlos D.

Affiliation: Pattern Recognition and Human Language Technology research center, Universitat Politècnica de València, Camino de Vera s/n, València 46022, Spain

Recent advances in Audio-Visual Speech Recognition (AVSR) have led to unprecedented achievements in the field, improving the robustness of this type of system in adverse, noisy environments. In most cases, this task has been addressed through the design of models composed of two independent encoders, each dedicated to a specific modality. However, while recent works have explored unified audio-visual encoders, determining the optimal cross-modal architecture remains an ongoing challenge. Furthermore, such approaches often rely on models with vast numbers of parameters and computationally expensive training processes. In this paper, we aim to bridge this research gap by introducing a novel audio-visual framework. Our proposed method constitutes, to the best of our knowledge, the first attempt to harness the flexibility and interpretability offered by encoder architectures, such as the Branchformer, in the design of parameter-efficient AVSR systems. To be more precise, the proposed framework consists of two steps: first, estimating audio- and video-only systems, and then designing a tailored audio-visual unified encoder based on the layer-level branch scores provided by the modality-specific models. Extensive experiments on English and Spanish AVSR benchmarks covering multiple data conditions and scenarios demonstrated the effectiveness of our proposed method. Even when trained on a moderate scale of data, our models achieve competitive word error rates (WER) of approximately 2.5% for English and surpass existing approaches for Spanish, establishing a new benchmark with an average WER of around 9.1%. These results reflect how our tailored AVSR system is able to reach state-of-the-art recognition rates while significantly reducing the model complexity w.r.t. the prevalent approach in the field. Code and pre-trained models are available at https://***/david-gimeno/tailored-avsr. © 2024, CC BY-NC-ND.

Keywords: Speech enhancement