ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Current time-synchronous sequence-to-sequence automatic speech recognition (ASR) models are trained with a sequence-level cross-entropy criterion that sums over all alignments. Due to this discriminative formulation, incorporating the right label context into the training criterion's gradient causes normalization problems and is not mathematically well-defined. The classic hybrid neural network hidden Markov model (NN-HMM), with its inherently generative formulation, enables conditioning on the right label context. However, due to HMM state tying, the identity of the right label context is never modeled explicitly. In this work, we propose a factored loss with auxiliary left and right label contexts that sums over all alignments. We show that including the right label context is particularly beneficial when training data resources are limited. Moreover, we show that it is possible to build a factored hybrid HMM system relying exclusively on the full-sum criterion. Experiments were conducted on Switchboard 300h and LibriSpeech 960h.
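For orientation, the general shape of such a full-sum criterion can be sketched in LaTeX; the notation below is my own and not taken from the paper. With acoustic features $x_1^T$, a word sequence $w_1^N$, and alignment label sequences $a_1^T$ consistent with $w_1^N$, sequence-level cross-entropy sums the framewise posteriors over all alignments, and a factored variant with auxiliary contexts can split each framewise posterior by the exact chain rule over the center label $c_t$ and its left/right context labels $l_t$, $r_t$:

$$\mathcal{L} = -\log \sum_{a_1^T \vdash w_1^N} \prod_{t=1}^{T} p\left(a_t \mid x_1^T\right), \qquad p\left(c_t, l_t, r_t \mid x_1^T\right) = p\left(c_t \mid l_t, r_t, x_1^T\right)\, p\left(l_t \mid r_t, x_1^T\right)\, p\left(r_t \mid x_1^T\right)$$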
Data processing is an important step in various natural language processing tasks. As the commonly used datasets in named entity recognition contain only a limited number of samples, it is important to obtain addition...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
ASR systems are deployed across diverse environments, each with specific hardware constraints. We use supernet training to jointly train multiple encoders of varying sizes, enabling dynamic model-size adjustment to fit hardware constraints without redundant training. Moreover, we introduce a novel method called OrthoSoftmax, which applies multiple orthogonal softmax functions to efficiently identify optimal subnets within the supernet, avoiding a resource-intensive search. This approach also enables more flexible and precise subnet selection by allowing selection based on various criteria and levels of granularity. Our results with CTC on LibriSpeech and TED-LIUM-v2 show that FLOPs-aware component-wise selection achieves the best overall performance. With the same number of training updates from a single job, WERs for all model sizes are comparable to or slightly better than those of individually trained models. Furthermore, we analyze patterns in the selected components and reveal interesting insights.
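The abstract does not spell out the mechanics of OrthoSoftmax, so the following is only a rough, non-authoritative illustration of the general idea it names: differentiable component selection via one softmax per subnet, with a penalty that pushes the selection distributions toward orthogonality. All class names, shapes, and the exact penalty are my assumptions, not the paper's formulation.

import torch
import torch.nn.functional as F

class ComponentSelector(torch.nn.Module):
    """Hypothetical selector: one softmax over C supernet components per
    subnet configuration; the Gram matrix of the K selection distributions
    is pushed toward the identity, which both sharpens each softmax
    (diagonal -> 1) and decorrelates the subnets (off-diagonal -> 0)."""

    def __init__(self, num_subnets: int, num_components: int):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_subnets, num_components))

    def forward(self):
        probs = F.softmax(self.logits, dim=-1)   # (K, C) selection weights
        gram = probs @ probs.t()                 # (K, K) pairwise overlaps
        ortho_penalty = (gram - torch.eye(gram.size(0))).pow(2).sum()
        return probs, ortho_penalty

selector = ComponentSelector(num_subnets=3, num_components=12)
probs, penalty = selector()  # add `penalty` to the supernet training loss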
Currently, in speech translation, the straightforward approach of cascading a recognition system with a translation system delivers state-of-the-art results. However, fundamental challenges such as error propagation ...
This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNNT) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech t...
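The hard-distillation setup this abstract begins to describe is, in its generic form, a pseudo-labeling loop. A minimal sketch of that generic recipe follows; all names are hypothetical placeholders, not APIs from the paper.

# Generic hard distillation: the teacher transcribes unlabelled speech,
# and the student (e.g. an RNNT model) trains on the resulting pairs
# exactly as it would on human transcripts.
def hard_distillation(teacher, student, unlabelled_audio, train_step):
    pseudo_labels = [teacher.transcribe(utt) for utt in unlabelled_audio]
    for utt, label in zip(unlabelled_audio, pseudo_labels):
        train_step(student, utt, label)
    return student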
Checkpoint averaging is a simple and effective method to boost the performance of converged neural machine translation models. The calculation is cheap to perform and the fact that the translation improvement almost c...
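Checkpoint averaging itself is easy to state concretely. A minimal sketch, assuming the checkpoints are saved as plain PyTorch state dicts (the abstract does not specify a framework):

import torch

def average_checkpoints(paths):
    # Element-wise average of the parameters in the given checkpoints.
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}

# e.g. average the last four checkpoints of a converged run:
# torch.save(average_checkpoints([f"epoch_{i}.pt" for i in (97, 98, 99, 100)]), "avg.pt")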
The encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introductio...
Ongoing research in automatic speech recognition (ASR) shows a clear division between end-to-end approaches and classic modular systems. Even though a high-level comparison between the two approaches...