Search Results - Inner Mongolia University Library

Ground-truth generation through crowdsourcing with probabilistic indexes

Neural Computing and Applications, 2024, Vol. 36, No. 30, pp. 18879-18895

Authors: Sánchez, Joan Andreu; Vidal, Enrique; Bosch, Vicente; Quirós, Lorenzo (Pattern Recognition and Human Language Technologies Center, Universitat Politècnica de València, València 46022, Spain; tranSkriptorium AI, València, Spain)

Automatic transcription of large series of historical handwritten documents generally aims to make the textual information in these documents searchable. However, automatic transcripts often lack the level of accuracy needed for reliable text indexing and search purposes. Probabilistic Indexing (PrIx) offers a unique alternative to raw transcripts. Since it needs training data to achieve good search performance, PrIx-based crowdsourcing techniques are introduced in this paper to gather the required data. In the proposed approach, PrIx confidence measures are used to drive a correction process in which users can amend errors and possibly add missing text. In a further step, corrected data are used to retrain the PrIx models. Results on five large series are reported which show consistent improvements after retraining. However, it can be argued whether the overall costs of the crowdsourcing operation pay off for the improvements, or perhaps it would have been more cost-effective to just start with a larger and cleaner amount of professionally produced training transcripts. © The Author(s) 2024.

Keywords: Crowdsourcing
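The abstract above says that PrIx confidence measures drive the correction process but does not say how. A minimal sketch of one plausible reading follows: entries whose confidence falls in an uncertain middle band are queued for volunteers, most uncertain first. The `Spot` record, the band limits `low` and `high`, and the `budget` parameter are all assumptions made for illustration, not details from the paper.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """One hypothetical index entry: a word hypothesis with its confidence."""
    word: str
    page: int
    confidence: float  # relevance probability in [0, 1]


def select_for_review(spots, low=0.2, high=0.8, budget=None):
    """Pick entries in the uncertain band [low, high], most uncertain first."""
    uncertain = [s for s in spots if low <= s.confidence <= high]
    # Entries closest to 0.5 are the ones the model is least sure about.
    uncertain.sort(key=lambda s: abs(s.confidence - 0.5))
    return uncertain if budget is None else uncertain[:budget]


# Illustrative data only; these words and scores are made up.
spots = [
    Spot("anno", 1, 0.95),
    Spot("domini", 1, 0.55),
    Spot("regis", 2, 0.30),
    Spot("xyz", 2, 0.05),
]
queue = select_for_review(spots)
print([s.word for s in queue])
```

Very confident and very unlikely entries are skipped in this sketch, so volunteer effort goes to the spots where a correction is most likely to change the index.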


Comparison of Different Neural Network Architectures for Spoken Language Identification

15th ITG Conference on Speech Communication

Authors: Bazazo, Tala; Zeineldeen, Mohammad; Plahl, Christian; Schlüter, Ralf; Ney, Hermann (Human Language Technology and Pattern Recognition, RWTH Aachen University, Germany; eBay, Aachen, Germany)

ISBN: (print) 9783800761654

This paper compares different neural network based architectures on the spoken language identification task. To the best of our knowledge, such a comparison of different models on the same dataset and the same set of languages does not yet exist. We incorporate 7 different models which include the latest architectures: a spectral images based ResNet model, a Convolutional Neural Network, a Bi-directional Long Short-Term Memory, a Convolutional Recurrent Neural Network, Wav2Vec 2.0, a transformer and a conformer. We also tackle audio with background noise and music by training on data with similar acoustics. Finally, we show that our models generalize well on third-party data. © VDE VERLAG GMBH Berlin Offenbach.

Keywords: Recurrent neural networks


Document-Level Language Models for Machine Translation

8th Conference on Machine Translation, WMT 2023

Authors: Petrick, Frithjof; Herold, Christian; Petrushkov, Pavel; Khadivi, Shahram; Ney, Hermann (eBay Inc., Aachen, Germany; Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

ISBN: (print) 9798891760417

Despite the known limitations, most machine translation systems today still operate at the sentence level. One reason for this is that most parallel training data is only sentence-level aligned, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining any existing sentence-level translation model with a document-level language model. We improve existing approaches by leveraging recent advancements in model combination. Additionally, we propose novel weighting techniques that make the system combination more flexible and significantly reduce computational overhead. In a comprehensive evaluation on four diverse translation tasks, we show that our extensions improve document-targeted scores substantially and are also computationally more efficient. However, we also find that in most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system. Finally, we explore language model fusion in the light of recent advancements in large language models. Our findings suggest that there might be strong potential in utilizing large language models via model combination. © 2023 Association for Computational Linguistics.

Keywords: Machine translation
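The abstract describes combining a sentence-level translation model with a document-level language model. A minimal sketch of the simplest such combination, a log-linear (shallow-fusion style) mix of next-token probabilities, is given below. The function name `fuse`, the fixed `lm_weight`, and the toy probabilities are assumptions for illustration; the paper's own combination and weighting schemes are not spelled out in this record and may differ.

```python
import math


def fuse(tm_probs, lm_probs, lm_weight=0.5):
    """Log-linear mix of translation-model and language-model probabilities.

    Both inputs map each candidate token to a probability; the result is
    renormalized so that it sums to 1 over the candidates.
    """
    scores = {
        tok: math.log(tm_probs[tok]) + lm_weight * math.log(lm_probs[tok])
        for tok in tm_probs
    }
    # Log-sum-exp normalization for numerical stability.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {tok: math.exp(s - log_z) for tok, s in scores.items()}


# Toy case: translating English "it" into German without context.
# The sentence-level model slightly prefers "es"; a language model that has
# seen a masculine antecedent earlier in the document prefers "er".
tm_probs = {"er": 0.30, "sie": 0.30, "es": 0.40}
lm_probs = {"er": 0.70, "sie": 0.20, "es": 0.10}

fused = fuse(tm_probs, lm_probs, lm_weight=0.5)
print(max(fused, key=fused.get))
```

With `lm_weight=0` the sketch falls back to the sentence-level model alone, which makes the weight a convenient knob for how much document context is allowed to influence each decision.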


Improving Long Context Document-Level Machine Translation

4th Workshop on Computational Approaches to Discourse, CODI 2023

Authors: Herold, Christian; Ney, Hermann (Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen D-52056, Germany)

ISBN: (print) 9781959429890

Document-level context for neural machine translation (NMT) is crucial to improve the translation consistency and cohesion, the translation of ambiguous inputs, as well as several other linguistic phenomena. Many works have been published on the topic of document-level NMT, but most restrict the system to only local context, typically including just the one or two preceding sentences as additional information. This might be enough to resolve some ambiguous inputs, but it is probably not sufficient to capture some document-level information like the topic or style of a conversation. When increasing the context size beyond just the local context, there are two challenges: (i) the memory usage increases exponentially (ii) the translation performance starts to degrade. We argue that the widely-used attention mechanism is responsible for both issues. Therefore, we propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption. For evaluation, we utilize targeted test sets in combination with novel evaluation techniques to analyze the translations in regards to specific discourse-related phenomena. We find that our approach is a good compromise between sentence-level NMT vs attending to the full context, especially in low resource scenarios. © 2023 Association for Computational Linguistics.

Keywords: Neural machine translation
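The abstract proposes a constrained attention variant but this record does not describe it. As a stand-in, the sketch below uses a plain sliding-window mask, a common way to restrict attention to nearby positions; it is not claimed to be the authors' method. The names `window_mask` and `attended_pairs` and the window size are assumptions for illustration.

```python
def window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [abs(i - j) <= window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask):
    """Count the query/key pairs whose attention score must be computed."""
    return sum(sum(row) for row in mask)


seq_len = 6
local = window_mask(seq_len, window=1)           # each token sees its neighbours
full = window_mask(seq_len, window=seq_len - 1)  # every token sees every token

print(attended_pairs(local), attended_pairs(full))
```

In this sketch the full mask covers every pair of positions, while the windowed mask covers only a fixed number of neighbours per position, which is where the memory saving comes from.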


Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition

Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024

Authors: Gimeno-Gómez, David; Martínez-Hinarejos, Carlos D. (Pattern Recognition and Human Language Technologies Research Center, Universitat Politècnica de València, Camino de Vera s/n, València 46022, Spain)

ISBN: (print) 9782493814104

Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, recent advances have been achieved in Visual Speech Recognition (VSR). Similar to other speech processing tasks, these end-to-end VSR systems are usually based on encoder-decoder architectures. While encoders are somewhat general, multiple decoding approaches have been explored, such as the conventional hybrid model based on Deep Neural Networks combined with Hidden Markov Models (DNN-HMM) or the Connectionist Temporal Classification (CTC) paradigm. However, there are languages and tasks in which data is scarce, and in this situation, there is not a clear comparison between different types of decoders. Therefore, we focused our study on how the conventional DNN-HMM decoder and its state-of-the-art CTC/Attention counterpart behave depending on the amount of data used for their estimation. We also analyzed to what extent our visual speech features were able to adapt to scenarios for which they were not explicitly trained, either considering a similar dataset or another collected for a different language. Results showed that the conventional paradigm reached recognition rates that improve on those of the CTC/Attention model in data-scarcity scenarios, along with a reduced training time and fewer parameters. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Keywords: Hidden Markov models


Enhancing and Adversarial: Improve ASR with Speaker Labels

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Zhou, Wei; Wu, Haotian; Xu, Jingjing; Zeineldeen, Mohammad; Lüscher, Christoph; Schlüter, Ralf; Ney, Hermann (RWTH Aachen University, Human Language Technology and Pattern Recognition, Computer Science Department, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany)

ISBN: (print) 9781728163277

ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort. Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training. We also explore their combination for further improvement, achieving the same performance as i-vectors plus adversarial training. Our best speaker-based MTL achieves 7% relative improvement on the Switchboard Hub5'00 set. We also investigate the effect of such speaker-based MTL w.r.t. cleaner dataset and weaker ASR NN. © 2023 IEEE.

Keywords: Linearization
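The abstract mentions a novel adaptive gradient reversal layer without giving its definition. For orientation, the sketch below shows only the standard, non-adaptive gradient reversal layer used in domain adversarial training: the forward pass is the identity and the backward pass flips the sign of the gradient. The class and its fixed `scale` are assumptions for illustration and say nothing about how the paper adapts the scale.

```python
class GradientReversal:
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""

    def __init__(self, scale=1.0):
        # Fixed scale (assumption); the paper's adaptive rule is not shown here.
        self.scale = scale

    def forward(self, features):
        # The speaker classifier sees the encoder features unchanged.
        return list(features)

    def backward(self, grad_output):
        # The encoder receives the negated gradient, so it is pushed to make
        # the speaker classifier's task harder (speaker-invariant features).
        return [-self.scale * g for g in grad_output]


layer = GradientReversal(scale=0.5)
print(layer.forward([1.0, -2.0]))
print(layer.backward([0.5, -1.0]))
```

Dropping the layer (or using a negative scale) turns the same auxiliary speaker loss into the enhancing objective, which is why the two settings are described as opposites.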


Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Yang, Zijian; Zhou, Wei; Schlüter, Ralf; Ney, Hermann (RWTH Aachen University, Human Language Technology and Pattern Recognition, Computer Science Department, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany)

ISBN: (print) 9781728163277

Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are used for the final posterior output of the phoneme-based neural transducer with a limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypotheses generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to the N-best-list based minimum Bayes risk objectives, lattice-free methods gain 40% - 70% relative training time speedup with a small degradation in performance. © 2023 IEEE.

Keywords: neural transducer; sequence discriminative training; speech recognition


Limitations and Challenges of Unsupervised Cross-lingual Pre-training

15th Conference of the Association for Machine Translation in the Americas, AMTA 2022

Authors: Zaragoza, Martín Quesada; Casacuberta, Francisco (Research Center of Pattern Recognition and Human Language Technology, Universitat Politècnica de València, Valencia 46022, Spain)

Cross-lingual alignment methods for monolingual language representations have received notable attention in recent years. However, their use in machine translation pre-training remains scarce. This work tries to shed light on the effects of some of the factors that play a role in cross-lingual pre-training, both for cross-lingual mappings and their integration in supervised neural models. The results show that unsupervised cross-lingual methods are effective at inducing alignment even for distant languages and they benefit noticeably from subword information. However, we find that their effectiveness as pre-training models in machine translation is severely limited due to their cross-lingual signal being easily distorted by the principal network during training. Moreover, the learned bilingual projection is too restrictive to allow said network to learn properly when the embedding weights are frozen. © AMTA 2022 - 15th Conference of the Association for Machine Translation in the Americas, Proceedings.

Keywords: Machine translation


Robust Knowledge Distillation from RNN-T Models with Noisy Training Labels Using Full-Sum Loss

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Zeineldeen, Mohammad; Audhkhasi, Kartik; Baskar, Murali Karthick; Ramabhadran, Bhuvana (RWTH Aachen University, Human Language Technology and Pattern Recognition, Computer Science Department, Aachen 52074, Germany; Google LLC, New York, United States)

ISBN: (print) 9781728163277

This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft distillation between RNN-T architectures having different posterior distributions is challenging. In addition, bad teachers having high word-error-rate (WER) reduce the efficacy of KD. We investigate how to effectively distill knowledge from variable quality ASR teachers, which has not been studied before to the best of our knowledge. We show that a sequence-level KD, full-sum distillation, outperforms other distillation methods for RNN-T models, especially for bad teachers. We also propose a variant of full-sum distillation that distills the sequence discriminative knowledge of the teacher leading to further improvement in WER. We conduct experiments on public datasets, namely SpeechStew and LibriSpeech, and on in-house production data. © 2023 IEEE.

Keywords: Recurrent neural networks
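The abstract contrasts hard distillation with soft distillation, which matches the student's output distribution to the teacher's. A minimal sketch of a generic soft distillation loss for a single output position is shown below: the KL divergence between temperature-softened teacher and student distributions. It is not the RNN-T lattice formulation or the full-sum variant studied in the paper; the function names, the `temperature` parameter, and the toy logits are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over one output distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
print(soft_distillation_loss(teacher, student, temperature=2.0))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which also means a poor teacher passes its mistakes on to the student.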


Conspiracy vs Critical Thinking Using an Ensemble of Transformers with Data Augmentation Techniques

25th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2024

Authors: Tulbure, Angelo Maximilian; Ardanuy, Mariona Coll (Universitat Politècnica de València, València, Spain; Politecnico di Milano, Milan, Italy; Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, València, Spain)

This paper provides an overview of our contributions to the PAN at CLEF 2024 Oppositional thinking analysis shared task, which focuses on distinguishing between conspiratorial and critical thinking narratives. The competition featured two main tasks. The first task is a binary classification task that aims at determining whether a text is conspiratorial or critical. The second task is a span-level detection task, in which the goal is to detect elements of oppositional narratives in the texts. Two annotated datasets, one in English and one in Spanish, were provided, each containing 5K Telegram comments. Our best-performing approaches combined custom fine-tuned Transformer models with data augmentation techniques. We achieved an F1-Score of 0.8917 for English and of 0.8293 for Spanish for task 1, and a span-F1 score of 0.6279 for English and 0.6129 for Spanish for task 2. Our task 2 approach achieved the best results in the shared task for both English and Spanish. © 2024 Copyright for this paper by its authors.

Keywords: binary classification; conspiracy theories; critical thinking; data augmentation; ensembling models; oppositional thinking; PAN 2024; token classification
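The record above reports F1 scores for an ensemble of classifiers on a binary task. A minimal sketch of how such an ensemble could be combined and scored follows, assuming simple majority voting and the usual F1 definition for one positive class. The label names, the three toy models, and their predictions are made up for illustration; the authors' actual ensembling rule and official metric script are not described in this record.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine aligned label lists (one per model) by taking the most common label."""
    # With an odd number of models and two labels there are no ties.
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]


def f1_score(gold, predicted, positive):
    """Harmonic mean of precision and recall for the chosen positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy predictions from three hypothetical fine-tuned models on four texts.
model_a = ["CONSPIRACY", "CRITICAL", "CONSPIRACY", "CRITICAL"]
model_b = ["CONSPIRACY", "CONSPIRACY", "CRITICAL", "CRITICAL"]
model_c = ["CRITICAL", "CRITICAL", "CONSPIRACY", "CRITICAL"]
gold = ["CONSPIRACY", "CRITICAL", "CRITICAL", "CRITICAL"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble, f1_score(gold, ensemble, positive="CONSPIRACY"))
```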
