Search Results - Inner Mongolia University Library

LATTICE-FREE SEQUENCE DISCRIMINATIVE TRAINING FOR PHONEME-BASED NEURAL TRANSDUCERS

arXiv, 2022

Authors: Yang, Zijian; Zhou, Wei; Schlüter, Ralf; Ney, Hermann (Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany)

Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are used for the final posterior output of the phoneme-based neural transducer with a limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypothesis generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to the N-best-list based minimum Bayes risk objectives, lattice-free methods gain a 40%-70% relative training time speedup with a small degradation in performance. Copyright © 2022, The Authors. All rights reserved.
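
The word error rate (WER) figures in these records, and the relative improvements quoted from them, follow standard definitions: WER is the word-level edit distance between hypothesis and reference divided by the number of reference words, and a relative improvement is the error reduction divided by the baseline error. The following is a minimal Python sketch of both, offered only as an illustration; it is not code from any of the listed papers, and the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution (or match if cost == 0)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative reduction of an error rate with respect to a baseline error rate."""
    return (baseline - new) / baseline


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(relative_improvement(10.0, 9.35))
```

For example, dropping one word out of a six-word reference gives a WER of 1/6, and a hypothetical baseline of 10.0% WER reduced to 9.35% corresponds to roughly a 6.5% relative improvement. These numbers are made up for illustration and are not results from the papers.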

Keywords: Transducers

Efficient Training of Neural Transducer for Speech Recognition

arXiv, 2022

Authors: Zhou, Wei; Michel, Wilfried; Schlüter, Ralf; Ney, Hermann (Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany)

As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-Transducer has achieved steadily improving performance with increasingly sophisticated neural network models of growing size and more training epochs. While strong computational resources seem to be a prerequisite for training superior models, we try to overcome this by carefully designing a more efficient training pipeline. In this work, we propose an efficient 3-stage progressive training pipeline to build high-performing neural transducer models from scratch with very limited computational resources in a reasonably short time. The effectiveness of each stage is experimentally verified on both the Librispeech and Switchboard corpora. The proposed pipeline is able to train transducer models approaching state-of-the-art performance on a single GPU in just 2-3 weeks. Our best conformer transducer achieves 4.1% WER on Librispeech test-other with only 35 epochs of training. Copyright © 2022, The Authors. All rights reserved.

Keywords: Speech recognition

Improving the Training Recipe for a Robust Conformer-based Hybrid Model

arXiv, 2022

Authors: Zeineldeen, Mohammad; Xu, Jingjing; Lüscher, Christoph; Schlüter, Ralf; Ney, Hermann (Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany)

Speaker adaptation is important for building robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in word error rate (WER) on the CallHome part of Hub5'00 and Hub5'01, respectively. Moreover, we build on top of our previous work, in which we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe, achieving 11% relative improvement in WER on the Switchboard 300h Hub5'00 dataset. We also make this recipe more efficient by reducing the total number of parameters by 34% relative. Copyright © 2022, The Authors. All rights reserved.

Keywords: Speech recognition

MONOTONIC SEGMENTAL ATTENTION FOR AUTOMATIC SPEECH RECOGNITION

arXiv, 2022

Authors: Zeyer, Albert; Schmitt, Robin; Zhou, Wei; Schlüter, Ralf; Ney, Hermann (Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen 52062, Germany; AppTek GmbH, Aachen 52062, Germany)

We introduce a novel segmental-attention model for automatic speech recognition. We restrict the decoder attention to segments to avoid the quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming. We directly compare global-attention and different segmental-attention modeling variants. We develop and compare two separate time-synchronous decoders, one of which specifically takes the segmental nature into account, yielding further improvements. Using time-synchronous decoding for segmental models is novel and a step towards streaming applications. Our experiments show the importance of a length model to predict the segment boundaries. The final best segmental-attention model using segmental decoding performs better than global attention, in contrast to other monotonic attention approaches in the literature. Further, we observe that the segmental model generalizes much better to long sequences of up to several minutes. © 2022, CC BY-SA.
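
As a rough illustration of restricting decoder attention to a segment, the following minimal pure-Python sketch computes scaled dot-product attention for a single decoder step while scoring only the encoder frames inside a given segment, so the cost per step scales with the segment length rather than the full sequence length. The function name, the scaled dot-product scoring, and the assumption that the segment boundaries are already known are illustrative choices; according to the abstract, the paper predicts boundaries with a length model, and this sketch is not the authors' implementation.

```python
import math
from typing import List, Sequence


def segmental_attention(query: Sequence[float],
                        keys: Sequence[Sequence[float]],
                        values: Sequence[Sequence[float]],
                        start: int, end: int) -> List[float]:
    """Attention for one decoder step, restricted to encoder frames start..end-1."""
    if not 0 <= start < end <= len(keys):
        raise ValueError("segment must be a non-empty range inside the sequence")
    # Only frames inside the segment are scored: O(end - start) work per step
    # instead of O(T) for global attention over all T frames.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, keys[t]))
              for t in range(start, end)]
    m = max(scores)  # subtract the maximum score for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Context vector: attention-weighted sum of the value vectors in the segment.
    dim = len(values[0])
    return [sum(w * values[start + i][d] for i, w in enumerate(weights))
            for d in range(dim)]


keys = [[1.0, 0.0]] * 4
values = [[0.0], [2.0], [4.0], [100.0]]
print(segmental_attention([1.0, 0.0], keys, values, 0, 3))
```

With identical keys, the weights inside the segment are uniform, so the context is the average of the first three values (about 2.0). The fourth frame (value 100.0) lies outside the segment and has no influence.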

Keywords: Decoding

Language Model Pre-training on True Negatives

arXiv, 2022

Authors: Zhang, Zhuosheng; Zhao, Hai; Utiyama, Masao; Sumita, Eiichiro (Department of Computer Science and Engineering, Shanghai Jiao Tong University, China; Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China; Kyoto, Japan)

Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones. Taking the former text as positive and the latter as negative samples, the PLM can be trained effectively for contextualized representation. However, the training of such PLMs relies heavily on the quality of the automatically constructed samples. Existing PLMs simply treat all corrupted texts as equally negative without any examination, so the resulting model inevitably suffers from the false negative issue: training is carried out on pseudo-negative data, which makes the resulting PLMs less efficient and less robust. In this work, after defining the false negative issue in discriminative PLMs, which has long been ignored, we design enhanced pre-training methods that counteract false negative predictions and encourage pre-training on true negatives by correcting the harmful gradient updates caused by false negative predictions. Experimental results on the GLUE and SQuAD benchmarks show that our counter-false-negative pre-training methods indeed bring about better performance together with stronger robustness. Copyright © 2022, The Authors. All rights reserved.

Keywords: Forecasting

Instance Regularization for Discriminative Language Model Pre-training

arXiv, 2022

Authors: Zhang, Zhuosheng; Zhao, Hai; Zhou, Ming (Department of Computer Science and Engineering, Shanghai Jiao Tong University, China; Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, China; Langboat Technology, China)

Discriminative pre-trained language models (PrLMs) can be generalized as denoising auto-encoders that work with two procedures, ennoising and denoising. First, an ennoising process corrupts texts with arbitrary noising functions to construct training instances. Then, a denoising language model is trained to restore the corrupted tokens. Existing studies have made progress by optimizing independent strategies for either ennoising or denoising. They treat all training instances equally throughout the training process, paying little attention to the individual contribution of each instance. To model explicit signals of instance contribution, this work proposes to estimate the complexity of restoring the original sentences from corrupted ones in language model pre-training. The estimations involve the corruption degree in the ennoising data construction process and the prediction confidence in the denoising counterpart. Experimental results on natural language understanding and reading comprehension benchmarks show that our approach improves pre-training efficiency, effectiveness, and robustness. Copyright © 2022, The Authors. All rights reserved.

Keywords: Computational linguistics

Towards expert gaze modeling and recognition of a user’s attention in realtime

Procedia Computer Science, 2020, vol. 176, pp. 2020-2029

Authors: Nora Castner, Lea Geßler, David Geisler, Fabian Hüttig, Enkelejda Kasneci (Human-Computer Interaction, University of Tübingen, Germany; Institute of Computer Science, University of Tübingen, Germany; Department of Prosthodontics, University Hospital Tübingen, Germany)

One of the appealing areas of expertise research is devoted to measuring the effectiveness of training programs for novices. With recent progress in eye tracking, gaze-based interaction systems can recognize a user’s attention and direct it accordingly. Moreover, dynamic visualization of an expert gaze model facilitates novice training by guiding the gaze to relevant areas. In addition, the system should be aware of the user's attention in realtime so that it can remove an overlay that could occlude relevant information. We use an implementation of subtle gaze direction (SGD) and the simplified scanpath of a dentist to train naive participants in finding anomalies in dental radiographs. We were able to effectively direct user gaze to relevant image features without occluding the area when attention was recognized. Additionally, participants reported that the intervention was helpful for image inspection. The results of the model intervention show minimal improvements in anomaly detection, which is expected of naive subjects. We argue that the system has the potential to be highly effective for advanced students and trainees with a certain foundation of conceptual knowledge.

Keywords: Eye Tracking Gaze-based interaction Attention Guiding Expertise Learning

Reducing Uncertainty and Offering Comfort: Designing Technology for Coping with Interpersonal Racism

Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems

Authors: Alexandra To, Hillary Carey, Geoff Kaufman, Jessica Hammer (Art + Design and Khoury College of Computer Science, Northeastern University, United States; School of Design, Carnegie Mellon University, United States; Human-Computer Interaction Institute, Carnegie Mellon University, United States)

ISBN (print): 9781450380966

Ranging from subtle to overt, unintentional to systemic, navigating racism is additional everyday work for many people. Yet the needs of people who experience racism have been overlooked as a fertile ground for better technology. Through a series of workshops we call Foundational Fiction, we engaged BIPOC (Black, Indigenous, People of Color) in participatory design to identify qualities of technology that can support people coping before, during, and after a racist interaction. Participants developed storyboards for digital tools that offer advice, predict consequences, identify racist remarks and intervene, educate both targets and perpetrators about interpersonal and systemic racism, and more. In this paper, we present our workshop method using interactive fiction, participants’ design concepts, and prevalent themes (reducing uncertainty and offering comfort), and we provide a critical analysis of the complexity of technology in these contexts. This work identifies specific opportunities for exploring anti-racist social tools.

Keywords: interactive fiction; design workshops; microaggressions; participatory design; racism; uncertainty

Position: social choice should guide AI alignment in dealing with diverse human feedback

Proceedings of the 41st International Conference on Machine Learning

Authors: Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mossé, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, William S. Zwicker (Foundations of Cooperative AI Lab, Computer Science Department, Carnegie Mellon University, Pittsburgh and Institute for Ethics in AI, University of Oxford, Oxford, UK; Center for Human-Compatible AI, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; Potsdam Institute for Climate Impact Research, Potsdam, Brandenburg, Germany; Department of Philosophy, University of California, Berkeley; Department of Philosophy and Moral Sciences, Ghent University, Ghent, Belgium; Allen Institute for AI, Berkeley, California; Department of Philosophy, University of Maryland, College Park; EleutherAI; Foundations of Cooperative AI Lab, Computer Science Department, Carnegie Mellon University, Pittsburgh; Department of Mathematics, Union College, Schenectady and Murat Sertel Center for Advanced Economic Studies, Istanbul Bilgi University, Istanbul, Turkey)

Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.
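
To make the aggregation question concrete, the Borda count is one classic social choice rule for combining several people's rankings of the same options into a single collective ranking: in a ranking of n options, the option in first place earns n-1 points, the next n-2, and so on. The following Python sketch is a hypothetical illustration only. The abstract does not say that the paper recommends the Borda count, and the option labels stand in for candidate model outputs.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_count(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Aggregate individual rankings (best first) into one collective ranking."""
    options = set(rankings[0]) if rankings else set()
    for ranking in rankings:
        # Assumption: every voter gives a complete ranking of the same options.
        if set(ranking) != options or len(ranking) != len(options):
            raise ValueError("each ranking must order the same set of options exactly once")
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, option in enumerate(ranking):
            scores[option] += n - 1 - position  # first place earns n-1 points
    # Highest score first; ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda option: (-scores[option], option))


votes = [["A", "B", "C"], ["B", "C", "A"], ["B", "A", "C"], ["C", "A", "B"]]
print(borda_count(votes))
```

With these four hypothetical rankings, B scores 5, A scores 4 and C scores 3, so the collective ranking is B, A, C, even though no option is ranked first by a majority of the voters. Such divergence among human inputs is the situation the paper asks how to handle. Other rules, such as plurality, could order the same votes differently, which is why the choice of rule matters.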

Keywords: