Search Results - Inner Mongolia University Library

Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning

arXiv, 2023

Authors: Zhou, Han; Wan, Xingchen; Vulić, Ivan; Korhonen, Anna. Affiliations: Language Technology Lab, University of Cambridge, United Kingdom; Machine Learning Research Group, University of Oxford, United Kingdom

Prompt-based learning has been an effective paradigm for large pretrained language models (LLMs), enabling few-shot or even zero-shot learning. Black-box prompt search has received growing interest recently for its distinctive property of gradient-free optimization, which has proven particularly useful and powerful in model-as-a-service settings. However, the discrete nature and the complexity of combinatorial optimization hinder the efficiency of modern black-box approaches. Despite extensive research on search algorithms, the crucial aspect of search space design and optimization has been largely overlooked. In this paper, we first conduct a sensitivity analysis by prompting LLMs, revealing that only a small number of tokens exert a disproportionate amount of influence on LLM predictions. Leveraging this insight, we propose Clustering and Pruning for Efficient Black-box Prompt Search (CLAPS), a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens. Even with simple search methods applied within the pruned search space, CLAPS achieves state-of-the-art performance across various tasks and LLMs, surpassing complex approaches while significantly reducing search costs. Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning. Copyright © 2023, The Authors. All rights reserved.

Keywords: Combinatorial optimization
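
The pruning idea described in the CLAPS abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical black-box scorer `score_fn` that maps a token list to a scalar, measures each candidate token's influence as the absolute change in that score when the token is appended to a base prompt, and keeps only the most influential fraction (`keep_ratio`, also an assumption). This is not the authors' implementation: the influence measure and all names are illustrative, and the clustering step is omitted.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical black-box scorer: maps a prompt (list of tokens) to a scalar,
# e.g. accuracy or label log-likelihood on a few examples.
Scorer = Callable[[List[str]], float]


def token_influence(candidates: Sequence[str], score_fn: Scorer,
                    base_prompt: List[str]) -> Dict[str, float]:
    """Influence of a token = |score change| when it is appended to the base prompt."""
    base = score_fn(base_prompt)
    return {tok: abs(score_fn(base_prompt + [tok]) - base) for tok in candidates}


def prune_search_space(influence: Dict[str, float], keep_ratio: float) -> List[str]:
    """Keep the top `keep_ratio` fraction of tokens, ranked by influence."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, int(len(influence) * keep_ratio))
    ranked = sorted(influence, key=influence.get, reverse=True)
    return ranked[:k]
```

A search procedure, even plain random search, would then sample prompt tokens only from the returned list instead of the full vocabulary.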

AUTOPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning

arXiv, 2023

Authors: Zhou, Han; Wan, Xingchen; Vulić, Ivan; Korhonen, Anna. Affiliations: Language Technology Lab, University of Cambridge, United Kingdom; Machine Learning Research Group, University of Oxford, United Kingdom

Large pretrained language models are widely used in downstream NLP tasks via task-specific fine-tuning, but such procedures can be costly. Recently, Parameter-Efficient Fine-Tuning (PEFT) methods have achieved strong task performance while updating far fewer parameters than full model fine-tuning (FFT). However, it is non-trivial to make informed design choices on PEFT configurations, such as their architecture, the number of tunable parameters, and even the layers in which the PEFT modules are inserted. Consequently, the current, manually designed configurations are likely suboptimal in terms of their performance-efficiency trade-off. Inspired by advances in neural architecture search, we propose AUTOPEFT for automatic PEFT configuration selection: we first design an expressive configuration search space with multiple representative PEFT modules as building blocks. Using multi-objective Bayesian optimisation in a low-cost setup, we then discover a Pareto-optimal set of configurations with strong performance-cost trade-offs across different numbers of parameters, which are also highly transferable across different tasks. Empirically, on GLUE and SuperGLUE tasks, we show that AUTOPEFT-discovered configurations significantly outperform existing PEFT methods and are on par with or better than FFT without incurring substantial training efficiency costs. Copyright © 2023, The Authors. All rights reserved.

Keywords: Fast Fourier transforms
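
The AUTOPEFT abstract describes a Pareto-optimal set of configurations trading task performance against the number of trainable parameters. The non-dominance filter that defines such a set can be sketched in Python as below. The configuration tuples are a hypothetical representation, any example values are made up (not AUTOPEFT results), and the multi-objective Bayesian optimisation that proposes candidates is not shown.

```python
from typing import List, Tuple

# (config name, task score to maximise, number of trainable parameters to minimise)
Config = Tuple[str, float, int]


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    _, score_a, params_a = a
    _, score_b, params_b = b
    no_worse = score_a >= score_b and params_a <= params_b
    strictly_better = score_a > score_b or params_a < params_b
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]
```

With two objectives this quadratic check is adequate for small candidate pools; a configuration never dominates itself because the strict-improvement condition fails.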

Chunked Attention-Based Encoder-Decoder Model for Streaming Speech Recognition

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Mohammad Zeineldeen; Albert Zeyer; Ralf Schlüter; Hermann Ney. Affiliations: Computer Science Department, Machine Learning and Human Language Technology, RWTH Aachen University, Germany; AppTek GmbH, Germany

We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks. A special end-of-chunk (EOC) symbol signals the move from one chunk to the next, effectively replacing the conventional end-of-sequence symbol. This modification, while minor, makes our model equivalent to a transducer model that operates on chunks instead of frames, where EOC corresponds to the blank symbol. We further explore the remaining differences between a standard transducer and our model. Additionally, we examine relevant aspects such as long-form speech generalization, beam size, and length normalization. Through experiments on Librispeech and TED-LIUM-v2, and by concatenating consecutive sequences for long-form trials, we find that our streamable model maintains competitive performance compared to the non-streamable variant and generalizes very well to long-form speech.
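
The role of the EOC symbol can be illustrated with a greedy decoding loop. In this minimal Python sketch the decoder is abstracted as a hypothetical callback `predict_next`, the `EOC` token string is an assumption, and the per-chunk label cap is added only so the loop always terminates; beam search, length normalization, and the attention model itself are omitted.

```python
from typing import Callable, List

EOC = "<eoc>"  # hypothetical end-of-chunk token


def chunked_greedy_decode(num_chunks: int,
                          predict_next: Callable[[int, List[str]], str],
                          max_labels_per_chunk: int = 50) -> List[str]:
    """Greedy search in which EOC moves decoding on to the next encoder chunk.

    `predict_next(chunk_index, hypothesis)` stands in for the decoder: it returns
    the most likely next label given the current chunk and the labels so far.
    """
    hyp: List[str] = []
    for chunk in range(num_chunks):
        for _ in range(max_labels_per_chunk):  # guard against never emitting EOC
            label = predict_next(chunk, hyp)
            if label == EOC:
                break  # like a transducer blank: advance the input, emit nothing
            hyp.append(label)
    return hyp
```

A chunk may emit no labels at all (EOC immediately), which is the chunk-level analogue of a transducer emitting blank at a frame.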

On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition

IEEE Workshop on Automatic Speech Recognition and Understanding

Authors: Nick Rossenbach; Benedikt Hilmes; Ralf Schlüter. Affiliations: Computer Science Department, Machine Learning and Human Language Technology, RWTH Aachen University, Germany; AppTek GmbH, Germany

Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments

arXiv, 2024

Authors: Zhou, Han; Wan, Xingchen; Liu, Yinhong; Collier, Nigel; Vulić, Ivan; Korhonen, Anna. Affiliations: Language Technology Lab, University of Cambridge, United Kingdom; Machine Learning Research Group, University of Oxford, United Kingdom

Large language models (LLMs) have shown promising abilities as cost-effective and reference-free evaluators for assessing language generation quality. In particular, pairwise LLM evaluators, which compare two generated texts and determine the preferred one, have been employed in a wide range of applications. However, LLMs exhibit preference biases and worrying sensitivity to prompt designs. In this work, we first reveal that the predictive preferences of LLMs can be highly brittle and skewed, even with semantically equivalent instructions. We find that fairer predictive preferences from LLMs consistently lead to judgments that are better aligned with humans. Motivated by this phenomenon, we propose an automatic Zero-shot Evaluation-oriented Prompt Optimization framework, ZEPO, which aims to produce fairer preference decisions and improve the alignment of LLM evaluators with human judgments. To this end, we propose a zero-shot learning objective based on preference decision fairness. ZEPO demonstrates substantial performance improvements over state-of-the-art LLM evaluators on representative meta-evaluation benchmarks, without requiring labeled data. Our findings underscore the critical correlation between preference fairness and human alignment, positioning ZEPO as an efficient prompt optimizer for bridging the gap between LLM evaluators and human judgments. Copyright © 2024, The Authors. All rights reserved.

Keywords: Zero-shot learning
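
The link between preference fairness and prompt selection can be made concrete with a small Python sketch. It assumes each candidate evaluator prompt has already been run on the same set of unlabeled pairs, producing "A"/"B" decisions, and ranks prompts by a simple balance measure, 1 - 2|P(A) - 0.5|. That measure and the function names are illustrative assumptions, not the ZEPO objective, and the sketch omits how new candidate prompts are generated.

```python
from typing import Dict, List


def preference_fairness(decisions: List[str]) -> float:
    """1.0 when "A" and "B" are preferred equally often, 0.0 when one side always wins.

    Illustrative measure only: fairness = 1 - 2 * |P(A) - 0.5|.
    """
    if not decisions:
        raise ValueError("need at least one decision")
    p_a = sum(d == "A" for d in decisions) / len(decisions)
    return 1.0 - 2.0 * abs(p_a - 0.5)


def pick_fairest_prompt(decisions_by_prompt: Dict[str, List[str]]) -> str:
    """Select the candidate prompt whose unlabeled pairwise decisions are least skewed."""
    return max(decisions_by_prompt,
               key=lambda p: preference_fairness(decisions_by_prompt[p]))
```

No labels are needed to compute this score, which mirrors the zero-shot setting described in the abstract.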

Analyzing And Improving Neural Speaker Embeddings for ASR

arXiv, 2023

Authors: Lüscher, Christoph; Xu, Jingjing; Zeineldeen, Mohammad; Schlüter, Ralf; Ney, Hermann. Affiliations: Machine Learning and Human Language Technology, RWTH Aachen University, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany

Neural speaker embeddings encode a speaker's speech characteristics through a DNN model and are prevalent in speaker verification tasks. However, only a few studies, with inconclusive results, have investigated the use of neural speaker embeddings in ASR systems. In this work, we present our efforts on integrating neural speaker embeddings into a Conformer-based hybrid HMM ASR system. For ASR, our improved embedding extraction pipeline, combined with the Weighted-Simple-Add integration method, brings x-vectors and c-vectors to performance on par with i-vectors. We further analyze, compare, and combine different speaker embeddings. We improve our already strong baseline by switching to a one-cycle learning schedule, which also reduces training time. Adding neural speaker embeddings yields further improvements. As a result, our best Conformer-based hybrid ASR system with speaker embeddings achieves 9.0% WER on Hub5'00 and Hub5'01 while training only on SWB 300h. Copyright © 2023, The Authors. All rights reserved.

Keywords: Embeddings

Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

arXiv, 2022

Authors: Lüscher, Christoph; Zeineldeen, Mohammad; Yang, Zijian; Raissi, Tina; Vieting, Peter; Le-Duc, Khai; Wang, Weiyue; Schlüter, Ralf; Ney, Hermann. Affiliations: Machine Learning and Human Language Technology, RWTH Aachen University, Aachen 52072, Germany; AppTek GmbH, Aachen 52062, Germany

Language barriers present a great challenge in our increasingly connected and global world. Especially in the medical domain, e.g. in hospitals or emergency rooms, communication difficulties and delays may lead to malpractice and suboptimal patient care. In the HYKIST project, we consider patient-physician communication, more specifically between a German-speaking physician and an Arabic-, Vietnamese-, or Ukrainian-speaking patient. Currently, a doctor can call the Triaphon service to get assistance from an interpreter to help facilitate communication. The goal of HYKIST is to support the usually non-professional bilingual interpreter with an automatic speech translation system, improving patient care and helping overcome language barriers. In this work, we present our ASR system development efforts for this conversational telephone speech translation task in the medical domain for two language pairs, covering data collection, various acoustic model architectures, and dialect-induced difficulties. Copyright © 2022, The Authors. All rights reserved.

Keywords: Translation (languages)

Investigating The Effect of Language Models in Sequence Discriminative Training For Neural Transducers

IEEE Workshop on Automatic Speech Recognition and Understanding

Authors: Zijian Yang; Wei Zhou; Ralf Schlüter; Hermann Ney. Affiliations: Computer Science Department, Machine Learning and Human Language Technology, RWTH Aachen University, Aachen, Germany; AppTek GmbH, Aachen, Germany

In this work, we investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training for phoneme-based neural transducers. Both lattice-free and N-best-list approaches are examined. For lattice-free methods with phoneme-level LMs, we propose a method to approximate the context history so that LMs with full-context dependency can be employed. This approximation can be extended to arbitrary context lengths and enables the use of word-level LMs in lattice-free methods. Moreover, a systematic comparison is conducted across lattice-free and N-best-list-based methods. Experimental results on Librispeech show that using the word-level LM in training outperforms using the phoneme-level LM. We also find that the context size of the LM used for probability computation has a limited effect on performance. Finally, our results reveal the pivotal importance of hypothesis space quality in sequence discriminative training.
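
For the N-best-list approach mentioned above, a common MMI-style criterion treats the N-best list as the hypothesis space and normalizes the reference score over it. The Python sketch below is a generic formulation, not necessarily the exact criterion used in this paper: it assumes the reference is contained in the list and that per-hypothesis acoustic and LM log-probabilities, plus an LM scale, are given (all names are hypothetical). It also shows why hypothesis-space quality matters: the normalizer only sees the listed competitors.

```python
import math
from typing import Sequence


def nbest_mmi_loss(am_logp: Sequence[float], lm_logp: Sequence[float],
                   ref_index: int, lm_scale: float = 1.0) -> float:
    """MMI-style loss over an N-best list that contains the reference.

    Each hypothesis is scored as am_logp + lm_scale * lm_logp; the loss is the
    negative log posterior of the reference among the listed hypotheses.
    """
    if len(am_logp) != len(lm_logp):
        raise ValueError("AM and LM score lists must have equal length")
    scores = [am + lm_scale * lm for am, lm in zip(am_logp, lm_logp)]
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))  # stable log-sum-exp
    return log_z - scores[ref_index]
```

If the N-best list lacks strong competitors, the loss is already near zero and provides little training signal, regardless of how good the LM is.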

End-To-End Training of a Neural HMM with Label and Transition Probabilities

IEEE Workshop on Automatic Speech Recognition and Understanding

Authors: Daniel Mann; Tina Raissi; Wilfried Michel; Ralf Schlüter; Hermann Ney. Affiliations: AppTek GmbH, Aachen, Germany; Machine Learning and Human Language Technology, Computer Science Department, RWTH Aachen University, Aachen, Germany

We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMMs), where the transition probabilities between hidden states are modeled and learned explicitly. Most contemporary sequence-to-sequence models allow for from-scratch training by summing over all possible label segmentations in a given topology. In our approach, there are explicit, learnable probabilities for transitions between segments, as opposed to a blank label that implicitly encodes duration. We implement a GPU-based forward-backward algorithm that enables the simultaneous training of label and transition probabilities. We investigate recognition results and, additionally, the Viterbi alignments of our models. We find that while transition model training does not improve recognition performance, it has a positive impact on alignment quality. The generated alignments are shown to be viable targets in state-of-the-art Viterbi training.
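
The quantity such a model is trained on, the sum over all segmentations with explicit transition probabilities, can be computed with the forward recursion. The minimal Python sketch below assumes a strictly left-to-right topology with one state per label, where each state has a "loop" and a "forward" transition log-probability, and where alignments start in the first state and end in the last; the function and argument names are hypothetical. It runs on the CPU, computes only the forward pass, and is not the authors' GPU implementation; in training, gradients would come from the backward pass or automatic differentiation.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def _logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)), tolerating -inf inputs."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def hmm_forward_logprob(emit_logp: Sequence[Sequence[float]],
                        log_loop: Sequence[float],
                        log_fwd: Sequence[float]) -> float:
    """Log-sum over all left-to-right alignments of T frames to S label states.

    emit_logp[t][s]: log p(label of state s | frame t)
    log_loop[s]:     log-prob of staying in state s
    log_fwd[s]:      log-prob of moving from state s to state s + 1
    """
    T, S = len(emit_logp), len(log_loop)
    if T == 0 or S == 0:
        raise ValueError("need at least one frame and one state")
    # alpha[s]: log-prob of all partial alignments ending in state s at frame t
    alpha: List[float] = [NEG_INF] * S
    alpha[0] = emit_logp[0][0]  # alignments start in the first label state
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            stay = alpha[s] + log_loop[s]
            advance = alpha[s - 1] + log_fwd[s - 1] if s > 0 else NEG_INF
            new[s] = emit_logp[t][s] + _logaddexp(stay, advance)
        alpha = new
    return alpha[S - 1]  # alignments end in the last label state
```

Because the transition terms appear explicitly in the recursion, their gradients can be learned jointly with the label posteriors, which is the distinction from a blank-based topology drawn in the abstract.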
