With the ubiquity of large deep learning models and their growing number of use cases, high-quality compression techniques are increasingly needed to deploy these models widely across diverse hardware and mem...
We describe our submission to the 2024 VoicePrivacy Attacker Challenge. We propose three main categories of methods to improve ASV performance against anonymized speech: improvements to the underlying classifier, alte...
Although the multilingual capabilities of LLMs offer new opportunities to overcome the language barrier, do these capabilities translate into real-life scenarios where linguistic divides and knowledge conflicts between ...
Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have degraded performance for multi-talker scenarios — possibly due to the domain mismatch — which severely limits their use for such applications. In this paper, we investigate the adaptation of upstream SSL models to the multi-talker automatic speech recognition (ASR) task under two conditions. First, when segmented utterances are given, we show that adding a target speaker extraction (TSE) module based on enrollment embeddings is complementary to mixture-aware pre-training. Second, for unsegmented mixtures, we propose a novel joint speaker modeling (JSM) approach, which aggregates information from all speakers in the mixture through their embeddings. With controlled experiments on Libri2Mix, we show that using speaker embeddings provides relative WER improvements of 9.1% and 42.1% over strong baselines for the segmented and unsegmented cases, respectively. We also demonstrate the effectiveness of our models for real conversational mixtures through experiments on the AMI dataset. Our code and models are open-sourced on https://***/HuangZiliAndy/SSL_for_multitalker.
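The TSE conditioning described in this abstract is easy to picture in code. Below is a minimal, hypothetical PyTorch sketch of fusing an enrollment embedding into an upstream SSL model's hidden states; the `TSEAdapter` name, the dimensions, and the concatenation-based fusion are illustrative assumptions, not the authors' implementation (their actual code is in the linked repository).

```python
import torch
import torch.nn as nn

class TSEAdapter(nn.Module):
    """Hypothetical adapter: bias intermediate SSL features toward a target
    speaker given an enrollment embedding. Dimensions and the fusion scheme
    are assumptions for illustration, not the paper's exact design."""

    def __init__(self, hidden_dim: int = 768, spk_dim: int = 256):
        super().__init__()
        # Project the enrollment embedding into the SSL hidden space.
        self.spk_proj = nn.Linear(spk_dim, hidden_dim)
        # Fuse speaker and content information frame by frame.
        self.fuse = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, ssl_hidden: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # ssl_hidden: (batch, time, hidden_dim) features from an upstream layer
        # spk_emb:    (batch, spk_dim) enrollment embedding of the target speaker
        spk = self.spk_proj(spk_emb).unsqueeze(1).expand(-1, ssl_hidden.size(1), -1)
        return self.fuse(torch.cat([ssl_hidden, spk], dim=-1))

# Toy usage: the adapter would sit between two blocks of the upstream model.
adapter = TSEAdapter()
features = torch.randn(2, 100, 768)     # dummy SSL features (batch, time, dim)
enrollment = torch.randn(2, 256)        # dummy speaker embeddings
biased = adapter(features, enrollment)  # -> (2, 100, 768)
```

For the unsegmented JSM case, the same idea would extend to aggregating the embeddings of all speakers in the mixture rather than conditioning on a single enrollment.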
We introduce a benchmark, VISTRA, for visually-situated translation of English text in natural images to four target languages. We describe the dataset construction and composition. We benchmark open-source and commer...
With careful manipulation, malicious agents can reverse engineer private information encoded in pre-trained language models. Security concerns motivate the development of quantum pre-training. In this work, we propose...
This paper describes the submission runs from the HLTCOE team at the CIRAL CLIR tasks for African languages at FIRE 2023. Our submissions use machine translation models to translate the documents and the training pass...
Data availability limits the scope of any given task. In machine translation, historical models were incapable of handling longer contexts, so the lack of document-level datasets was less noticeable. Now, despite the ...
This work describes CMU’s submission to the IWSLT 2024 Offline Speech Translation (ST) Shared Task for translating English speech to German, Chinese, and Japanese text. We are the first participants to employ a long-...
We show that training a multi-headed self-attention-based deep network to predict deleted, information-dense 2-8 Hz speech modulations over a 1.5-second section of a speech utterance is an effective way to make machin...
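The pretext task in this last abstract, deleting a span of slow modulations and predicting it back from context, can be sketched compactly. The following hypothetical PyTorch illustration makes the idea concrete; the feature dimensions, frame rate, masking policy, and L1 loss are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class ModulationPredictor(nn.Module):
    """Hypothetical sketch of masked modulation prediction: a self-attention
    encoder reconstructs a deleted span of 2-8 Hz modulation features."""

    def __init__(self, feat_dim: int = 80, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.out_proj = nn.Linear(d_model, feat_dim)

    def forward(self, feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) modulation features
        # mask:  (batch, time) boolean, True where frames were deleted
        x = feats.masked_fill(mask.unsqueeze(-1), 0.0)  # zero out the target span
        return self.out_proj(self.encoder(self.in_proj(x)))

# Toy training step: reconstruct a deleted 1.5-second span (150 frames at an
# assumed 100 frames/s) from the surrounding context.
model = ModulationPredictor()
feats = torch.randn(4, 300, 80)
mask = torch.zeros(4, 300, dtype=torch.bool)
mask[:, 100:250] = True
pred = model(feats, mask)
loss = nn.functional.l1_loss(pred[mask], feats[mask])  # loss only on deleted frames
loss.backward()
```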