ISBN: (Print) 9798891760608
Increased focus on the computational efficiency of NLP systems has motivated the design of efficient model architectures and improvements to underlying hardware accelerators. However, the resulting increases in computational throughput and reductions in floating point operations have not directly translated to improvements in wall-clock inference latency. We demonstrate that these discrepancies can be largely attributed to bottlenecks introduced by deep learning frameworks. We denote this phenomenon as the framework tax, and observe that the disparity is growing as hardware speed increases over time. In this work, we examine this phenomenon through a series of case studies analyzing the effects of model design decisions, framework paradigms, and hardware platforms on total model latency. Code is available at https://***/JaredFern/Framework-Tax.
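A minimal sketch of the measurement behind this claim, assuming PyTorch as the framework; the model sizes and iteration counts are illustrative, not taken from the paper:

import time
import torch

def mean_latency_ms(model, x, iters=100, warmup=10):
    # Warm up allocators and caches, then average wall-clock time per call.
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
    return (time.perf_counter() - start) / iters * 1e3

x = torch.randn(1, 512)
for hidden in (128, 512, 2048):  # FLOPs grow roughly 16x across these widths
    model = torch.nn.Sequential(
        torch.nn.Linear(512, hidden), torch.nn.ReLU(),
        torch.nn.Linear(hidden, 512),
    ).eval()
    print(hidden, f"{mean_latency_ms(model, x):.3f} ms")

If latency stays nearly flat while FLOPs grow, per-call framework overhead (dispatch, bookkeeping) dominates the arithmetic; that fixed cost is the framework tax described above.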
Speech Language Models (SLMs) aim to learn language from raw audio, without textual resources. Despite significant advances, our current models exhibit weak syntactic and semantic abilities. However, if the scaling prope...
Company strategy influences many decisions in freight transportation. Behavioral models of company decision-making could therefore benefit from including strategy variables. However, strategy is difficult to observe and quantify. Attitudinal surveys of company executives can be used to collect measurements of latent strategy for use in quantitative models. However, surveys are costly and burdensome. Text mining methods to collect measurements overcome these issues somewhat, but typically require manual intervention and ignore the context of words, which can be problematic. This study introduces a new machine learning method to generate strategy measurement data from existing big text data. The new method, called W2VPCA, combines natural language processing and Principal Components Analysis. W2VPCA produces measurement data that serve as quantitative indicators of latent strategy in behavioral models. W2VPCA is unsupervised, data-driven, and uses information on word context. We apply W2VPCA to generate measurements of latent strategies from readily available, large-scale text data: annual company reports. The empirical measurements are used successfully to associate two latent strategies, one focusing on distribution and the other on products, with truck fleet and distribution center outsourcing decisions. The main empirical outcome is that the W2VPCA measurements outperform Bag-of-Words measurements in a psychometric analysis of latent firm strategies. While this study focuses on freight behavioral models, W2VPCA may also have applications in behavioral modeling in other domains.
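A hedged sketch of a W2VPCA-style pipeline consistent with the description above: word2vec embeddings of report text are averaged into document vectors, and principal components of those vectors serve as latent strategy indicators. The toy corpus, dimensions, and variable names are illustrative assumptions, not the study's setup.

import numpy as np
from gensim.models import Word2Vec
from sklearn.decomposition import PCA

# One tokenized "annual report" per company (toy stand-in data).
reports = [
    ["distribution", "network", "warehouse", "logistics", "fleet"],
    ["product", "innovation", "design", "brand", "quality"],
    ["warehouse", "outsourcing", "distribution", "carrier", "fleet"],
]

# Word2vec captures word context, unlike the Bag-of-Words baseline.
w2v = Word2Vec(reports, vector_size=50, window=5, min_count=1, seed=0)

# Document vector = mean of the report's word vectors.
doc_vecs = np.vstack(
    [np.mean([w2v.wv[tok] for tok in doc], axis=0) for doc in reports]
)

# Principal components act as quantitative measurements of latent strategy.
strategy_scores = PCA(n_components=2).fit_transform(doc_vecs)
print(strategy_scores)  # rows: companies, columns: latent strategy indicators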
ISBN: (Print) 9798891760608
Large Language Models (LLMs) trained with self-supervision on vast corpora of web text fit to the social biases of that text. Without intervention, these social biases persist in the model's predictions on downstream tasks, leading to representational harm. Many strategies have been proposed to mitigate the effects of inappropriate social biases learned during pre-training. Simultaneously, methods for model compression have become increasingly popular to reduce the computational burden of LLMs. Despite the popularity and need for both approaches, little work has been done to explore the interplay between them. We perform a carefully controlled study of the impact of model compression via quantization and knowledge distillation on measures of social bias in LLMs. Longer pretraining and larger models led to higher social bias, and quantization showed a regularizer effect, with its best trade-off at around 20% of the original pretraining time.
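One arm of such a study can be sketched as follows: apply post-training dynamic quantization to a pretrained LM, then re-run a social-bias benchmark on the quantized model and compare scores. The model choice, int8 scope, and probe sentence here are illustrative assumptions, not the paper's exact setup.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

# Quantize all Linear layers to int8; this is the kind of compression
# intervention whose effect on bias metrics the study measures.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

batch = tokenizer("Nurses are typically [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = quantized(**batch).logits
mask_pos = (batch["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax().item()))

Comparing such predictions, and full benchmark scores, before and after quantization quantifies the regularizer effect reported above.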
ISBN: (Print) 9798891760608
Despite the excellent performance of vision-language pre-trained models (VLPs) on the conventional VQA task, they still suffer from two problems: first, VLPs tend to rely on language biases in datasets and fail to generalize to out-of-distribution (OOD) data; second, they are inefficient in terms of memory footprint and computation. Although promising progress has been made on both problems, most existing work tackles them independently. To facilitate the application of VLPs to VQA tasks, it is imperative to jointly study VLP compression and OOD robustness, which, however, has not yet been explored. This paper investigates whether a VLP can be compressed and debiased simultaneously by searching for sparse and robust subnetworks. To this end, we systematically study the design of a training and compression pipeline to search for the subnetworks, as well as the assignment of sparsity to different modality-specific modules. Our experiments involve 3 VLPs, 2 compression methods, 4 training methods, 2 datasets, and a range of sparsity levels. Our results show that there indeed exist sparse and robust subnetworks, which are competitive with the debiased full VLP and clearly outperform the debiasing SoTAs with fewer parameters on the OOD datasets VQA-CP v2 and VQA-VS.
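A hedged sketch of the subnetwork-search ingredient: unstructured magnitude pruning applied to a model's Linear modules at a chosen sparsity level. In the paper this is combined with debiased training and modality-specific sparsity assignment; only the pruning step is shown here, with illustrative sizes.

import torch
import torch.nn.utils.prune as prune

# Stand-in for a VLP's dense feed-forward layers.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 3072), torch.nn.GELU(), torch.nn.Linear(3072, 768)
)

sparsity = 0.7  # fraction of weights zeroed out (one point on the sweep)
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=sparsity)
        prune.remove(module, "weight")  # bake the sparse mask into the weight

linears = [m for m in model.modules() if isinstance(m, torch.nn.Linear)]
zeros = sum((m.weight == 0).sum().item() for m in linears)
total = sum(m.weight.numel() for m in linears)
print(f"actual sparsity: {zeros / total:.2%}")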
ISBN: (Print) 9798891760608
Multilingual pretrained language models serve as repositories of multilingual factual knowledge. Nevertheless, a substantial performance gap in factual knowledge probing exists between high-resource and low-resource languages, suggesting limited implicit factual knowledge transfer across languages in multilingual pretrained language models. This paper investigates the feasibility of explicitly transferring relatively rich factual knowledge from English to non-English languages. To accomplish this, we propose two parameter-free Language Representation Projection modules (LRP2). The first module converts non-English representations into English-like equivalents, while the second reverts English-like representations back into representations of the corresponding non-English language. Experimental results on the mLAMA dataset demonstrate that LRP2 significantly improves factual knowledge retrieval accuracy and facilitates knowledge transfer across diverse non-English languages. We further investigate the working mechanism of LRP2 from the perspectives of representation space and cross-lingual knowledge neurons.
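A parameter-free projection of this kind can be realized as a mean shift between language-level representation statistics; the sketch below is an assumption-laden illustration consistent with the description above, not the paper's released code. Layer placement and the source of the language means are assumed.

import torch

def project(h, src_mean, tgt_mean):
    # Map hidden states from the source-language space to the target space.
    return h - src_mean + tgt_mean

# Language means would be estimated offline from hidden states of unlabeled
# text in each language; random tensors stand in for them here.
hidden_dim = 768
mean_de, mean_en = torch.randn(hidden_dim), torch.randn(hidden_dim)

h_de = torch.randn(4, hidden_dim)  # non-English (German) token states
h_en_like = project(h_de, mean_de, mean_en)  # first module: to English-like
# ... the model's intermediate layers would run on h_en_like here ...
h_de_back = project(h_en_like, mean_en, mean_de)  # second module: revert
assert torch.allclose(h_de, h_de_back, atol=1e-6)  # the two modules invert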
Finetuning large language models requires huge GPU memory, restricting the choice to acquire larger models. While the quantized version of the Low-Rank Adaptation technique, named QLoRA, significantly alleviates this ...
ISBN: (Print) 9798891760608
Label aggregation such as majority voting is commonly used to resolve annotator disagreement in dataset creation. However, this may disregard minority values and opinions. Recent studies indicate that learning from individual annotations outperforms learning from aggregated labels, though it requires a considerable amount of annotation. Active learning, as an annotation cost-saving strategy, has not been fully explored in the context of learning from disagreement. We show that in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation. By designing and evaluating acquisition functions with annotator-specific heads on two datasets, we show that group-level entropy works generally well on both datasets. Importantly, it achieves performance in terms of both prediction and uncertainty estimation comparable to full-scale training from disagreement, while saving 70% of the annotation budget.
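A hedged sketch of a group-level entropy acquisition function over annotator-specific heads; the head count, batch size, and class count are illustrative, not the paper's configuration.

import torch

def group_entropy(head_probs):
    # head_probs: (num_heads, batch, num_classes) per-annotator predictions.
    group = head_probs.mean(dim=0)  # average heads into a group distribution
    return -(group * group.clamp_min(1e-12).log()).sum(dim=-1)

num_heads, batch, num_classes = 5, 8, 3
logits = torch.randn(num_heads, batch, num_classes)  # one head per annotator
probs = logits.softmax(dim=-1)

scores = group_entropy(probs)           # higher = more group uncertainty
query = scores.topk(k=2).indices        # most uncertain items to label next
print(query.tolist())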
Temporal knowledge graph (TKG) reasoning aims to predict missing facts based on a given ***. Most of the existing methods model the evolution process of different events uniformly and ignore their inherent asynchronous char...
ISBN: (Print) 9798350344868; 9798350344851
Model adaptation is crucial to handle the discrepancy between proxy training data and the actual user data received. To effectively perform adaptation, users' textual data is typically stored on servers or their local devices, where downstream natural language processing (NLP) models can be trained directly on such in-domain data. However, this raises privacy and security concerns due to the extra risk of exposing user information to adversaries. Replacing identifying information in textual data with a generic marker has recently been explored. In this work, we leverage large language models (LLMs) to suggest substitutes for masked tokens and evaluate their effectiveness on downstream language modeling tasks. Specifically, we propose multiple pre-trained and fine-tuned LLM-based approaches and perform empirical studies on various datasets to compare these methods. Experimental results show that models trained on the obfuscated corpora achieve performance comparable to models trained on the original data without privacy-preserving token masking.
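A minimal sketch of the substitution step, assuming an off-the-shelf masked LM: the model proposes context-aware replacements for the generic privacy marker, yielding an obfuscated corpus for downstream training. The model name and example sentence are illustrative assumptions.

from transformers import pipeline

fill = pipeline("fill-mask", model="distilbert-base-uncased")

# A generic marker has replaced identifying information in the user text.
masked = "I met [MASK] at the clinic on Tuesday."
for cand in fill(masked, top_k=3):
    print(cand["token_str"], round(cand["score"], 3))

Each masked position is then replaced by a suggested substitute, and the resulting corpus trains the downstream NLP model in place of the raw user data.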