ISBN: (Print) 9798891760608
Increased focus on the computational efficiency of NLP systems has motivated the design of efficient model architectures and improvements to underlying hardware accelerators. However, the resulting increases in computational throughput and reductions in floating point operations have not directly translated to improvements in wall-clock inference latency. We demonstrate that these discrepancies can be largely attributed to bottlenecks introduced by deep learning frameworks. We denote this phenomenon as the framework tax, and observe that the disparity is growing as hardware speed increases over time. In this work, we examine this phenomenon through a series of case studies analyzing the effects of model design decisions, framework paradigms, and hardware platforms on total model latency. Code is available at https://***/JaredFern/Framework-Tax.
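A minimal sketch of the measurement behind this claim, assuming PyTorch as the framework; the model sizes and iteration counts are illustrative, not taken from the paper:

import time
import torch

def mean_latency_ms(model, x, iters=100, warmup=10):
    # Warm up allocators and caches, then average wall-clock time per call.
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
    return (time.perf_counter() - start) / iters * 1e3

x = torch.randn(1, 512)
for hidden in (128, 512, 2048):  # FLOPs grow roughly 16x across these widths
    model = torch.nn.Sequential(
        torch.nn.Linear(512, hidden), torch.nn.ReLU(),
        torch.nn.Linear(hidden, 512),
    ).eval()
    print(hidden, f"{mean_latency_ms(model, x):.3f} ms")

If latency stays nearly flat while FLOPs grow, per-call framework overhead (dispatch, bookkeeping) dominates the arithmetic; that fixed cost is the framework tax described above.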
Speech Language Models (SLMs) aim to learn language from raw audio, without textual resources. Despite significant advances, our current models exhibit weak syntactic and semantic abilities. However, if the scaling prope...
Company strategy influences many decisions in freight transportation. Behavioral models of company decision-making could therefore benefit from including strategy variables. However, strategy is difficult to observe and quantify. Attitudinal surveys of company executives can be used to collect measurements of latent strategy for use in quantitative models. However, surveys are costly and burdensome. Text mining methods to collect measurements overcome these issues somewhat, but typically require manual intervention and ignore the context of words, which can be problematic. This study introduces a new machine learning method to generate strategy measurement data from existing big text data. The new method, called W2VPCA, combines natural language processing and Principal Components Analysis. W2VPCA produces measurement data that serve as quantitative indicators of latent strategy in behavioral models. W2VPCA is unsupervised, data-driven, and uses information on word context. We apply W2VPCA to generate measurements of latent strategies from readily available, large-scale text data: annual company reports. The empirical measurements are used successfully to associate two latent strategies, one focusing on distribution and the other on products, with truck fleet and distribution center outsourcing decisions. The main empirical outcome is that the W2VPCA measurements outperform Bag-of-Words measurements in a psychometric analysis of latent firm strategies. While this study focuses on freight behavioral models, W2VPCA may also have applications in behavioral modeling in other domains.
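A hedged sketch of a W2VPCA-style pipeline consistent with the description above: word2vec embeddings of report text are averaged into document vectors, and principal components of those vectors serve as latent strategy indicators. The toy corpus, dimensions, and variable names are illustrative assumptions, not the study's setup.

import numpy as np
from gensim.models import Word2Vec
from sklearn.decomposition import PCA

# One tokenized "annual report" per company (toy stand-in data).
reports = [
    ["distribution", "network", "warehouse", "logistics", "fleet"],
    ["product", "innovation", "design", "brand", "quality"],
    ["warehouse", "outsourcing", "distribution", "carrier", "fleet"],
]

# Word2vec captures word context, unlike the Bag-of-Words baseline.
w2v = Word2Vec(reports, vector_size=50, window=5, min_count=1, seed=0)

# Document vector = mean of the report's word vectors.
doc_vecs = np.vstack(
    [np.mean([w2v.wv[tok] for tok in doc], axis=0) for doc in reports]
)

# Principal components act as quantitative measurements of latent strategy.
strategy_scores = PCA(n_components=2).fit_transform(doc_vecs)
print(strategy_scores)  # rows: companies, columns: latent strategy indicators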
ISBN: (Print) 9798891760608
Large Language Models (LLMs) trained with self-supervision on vast corpora of web text fit to the social biases of that text. Without intervention, these social biases persist in the model's predictions on downstream tasks, leading to representational harm. Many strategies have been proposed to mitigate the effects of inappropriate social biases learned during pre-training. Simultaneously, methods for model compression have become increasingly popular to reduce the computational burden of LLMs. Despite the popularity and need for both approaches, little work has been done to explore the interplay between them. We perform a carefully controlled study of the impact of model compression via quantization and knowledge distillation on measures of social bias in LLMs. Longer pretraining and larger models led to higher social bias, and quantization showed a regularizer effect, with its best trade-off at around 20% of the original pretraining time.
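One arm of such a study can be sketched as follows: apply post-training dynamic quantization to a pretrained LM, then re-run a social-bias benchmark on the quantized model and compare scores. The model choice, int8 scope, and probe sentence here are illustrative assumptions, not the paper's exact setup.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

# Quantize all Linear layers to int8; this is the kind of compression
# intervention whose effect on bias metrics the study measures.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

batch = tokenizer("Nurses are typically [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = quantized(**batch).logits
mask_pos = (batch["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax().item()))

Comparing such predictions, and full benchmark scores, before and after quantization quantifies the regularizer effect reported above.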
ISBN: (Print) 9798891760608
Despite the excellent performance of vision-language pre-trained models (VLPs) on the conventional VQA task, they still suffer from two problems: first, VLPs tend to rely on language biases in datasets and fail to generalize to out-of-distribution (OOD) data; second, they are inefficient in terms of memory footprint and computation. Although promising progress has been made on both problems, most existing work tackles them independently. To facilitate the application of VLPs to VQA tasks, it is imperative to jointly study VLP compression and OOD robustness, which, however, has not yet been explored. This paper investigates whether a VLP can be compressed and debiased simultaneously by searching for sparse and robust subnetworks. To this end, we systematically study the design of a training and compression pipeline to search for the subnetworks, as well as the assignment of sparsity to different modality-specific modules. Our experiments involve 3 VLPs, 2 compression methods, 4 training methods, 2 datasets, and a range of sparsity levels. Our results show that there indeed exist sparse and robust subnetworks, which are competitive with the debiased full VLP and clearly outperform the debiasing SoTAs with fewer parameters on the OOD datasets VQA-CP v2 and VQA-VS.
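A hedged sketch of the subnetwork-search ingredient: unstructured magnitude pruning applied to a model's Linear modules at a chosen sparsity level. In the paper this is combined with debiased training and modality-specific sparsity assignment; only the pruning step is shown here, with illustrative sizes.

import torch
import torch.nn.utils.prune as prune

# Stand-in for a VLP's dense feed-forward layers.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 3072), torch.nn.GELU(), torch.nn.Linear(3072, 768)
)

sparsity = 0.7  # fraction of weights zeroed out (one point on the sweep)
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=sparsity)
        prune.remove(module, "weight")  # bake the sparse mask into the weight

linears = [m for m in model.modules() if isinstance(m, torch.nn.Linear)]
zeros = sum((m.weight == 0).sum().item() for m in linears)
total = sum(m.weight.numel() for m in linears)
print(f"actual sparsity: {zeros / total:.2%}")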
ISBN: (Print) 9798891760608
Multilingual pretrained language models serve as repositories of multilingual factual knowledge. Nevertheless, a substantial performance gap in factual knowledge probing exists between high-resource and low-resource languages, suggesting limited implicit factual knowledge transfer across languages in multilingual pretrained language models. This paper investigates the feasibility of explicitly transferring relatively rich factual knowledge from English to non-English languages. To accomplish this, we propose two parameter-free Language Representation Projection modules (LRP2). The first module converts non-English representations into English-like equivalents, while the second reverts English-like representations back into representations of the corresponding non-English language. Experimental results on the mLAMA dataset demonstrate that LRP2 significantly improves factual knowledge retrieval accuracy and facilitates knowledge transfer across diverse non-English languages. We further investigate the working mechanism of LRP2 from the perspectives of representation space and cross-lingual knowledge neurons.
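A parameter-free projection of this kind can be realized as a mean shift between language-level representation statistics; the sketch below is an assumption-laden illustration consistent with the description above, not the paper's released code. Layer placement and the source of the language means are assumed.

import torch

def project(h, src_mean, tgt_mean):
    # Map hidden states from the source-language space to the target space.
    return h - src_mean + tgt_mean

# Language means would be estimated offline from hidden states of unlabeled
# text in each language; random tensors stand in for them here.
hidden_dim = 768
mean_de, mean_en = torch.randn(hidden_dim), torch.randn(hidden_dim)

h_de = torch.randn(4, hidden_dim)  # non-English (German) token states
h_en_like = project(h_de, mean_de, mean_en)  # first module: to English-like
# ... the model's intermediate layers would run on h_en_like here ...
h_de_back = project(h_en_like, mean_en, mean_de)  # second module: revert
assert torch.allclose(h_de, h_de_back, atol=1e-6)  # the two modules invert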
Finetuning large language models requires huge GPU memory, restricting the choice to acquire larger models. While the quantized version of the Low-Rank Adaptation technique, named QLoRA, significantly alleviates this ...
ISBN: (Print) 9798891760608
Label aggregation such as majority voting is commonly used to resolve annotator disagreement in dataset creation. However, this may disregard minority values and opinions. Recent studies indicate that learning from individual annotations outperforms learning from aggregated labels, though it requires a considerable amount of annotation. Active learning, as an annotation cost-saving strategy, has not been fully explored in the context of learning from disagreement. We show that in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation. By designing and evaluating acquisition functions with annotator-specific heads on two datasets, we show that group-level entropy works generally well on both datasets. Importantly, it achieves performance in terms of both prediction and uncertainty estimation comparable to full-scale training from disagreement, while saving 70% of the annotation budget.
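A hedged sketch of a group-level entropy acquisition function over annotator-specific heads; the head count, batch size, and class count are illustrative, not the paper's configuration.

import torch

def group_entropy(head_probs):
    # head_probs: (num_heads, batch, num_classes) per-annotator predictions.
    group = head_probs.mean(dim=0)  # average heads into a group distribution
    return -(group * group.clamp_min(1e-12).log()).sum(dim=-1)

num_heads, batch, num_classes = 5, 8, 3
logits = torch.randn(num_heads, batch, num_classes)  # one head per annotator
probs = logits.softmax(dim=-1)

scores = group_entropy(probs)           # higher = more group uncertainty
query = scores.topk(k=2).indices        # most uncertain items to label next
print(query.tolist())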
Temporal knowledge graph (TKG) reasoning aims to predict missing facts based on a given ***. Most of the existing methods model the evolution process of different events uniformly and ignore their inherent asynchronous char...
ISBN: (Print) 9798350344868; 9798350344851
Model adaptation is crucial to handle the discrepancy between proxy training data and the actual user data received. To effectively perform adaptation, users' textual data is typically stored on servers or their local devices, where downstream natural language processing (NLP) models can be trained directly on such in-domain data. However, this raises privacy and security concerns due to the extra risk of exposing user information to adversaries. Replacing identifying information in textual data with a generic marker has recently been explored. In this work, we leverage large language models (LLMs) to suggest substitutes for masked tokens and evaluate their effectiveness on downstream language modeling tasks. Specifically, we propose multiple pre-trained and fine-tuned LLM-based approaches and perform empirical studies on various datasets to compare these methods. Experimental results show that models trained on the obfuscated corpora achieve performance comparable to models trained on the original data without privacy-preserving token masking.
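A minimal sketch of the substitution step, assuming an off-the-shelf masked LM: the model proposes context-aware replacements for the generic privacy marker, yielding an obfuscated corpus for downstream training. The model name and example sentence are illustrative assumptions.

from transformers import pipeline

fill = pipeline("fill-mask", model="distilbert-base-uncased")

# A generic marker has replaced identifying information in the user text.
masked = "I met [MASK] at the clinic on Tuesday."
for cand in fill(masked, top_k=3):
    print(cand["token_str"], round(cand["score"], 3))

Each masked position is then replaced by a suggested substitute, and the resulting corpus trains the downstream NLP model in place of the raw user data.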