ISBN (Print): 9798891760608
Warning: This paper contains content and language that may be considered offensive to some readers. While biases disadvantaging African American Language (AAL) have been uncovered in models for tasks such as speech recognition and toxicity detection, there has been little investigation of these biases for language generation models like ChatGPT. We evaluate how well LLMs understand AAL in comparison to White Mainstream English (WME), the encouraged "standard" form of English taught in American classrooms. We measure large language model performance on two tasks: a counterpart generation task, where a model generates AAL given WME and vice versa, and a masked span prediction (MSP) task, where models predict a phrase hidden from their input. Using a novel dataset of AAL texts from a variety of regions and contexts, we present evidence of dialectal bias for six pre-trained LLMs through performance gaps on these tasks.
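The masked span prediction task can be illustrated with a small probing script. A minimal sketch, assuming a RoBERTa-style masked LM from Hugging Face transformers; the dialect-paired sentences and the single-mask scoring approximation are illustrative stand-ins, not the paper's data or metric.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

def span_log_prob(text: str, span: str) -> float:
    """Log-probability the model assigns to the first subword of `span`
    when the span is masked out of `text` (a single-mask approximation)."""
    masked = text.replace(span, tokenizer.mask_token)
    inputs = tokenizer(masked, return_tensors="pt")
    # Leading space so the BPE tokenization matches the span's in-context form.
    span_ids = tokenizer(" " + span, add_special_tokens=False)["input_ids"]
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    return logits[0, mask_pos].log_softmax(-1)[span_ids[0]].item()

# Hypothetical WME/AAL counterpart pair; a persistent score gap across many
# such pairs is the kind of evidence this task is designed to surface.
wme = "They are always talking about music."
aal = "They always be talking about music."
print(span_log_prob(wme, "talking"), span_log_prob(aal, "talking"))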
Authors: Ye, Xin; Lin, Hugo
Affiliations: Fudan Univ, Inst Global Publ Policy, 220 Handan Rd, Shanghai 200433, Peoples R China; Fudan Univ, LSE-Fudan Res Ctr Global Publ Policy, 220 Handan Rd, Shanghai 200433, Peoples R China; Paris Saclay Univ, Cent Supelec, F-91192 Paris, France
Purpose of Review: This review aimed to systematically synthesize the global evidence base for natural disasters and human health using natural language processing (NLP) techniques. Recent Findings: We searched Embase, PubMed, Scopus, PsycInfo, and Web of Science Core Collection, using titles, abstracts, and keywords, and included only literature indexed in English. NLP techniques, including text classification, topic modeling, and geoparsing methods, were used to systematically identify and map scientific literature on natural disasters and human health published between January 1, 2012, and April 3, 2022. We predicted 6105 studies to fall within the area of natural disasters and human health. Earthquakes, hurricanes, and tsunamis were the most frequent natural disasters; posttraumatic stress disorder (PTSD) and depression were the most frequently studied health outcomes; mental health services were the most common way of coping. Geographically, the evidence base was dominated by studies from high-income countries. Co-occurrence of natural disasters and psychological distress was common. Psychological distress was one of the top three most frequent topics on all continents except Africa, where infectious diseases were the most prevalent topic. Summary: Our findings demonstrate the importance and feasibility of using NLP to comprehensively map the growing literature on natural disasters and human health. The review identifies clear topics for future clinical and public health research and can provide an empirical basis for reducing the negative health effects of natural disasters.
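As a concrete illustration of the screening-plus-topic-modeling pipeline the review describes, here is a minimal sketch assuming scikit-learn: a relevance classifier to screen records, then topic modeling over the predicted-relevant subset. The sample texts, labels, and model choices are assumptions for illustration, not the review's actual setup (geoparsing is omitted).

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "Earthquake survivors showed elevated PTSD and depression rates.",
    "Hurricane evacuation and use of mental health services.",
    "Tsunami exposure and long-term psychological distress.",
    "A new sorting algorithm with improved cache behavior.",
]
labels = [1, 1, 1, 0]  # 1 = relevant to natural disasters and human health

# Step 1: screen records with a text classifier.
tfidf = TfidfVectorizer()
clf = LogisticRegression().fit(tfidf.fit_transform(abstracts), labels)
relevant = [a for a, keep in zip(abstracts, clf.predict(tfidf.transform(abstracts))) if keep]

# Step 2: surface themes in the relevant subset with LDA topic modeling.
counts = CountVectorizer(stop_words="english")
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts.fit_transform(relevant))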
ISBN (Print): 9798891760608
A crucial challenge for generative large language models (LLMs) is diversity: when a user's prompt is under-specified, models may follow implicit assumptions while generating a response, which may result in homogenization of the responses, as well as certain demographic groups being under-represented or even erased from the generated responses. In this paper, we formalize diversity of representation in generative LLMs. We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes. We find that LLMs understand the notion of diversity and can reason about and critique their own responses toward that goal. This finding motivated a new prompting technique called collective-critique and self-voting (CCSV), which self-improves the people diversity of LLMs by tapping into their diversity reasoning capabilities, without relying on handcrafted examples or prompt tuning. Extensive empirical experiments with both human and automated evaluations show that our proposed approach is effective at improving people and culture diversity, and outperforms all baseline methods by a large margin.
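The collective-critique and self-voting loop can be sketched as follows; this is a hedged reconstruction from the description above, where llm is a hypothetical text-completion function and the prompts, draft count, and round count are assumptions, not the paper's actual procedure.

def ccsv(llm, prompt: str, n_drafts: int = 3, rounds: int = 2) -> str:
    # Draft several candidate responses.
    drafts = [llm(prompt) for _ in range(n_drafts)]
    for _ in range(rounds):
        # Collective critique: the model critiques every draft for diversity.
        critiques = "\n".join(
            llm(f"Critique this response to '{prompt}' for how diversely it "
                f"represents people and cultures:\n{d}")
            for d in drafts
        )
        # Revise each draft against the pooled critiques.
        drafts = [
            llm(f"Rewrite to address these critiques:\n{critiques}\n\nResponse:\n{d}")
            for d in drafts
        ]
    # Self-voting: the model picks the most diverse final draft.
    ballot = "\n\n".join(f"[{i}] {d}" for i, d in enumerate(drafts))
    choice = llm(f"Which response is most diverse? Answer with its index.\n{ballot}")
    index = next((c for c in choice if c.isdigit()), "0")
    return drafts[min(int(index), len(drafts) - 1)]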
Recent research has shown that smaller language models can acquire substantial reasoning abilities when fine-tuned with reasoning exemplars crafted by a significantly larger teacher model. We explore this paradigm for...
ISBN (Print): 9798891760608
Large language models (LLMs) make natural interfaces to factual knowledge, but their usefulness is limited by their tendency to deliver inconsistent answers to semantically equivalent questions. For example, a model might predict both "Anne Redpath passed away in Edinburgh." and "Anne Redpath's life ended in London." In this work, we identify potential causes of inconsistency and evaluate the effectiveness of two mitigation strategies: up-scaling and augmenting the LM with a retrieval corpus. Our results on the LLaMA and Atlas models show that both strategies reduce inconsistency, with retrieval augmentation being considerably more efficient. We further consider and disentangle the consistency contributions of different components of Atlas. For all LMs evaluated, we find that syntactic form and other evaluation task artifacts impact consistency. Taken together, our results provide a better understanding of the factors affecting the factual consistency of language models.
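A minimal sketch of the kind of consistency measurement described above, assuming a hypothetical completion function llm; the paraphrase templates reuse the abstract's example, and the exact-match pairwise metric is an illustrative simplification.

paraphrases = [
    "Anne Redpath passed away in [MASK].",
    "Anne Redpath's life ended in [MASK].",
    "The place where Anne Redpath died is [MASK].",
]

def consistency(llm, templates: list[str]) -> float:
    """Fraction of paraphrase pairs for which the model gives the same answer."""
    answers = [llm(f"Fill in [MASK] with one word: {t}").strip() for t in templates]
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    return sum(a == b for a, b in pairs) / len(pairs)

# consistency(llm, paraphrases) == 1.0 only if all paraphrases agree.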
While large visual-language models (LVLM) have shown promising results on traditional visual question answering benchmarks, it is still challenging for them to answer complex VQA problems which require diverse world ...
Automatic generation of graphical layouts is crucial for many real-world applications, including designing posters, flyers, advertisements, and graphical user interfaces. Given the incredible ability of large language...
ISBN (Print): 9798891760608
In recent years, the injection of factual knowledge has been observed to have a significant positive correlation with the downstream task performance of pre-trained language models. However, existing work neither demonstrates that pre-trained models successfully learn the injected factual knowledge nor proves that there is a causal relation between injected factual knowledge and downstream performance improvements. In this paper, we introduce a counterfactual-based analysis framework to explore the causal effects of factual knowledge injection on the performance of language models within the pretrain-finetune paradigm. Instead of directly probing the language model or exhaustively enumerating potential confounding factors, we analyze this issue by perturbing the factual knowledge sources at different scales and comparing the performance of pre-trained language models before and after the perturbation. Surprisingly, throughout our experiments, we find that although the knowledge seems to be successfully injected, the correctness of the injected knowledge has only a very limited effect on the models' downstream performance. This finding strongly challenges the previous assumption that injected factual knowledge is the key to language models achieving performance improvements on downstream tasks in the pretrain-finetune paradigm.
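The perturbation-and-compare design can be made concrete with a small sketch; the triple format, entity pool, and corruption scheme here are assumptions for illustration, not the paper's actual knowledge sources.

import random

def perturb_triples(triples, fraction, entities, seed=0):
    """Replace the object of a random `fraction` of (subject, relation, object)
    triples with a wrong entity, yielding a counterfactual knowledge source."""
    rng = random.Random(seed)
    corrupted = list(triples)
    for i in rng.sample(range(len(corrupted)), int(fraction * len(corrupted))):
        s, r, o = corrupted[i]
        corrupted[i] = (s, r, rng.choice([e for e in entities if e != o]))
    return corrupted

facts = [("Paris", "capital_of", "France"), ("Tokyo", "capital_of", "Japan")]
entities = ["France", "Japan", "Brazil"]
# Pre-train once on `facts` and once on the perturbed variant, then compare
# downstream scores; a small gap would suggest the correctness of the injected
# knowledge matters little, which is the finding the abstract reports.
print(perturb_triples(facts, fraction=0.5, entities=entities))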
ISBN (Print): 9798891760608
This work investigates the computational expressivity of language models (LMs) based on recurrent neural networks (RNNs). Siegelmann and Sontag (1992) famously showed that RNNs with rational weights and hidden states and unbounded computation time are Turing complete. However, LMs define weightings over strings in addition to just (unweighted) language membership, and the analysis of the computational power of RNN LMs (RLMs) should reflect this. We extend the Turing completeness result to the probabilistic case, showing how a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions. Since, in practice, RLMs work in real time, processing a symbol at every time step, we treat the above result as an upper bound on the expressivity of RLMs. We also provide a lower bound by showing that under the restriction to real-time computation, such models can simulate deterministic real-time rational PTMs. Code: https://***/rycolab/rnn-turing-completeness
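To make the weighted-string setting precise: an autoregressive RLM defines a (sub)probability distribution over strings, not just a membership predicate, and this distribution is the object the Turing-completeness result must match. The notation below is standard autoregressive LM notation, not taken from the paper.

\[
  p(\boldsymbol{w}) \;=\; p(\mathrm{EOS} \mid \boldsymbol{w})
  \prod_{t=1}^{|\boldsymbol{w}|} p(w_t \mid w_{<t}),
  \qquad
  \sum_{\boldsymbol{w} \in \Sigma^*} p(\boldsymbol{w}) \le 1.
\]

The upper-bound claim is then that a rationally weighted RLM with unbounded computation time can realize the string distribution of any deterministic PTM with rational transition probabilities.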
ISBN (Print): 9798891760608
Factual consistency evaluation is often conducted using natural language inference (NLI) models, yet these models exhibit limited success in evaluating summaries. Previous work improved such models with synthetic training data. However, the data is typically based on perturbed human-written summaries, which often differ in their characteristics from real model-generated summaries and have limited coverage of possible factual errors. Alternatively, large language models (LLMs) have recently shown promising results in directly evaluating generative tasks, but are too computationally expensive for practical use. Motivated by these limitations, we introduce TrueTeacher, a method for generating synthetic data by annotating diverse model-generated summaries using an LLM. Unlike prior work, TrueTeacher does not rely on human-written summaries and is multilingual by nature. Experiments on the TRUE benchmark show that a student model trained using our data substantially outperforms both the state-of-the-art model with similar capacity and the LLM teacher. In a systematic study, we compare TrueTeacher to existing synthetic data generation methods and demonstrate its superiority and robustness to domain shift. We also show that our method generalizes to multilingual scenarios. Lastly, we release our large-scale synthetic dataset (1.4M examples), generated using TrueTeacher, and a checkpoint trained on this data.
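A hedged sketch of the TrueTeacher-style recipe as described above: label model-generated summaries for factual consistency with a teacher LLM, then train a student NLI model on the results. summarizer, teacher_llm, and the prompt are hypothetical stand-ins, not the paper's actual models or prompt.

def generate_synthetic_data(documents, summarizer, teacher_llm):
    """Build NLI-style (premise, hypothesis, label) examples from raw documents."""
    examples = []
    for doc in documents:
        summary = summarizer(doc)
        verdict = teacher_llm(
            "Does the summary contain only facts supported by the document? "
            f"Answer yes or no.\n\nDocument: {doc}\n\nSummary: {summary}"
        )
        label = 1 if verdict.strip().lower().startswith("yes") else 0
        # premise = source document, hypothesis = generated summary.
        examples.append({"premise": doc, "hypothesis": summary, "label": label})
    return examples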