We introduce a generalization of classic information-theoretic measures of predictive uncertainty in online language processing, based on the simulation of expected continuations of incremental linguistic contexts. Ou...
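The abstract above is truncated, so the exact estimator is not shown here. As a loose, next-token-only illustration of how predictive uncertainty can be approximated by simulating continuations from a language model, the following sketch Monte Carlo-estimates the entropy of the next token given an incremental context; the model choice and function name are assumptions, not the paper's method.

```python
# Minimal sketch (not the paper's estimator): approximate the entropy of the
# next token given an incremental context by sampling continuations from a
# causal LM and averaging their surprisals (Monte Carlo estimate of E[-log p]).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # assumed model choice
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mc_next_token_entropy(context: str, n_samples: int = 64) -> float:
    """H(next token | context), estimated as the mean surprisal of sampled tokens."""
    ids = tok(context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits[0, -1]                 # next-token distribution
        log_probs = torch.log_softmax(logits, dim=-1)
        samples = torch.multinomial(log_probs.exp(), n_samples, replacement=True)
        return (-log_probs[samples]).mean().item()     # in nats

print(mc_next_token_entropy("The cat sat on the"))
```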
Surprisal theory (Hale, 2001; Levy, 2008) posits that a word's reading time is proportional to its surprisal (i.e., to its negative log probability given the preceding context). It has been empirically tested using surprisal estimates from language models (LMs). Under the premise that surprisal theory holds, we would expect that higher-quality language models, whose predictions are more accurate, provide more powerful predictors of human reading behavior, a conjecture we dub the quality-power (QP) hypothesis. Unfortunately, empirical support for the QP hypothesis is mixed. Some studies in English have found correlations between LM quality and psychometric predictive power, but other studies using Japanese data, as well as using larger English LMs, find no such correlations. In this work, we conduct a systematic cross-linguistic assessment of the QP hypothesis. We train LMs from scratch on small- and medium-sized datasets from 13 languages (across five language families) and assess their ability to predict eye-tracking data. We find correlations between LM quality and psychometric predictive power in eleven of these thirteen languages, suggesting that, within the range of model classes and sizes tested, better language models provide better predictors of human language processing behaviors. https://***/rycolab/quality-power-hypothesis
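The abstract defines surprisal as a word's negative log probability given the preceding context. As a hedged sketch of how such per-token surprisal estimates are commonly obtained from a causal LM (the model choice is assumed here, and the subword-to-word alignment that reading-time studies need is omitted), consider:

```python
# Sketch of per-token surprisal from a causal LM (in bits), assuming GPT-2;
# real reading-time analyses additionally align subword tokens to words.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def surprisals(sentence: str):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(lm(ids).logits, dim=-1)
    out = []
    for i in range(1, ids.size(1)):                    # first token has no context
        lp = log_probs[0, i - 1, ids[0, i]].item()     # log P(w_i | w_<i)
        out.append((tok.decode(ids[0, i].item()), -lp / math.log(2)))
    return out

for token, s in surprisals("The old man the boats."):
    print(f"{token!r}: {s:.2f} bits")
```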
Complex Word Identification (CWI) is an essential step in the lexical simplification task and has recently become a task on its own. Some variations of this binary classification task have emerged, such as lexical com...
Fine-tuning all parameters of large language models (LLMs) requires significant computational resources and is time-consuming. Recent parameter-efficient tuning methods such as Adapter tuning, Prefix tuning, and LoRA allow updating a small subset of parameters in large language models. However, they can only save approximately 30% of the training memory requirements because gradient computation and backpropagation are still necessary for these methods. This paper proposes a novel parameter-efficient tuning method for LLMs without calculating their gradients. Leveraging the discernible similarities between the parameter-efficient modules of the same task learned by both large and small language models, we put forward a strategy for transferring the parameter-efficient modules derived initially from small language models to much larger ones. To ensure a smooth and effective adaptation process, we introduce a Bridge model to guarantee dimensional consistency while stimulating a dynamic interaction between the models. We demonstrate the effectiveness of our method using the T5 and GPT-2 series of language models on the SuperGLUE benchmark. Our method achieves comparable performance to fine-tuning and parameter-efficient tuning on large language models without needing gradient-based optimization. Additionally, our method achieves up to 5.7x memory reduction compared to parameter-efficient tuning.
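The abstract describes transferring parameter-efficient modules learned with a small LM into a much larger LM, with a Bridge model keeping dimensions consistent. The paper's actual Bridge architecture is not given in the abstract; the sketch below is only an illustrative reading of the idea, with a LoRA-style module trained at a small hidden size and a learned linear bridge mapping its update into the large model's hidden size. All class names and dimensions are assumptions.

```python
# Illustrative sketch only (not the paper's architecture): a LoRA-style module
# learned at a small hidden size, plus a linear "bridge" that applies its
# low-rank update inside a larger model's hidden size.
import torch
import torch.nn as nn

class LoRA(nn.Module):
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)

    def forward(self, h):                      # additive low-rank update
        return self.up(self.down(h))

class Bridge(nn.Module):
    """Maps hidden states between the small and large models' dimensions."""
    def __init__(self, small_dim: int, large_dim: int):
        super().__init__()
        self.to_small = nn.Linear(large_dim, small_dim)
        self.to_large = nn.Linear(small_dim, large_dim)

    def forward(self, h_large, lora_small: LoRA):
        # Project down, apply the small model's module, project back up.
        return h_large + self.to_large(lora_small(self.to_small(h_large)))

small_lora = LoRA(dim=512)                     # trained alongside the small LM
bridge = Bridge(small_dim=512, large_dim=1024)
h = torch.randn(2, 16, 1024)                   # large-model hidden states
print(bridge(h, small_lora).shape)             # torch.Size([2, 16, 1024])
```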
This paper explores the application of Variational Autoencoders (VAE) in text generation, focusing on overcoming challenges like posterior collapse and the limitations of simplistic prior distributions. We investigate...
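The abstract above is truncated, so its specific remedies are not visible here. One common way text VAEs address posterior collapse, which the sketch below illustrates under that assumption rather than as the paper's method, is a KL-annealed loss with a free-bits floor on the per-dimension KL term; the schedule and threshold values are placeholders.

```python
# Minimal sketch (assumptions throughout): a KL-annealed VAE objective with a
# free-bits floor, one common mitigation for posterior collapse in text VAEs.
import torch

def vae_loss(recon_nll, mu, logvar, step, anneal_steps=10_000, free_bits=0.5):
    # KL(q(z|x) || N(0, I)) per latent dimension
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar)
    kl = torch.clamp(kl, min=free_bits).sum(dim=-1).mean()   # free-bits floor
    beta = min(1.0, step / anneal_steps)                      # linear KL warm-up
    return recon_nll + beta * kl

loss = vae_loss(recon_nll=torch.tensor(42.0),
                mu=torch.zeros(8, 32), logvar=torch.zeros(8, 32), step=2_000)
print(loss)
```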
Multi-agent systems empowered by large language models (LLMs) have demonstrated remarkable capabilities in a wide range of downstream applications. In this work, we introduce TRANSAGENTS, a novel multi-agent translati...
Recent studies highlight the potential of large language models in creating educational tools for children, yet significant challenges remain in maintaining key child-specific properties such as linguistic nuances, co...
We propose a general method to break down a main complex task into a set of intermediary, easier sub-tasks, which are formulated in natural language as binary questions related to the final target task. Our method allows for representing each example by a vector consisting of the answers to these questions. We call this representation Natural Language Learned Features (NLLF). NLLF is generated by a small transformer language model (e.g., BERT) that has been trained in a Natural Language Inference (NLI) fashion, using weak labels automatically obtained from a Large Language Model (LLM). We show that the LLM normally struggles with the main task using in-context learning, but can handle these easier sub-tasks and produce useful weak labels to train a BERT. The NLI-like training of the BERT allows for tackling zero-shot inference with any binary question, not only the ones seen during training. We show that this NLLF vector not only helps to reach better performance by enhancing any classifier, but also that it can be used as input to an easy-to-interpret machine learning model like a decision tree. This decision tree is interpretable yet reaches high performance, surpassing that of a pre-trained transformer in some cases. We have successfully applied this method to two completely different tasks: detecting incoherence in students' answers to open-ended mathematics exam questions, and screening abstracts for a systematic literature review of scientific papers on climate change and agroecology.
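The abstract describes scoring each example against binary natural-language questions with an NLI-style model and feeding the resulting vector to an interpretable classifier. The sketch below is a hedged reading of that pipeline, not the authors' code: the NLI model, the example questions, and the toy data are all placeholders, and an off-the-shelf zero-shot NLI pipeline stands in for the weakly supervised BERT.

```python
# Sketch of an NLLF-style pipeline (details assumed): score each example
# against binary natural-language questions with an NLI model, stack the
# entailment scores into a feature vector, and fit a decision tree on it.
from transformers import pipeline
from sklearn.tree import DecisionTreeClassifier

nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

QUESTIONS = [                                   # illustrative sub-task questions
    "Does the answer contradict itself?",
    "Does the answer address the question asked?",
]

def nllf_vector(text: str):
    # One entailment-style score per binary question (question used as hypothesis).
    return [nli(text, candidate_labels=[q], hypothesis_template="{}")["scores"][0]
            for q in QUESTIONS]

texts = ["The area is 12 because 3 times 4 is 12.",
         "The area is 7 because triangles are blue."]
labels = [0, 1]                                 # 1 = incoherent (toy labels)

clf = DecisionTreeClassifier(max_depth=3).fit([nllf_vector(t) for t in texts], labels)
print(clf.predict([nllf_vector("The perimeter is 14 since 2*(3+4)=14.")]))
```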
While natural language inference (NLI) has emerged as a prominent task for evaluating a model's capability to perform natural language understanding, creating large benchmarks for training deep learning models imp...
In this work, we designed, developed and released in production DataQue, a hybrid NLQ (Natural Language Querying) system for conversational DB querying. We address multiple practical problems that are not accounted f...