Search Results - Inner Mongolia University Library

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Jung, Chani; Kim, Dongkwan; Jin, Jiho; Kim, Jiseon; Seonwoo, Yeon; Choi, Yejin; Oh, Alice; Kim, Hyunwoo. Affiliations: KAIST, Korea Republic of; Amazon, United States; University of Washington, United States; Allen Institute for AI, United States

ISBN: (print) 9798891761643

While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM *** posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors (perception inference and perception-to-belief inference) in *** introduce two datasets, Percept-ToMi and Percept-FANToM, to evaluate these precursory inferences for ToM in LLMs by annotating characters' perceptions on ToMi and FANToM, *** evaluation of eight state-of-the-art LLMs reveals that the models generally perform well in perception inference while exhibiting limited capability in perception-to-belief inference (e.g., lack of inhibitory control). Based on these results, we present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief *** results demonstrate that PercepToM significantly enhances LLMs' performance, especially in false belief scenarios. © 2024 Association for Computational Linguistics.

Keywords: Computational linguistics


Transfer-Free Data-Efficient Multilingual Slot Labeling


Conference on Empirical Methods in Natural Language Processing (EMNLP)

Authors: Razumovskaia, Evgeniia; Vulic, Ivan; Korhonen, Anna. Affiliations: Univ Cambridge, Language Technol Lab, Cambridge, England

ISBN: (print) 9798891760608

Slot labeling (SL) is a core component of task-oriented dialogue (TOD) systems, where slots and corresponding values are usually language-, task- and domain-specific. Therefore, extending the system to any new language-domain-task configuration requires (re)running an expensive and resource-intensive data annotation process. To mitigate the inherent data scarcity issue, current research on multilingual TOD assumes that sufficient English-language annotated data are always available for particular tasks and domains, and thus operates in a standard cross-lingual transfer setup. In this work, we depart from this often unrealistic assumption. We examine challenging scenarios where such transfer-enabling English annotated data cannot be guaranteed, and focus on bootstrapping multilingual data-efficient slot labelers in transfer-free scenarios directly in the target languages without any English-ready data. We propose a two-stage slot labeling approach (termed TWOSL) which transforms standard multilingual sentence encoders into effective slot labelers. In Stage 1, relying on SL-adapted contrastive learning with only a handful of SL-annotated examples, we turn sentence encoders into task-specific span encoders. In Stage 2, we recast SL from a token classification task into a simpler, less data-intensive span classification task. Our results on two standard multilingual TOD datasets and across diverse languages confirm the effectiveness and robustness of TWOSL. It is especially effective for the most challenging transfer-free few-shot setups, paving the way for quick and data-efficient bootstrapping of multilingual slot labelers for TOD.

Keywords: Classification (of information)


MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning


2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Zhang, Jingfan; Zhao, Yi; Chen, Dan; Tian, Xing; Zheng, Huanran; Zhu, Wei. Affiliations: iFLYTEK Co. Ltd., China; University of Pennsylvania, United States; Lenovo Connect Co. Ltd., China; Niuxin Network Technology Co. Ltd., China; East China Normal University, China

ISBN: (print) 9798891761681

Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective parameter-efficient fine-tuning (PEFT) methods. However, they introduce significant latency in multi-tenant settings due to the LoRA modules and MOE routers added to multiple linear modules in the Transformer layer. To address this issue, we propose Mixture of Low-Rank Adaptation (MiLoRA), a novel and efficient LoRA variant. MiLoRA differs from previous MOE-style LoRA methods by considering each LoRA module as an expert and employing a prompt-aware routing mechanism. This mechanism calculates expert routing results once before generating the first new token and reuses these results for subsequent tokens, reducing latency. Extensive experiments and analysis on commonsense reasoning tasks, math reasoning tasks, and widely used LLM evaluation benchmarks demonstrate that MiLoRA consistently outperforms strong PEFT baselines with comparable tunable parameter budgets. Additionally, MiLoRA significantly reduces latency in multi-tenant settings compared to previous LoRA-based methods. © 2024 Association for Computational Linguistics.

Keywords: Budget control


Improving Retrieval in Sponsored Search by Leveraging Query Context Signals


2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Mohankumar, Akash Kumar; Gururaj, K.; Madan, Gagan; Singh, Amit. Affiliations: Microsoft, India

ISBN: (print) 9798891761667

Accurately retrieving relevant bid keywords for user queries is critical in Sponsored Search but remains challenging, particularly for short, ambiguous queries. Existing dense and generative retrieval models often fail to capture nuanced user intent in these cases. To address this, we propose an approach to enhance query understanding by augmenting queries with rich contextual signals derived from web search results and large language models, stored in an online cache. Specifically, we use web search titles and snippets to ground queries in real-world information and utilize GPT-4 to generate query rewrites and explanations that clarify user intent. These signals are efficiently integrated through a Fusion-in-Decoder based Unity architecture, enabling both dense and generative retrieval with serving costs on par with traditional context-free models. To address scenarios where context is unavailable in the cache, we introduce context glancing, a curriculum learning strategy that improves model robustness and performance even without contextual signals during inference. Extensive offline experiments demonstrate that our context-aware approach substantially outperforms context-free models. Furthermore, online A/B testing on a prominent search engine across 160+ countries shows significant improvements in user engagement and revenue. © 2024 Association for Computational Linguistics.

Keywords: Structured Query Language


Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding


2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Liu, Xin; Bayat, Farima Fatahi; Wang, Lu. Affiliations: University of Michigan, Ann Arbor, MI, United States

ISBN: (print) 9798891761643

Calibrating language models (LMs) aligns their generation confidence with the actual likelihood of answer correctness, which can inform users about LMs' reliability and mitigate hallucinated content. However, prior calibration methods, such as self-consistency-based and logit-based approaches, are either limited in inference-time efficiency or fall short of providing informative signals. Moreover, simply filtering out low-confidence responses reduces the LM's helpfulness when the answers are correct. Therefore, effectively using calibration techniques to enhance an LM's factuality remains an unsolved challenge. In this paper, we first propose an activation-based calibration method, ACTCAB, which trains a linear layer on top of the LM's last-layer activations, which can better capture the representations of knowledge. Built on top of ACTCAB, we further propose CODEC, a confidence-guided decoding strategy to elicit truthful answers with high confidence from LMs. Evaluated on five popular QA benchmarks, ACTCAB achieves better calibration performance than all competitive baselines, e.g., by reducing the average expected calibration error (ECE) score by up to 39%. Further experiments on CODEC show consistent improvements in several LMs' factuality on challenging QA datasets, such as TruthfulQA, highlighting the value of confidence signals in enhancing factuality. © 2024 Association for Computational Linguistics.
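
The abstract above reports results in terms of expected calibration error (ECE). As a rough illustration of that metric only, the following is a minimal sketch of the common equal-width-bin formulation; the function name, the bin count of 10, and the sample values are assumptions for illustration and are not taken from the paper, which may use a different binning scheme.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: floats in [0, 1]; correct: booleans of the same length.
    n_bins=10 is an illustrative default, not a value from the abstract.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Assign each prediction to a bin by its confidence; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, is_correct))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(conf for conf, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        # Weight each bin's calibration gap by the share of samples it holds.
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece


# Hypothetical data: two confident correct answers, two mid-confidence answers.
score = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False])
```

A lower value means stated confidence tracks observed accuracy more closely; a perfectly calibrated set of predictions yields 0.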

Keywords: Decoding


Social Media Topic Classification on Greek Reddit


INFORMATION, 2024, Vol. 15, No. 9, p. 521

Authors: Mastrokostas, Charalampos; Giarelis, Nikolaos; Karacapilidis, Nikos. Affiliations: Univ Patras, Ind Management & Informat Syst Lab MEAD, Rion 26504, Greece

Text classification (TC) is a subtask of natural language processing (NLP) that categorizes text pieces into predefined classes based on their textual content and thematic aspects. This process typically includes the training of a Machine Learning (ML) model on a labeled dataset, where each text example is associated with a specific class. Recent progress in Deep Learning (DL) has enabled the development of deep neural transformer models, surpassing traditional ML ones. However, the topic classification literature prioritizes high-resource languages, particularly English, while research efforts for low-resource ones, such as Greek, are limited. Taking the above into consideration, this paper presents: (i) the first Greek social media topic classification dataset; (ii) a comparative assessment of a series of traditional ML models trained on this dataset, utilizing an array of text vectorization methods including TF-IDF, classical word and transformer-based Greek embeddings; (iii) a fine-tuned GREEK-BERT-based TC model on the same dataset; (iv) key empirical findings demonstrating that transformer-based embeddings significantly increase the performance of traditional ML models, while our fine-tuned DL model outperforms previous ones. The dataset, the best-performing model, and the experimental code are made public, aiming to augment the reproducibility of this work and advance future research in the field.

Keywords: Greek language; deep learning; large language models; machine learning; natural language processing; transformers; text classification; Greek NLP resources; social media


A Cheaper and Better Diffusion Language Model with Soft-Masked Noise


Conference on Empirical Methods in Natural Language Processing (EMNLP)

Authors: Chen, Jiaao; Zhang, Aston; Li, Mu; Smola, Alex; Yang, Diyi. Affiliations: Georgia Inst Technol, Atlanta, GA 30332, USA; Meta GenAI, Sunnyvale, CA, USA; Stanford Univ, Stanford, CA 94305, USA

ISBN: (print) 9798891760608

Diffusion models based on iterative denoising have recently been proposed and applied to various generation tasks such as image generation. However, because they were inherently designed for continuous data, existing diffusion models still have limitations in modeling discrete data such as language. For example, the commonly used Gaussian noise cannot handle discrete corruption well, and objectives in continuous spaces are unstable for textual data in the diffusion process, especially when the dimension is high. To alleviate these issues, we introduce a novel diffusion model for language modeling, Masked-Diffusion LM, with lower training cost and better performance, inspired by linguistic features of languages. Specifically, we design a linguistically informed forward process that adds corruption to the text through strategic soft-masking to better noise the textual data. We also directly predict the categorical distribution with a cross-entropy loss function at every diffusion step to connect the continuous and discrete spaces in a more efficient and straightforward way. Through experiments on 5 controlled generation tasks, we demonstrate that Masked-Diffusion LM achieves better generation quality than state-of-the-art diffusion models, with better efficiency. Code is available at https://github.com/SALT-NLP/Masked_Diffusioin_LM.

Keywords: Diffusion


What's "up" with vision-language models? Investigating their struggle with spatial reasoning


Conference on Empirical Methods in Natural Language Processing (EMNLP)

Authors: Kamath, Amita; Hessel, Jack; Chang, Kai-Wei. Affiliations: Univ Calif Los Angeles, Los Angeles, CA 90095, USA; Allen Inst AI, Seattle, WA, USA

ISBN: (print) 9798891760608

Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"? We curate three new corpora to quantify model comprehension of such basic spatial relations. These tests isolate spatial reasoning more precisely than existing datasets like VQAv2, e.g., our What'sUp benchmark contains sets of photographs varying only the spatial relations of objects, keeping their identity fixed (see Figure 1: models must comprehend not only the usual case of a dog under a table, but also the same dog on top of the same table). We evaluate 18 VL models, finding that all perform poorly, e.g., BLIP finetuned on VQAv2, which nears human parity on VQAv2, achieves 56% accuracy on our benchmarks vs. humans at 99%. We conclude by studying causes of this surprising behavior, finding: 1) that popular vision-language pretraining corpora like LAION-2B contain little reliable data for learning spatial relationships; and 2) that basic modeling interventions like up-weighting preposition-containing instances or fine-tuning on our corpora are not sufficient to address the challenges our benchmarks pose. We are hopeful that these corpora will facilitate further research, and we release our data and code at https://***/amitakamath/whatsup_vlms.

Keywords: Modeling languages


Characterizing Mechanisms for Factual Recall in Language Models


Conference on Empirical Methods in Natural Language Processing (EMNLP)

Authors: Yu, Qinan; Merullo, Jack; Pavlick, Ellie. Affiliations: Brown Univ, Dept Comp Sci, Providence, RI 02912, USA

ISBN: (print) 9798891760608

Language Models (LMs) often must integrate facts they memorized in pretraining with new information that appears in a given context. These two sources can disagree, causing competition within the model, and it is unclear how an LM will resolve the conflict. On a dataset that queries for knowledge of world capitals, we investigate both distributional and mechanistic determinants of LM behavior in such situations. Specifically, we measure the proportion of the time an LM will use a counterfactual prefix (e.g., "The capital of Poland is London") to overwrite what it learned in pretraining ("Warsaw"). On Pythia and GPT2, the training frequencies of both the query country ("Poland") and the in-context city ("London") strongly affect the models' likelihood of using the counterfactual. We then use head attribution to identify individual attention heads that promote either the memorized answer or the in-context answer in the logits. By scaling up or down the value vector of these heads, we can control the likelihood of using the in-context answer on new data. This method can increase the rate of generating the in-context answer to 88% of the time simply by scaling a single head at runtime. Our work contributes to a body of evidence showing that we can often localize model behaviors to specific components and provides a proof of concept for how future methods might control model behavior dynamically at runtime.

Keywords: Computational linguistics


Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models


Conference on Empirical Methods in Natural Language Processing (EMNLP)

Authors: Arora, Daman; Singh, Himanshu Gaurav; Mausam. Affiliations: Microsoft Res, Redmond, WA 98052, USA; Univ Calif Berkeley, Berkeley, CA 94720, USA; IIT Delhi, New York, NY, USA

ISBN: (print) 9798891760608

The performance of large language models (LLMs) on existing reasoning benchmarks has significantly improved over the past years. In response, we present JEEBENCH, a considerably more challenging benchmark dataset for evaluating the problem-solving abilities of LLMs. We curate 515 challenging pre-engineering mathematics, physics and chemistry problems from the highly competitive IIT JEE-Advanced exam. Long-horizon reasoning on top of deep in-domain knowledge is essential for solving problems in this benchmark. Our evaluation on various open-source and proprietary models reveals that the highest performance, even after using techniques like self-consistency, self-refinement and chain-of-thought prompting, is less than 40%. The typical failure modes of GPT-4, the best model, are errors in algebraic manipulation, difficulty in grounding abstract concepts into mathematical equations accurately and failure in retrieving relevant domain-specific concepts. We also observe that by mere prompting, GPT-4 is unable to assess risk introduced by negative marking for incorrect answers. For this, we develop a post-hoc confidence-thresholding method over self-consistency, which enables effective response selection. We hope that our challenging benchmark will guide future research in problem-solving using LLMs.
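
The abstract above mentions post-hoc confidence thresholding over self-consistency to cope with negative marking. The sketch below illustrates the general idea only and is not the authors' implementation: the function names, the use of majority-vote share as the confidence score, and the +3/-1 marking scheme are all assumptions made for illustration.

```python
from collections import Counter


def break_even_threshold(reward, penalty):
    """Confidence above which answering has positive expected score.

    Answering with probability p of being right scores p*reward - (1-p)*penalty
    in expectation, which is positive when p > penalty / (reward + penalty).
    """
    return penalty / (reward + penalty)


def select_response(sampled_answers, threshold):
    """Return the majority answer if its vote share reaches the threshold.

    sampled_answers: final answers from several independent samples
    (self-consistency). Returns None to abstain when agreement is too low.
    """
    if not sampled_answers:
        return None
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    confidence = votes / len(sampled_answers)
    return answer if confidence >= threshold else None


# Hypothetical marking scheme: +3 for a correct answer, -1 for a wrong one.
threshold = break_even_threshold(reward=3, penalty=1)
confident = select_response(["B", "B", "B", "C"], threshold)  # strong agreement
uncertain = select_response(["A", "B", "C", "D", "E"], threshold)  # abstains
```

Under this toy scheme the break-even confidence is 0.25, so a five-way split (vote share 0.2) leads to abstaining rather than risking the penalty.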

Keywords: Benchmarking
