Given the prompt "Rome is in", can we steer a language model to flip its prediction of an incorrect token "France" to a correct token "Italy" by only multiplying a few relevant activation...
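A minimal sketch of the kind of intervention this question describes, assuming a stock GPT-2 and a PyTorch forward hook that multiplies a few activations at one layer; the layer index, unit indices, and scale factor below are illustrative guesses, not values from the paper.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

UNITS = [11, 42, 97]  # hypothetical "relevant" activation indices
ALPHA = 10.0          # hypothetical multiplicative scale

def scale_units(module, inputs, output):
    # Multiply a handful of MLP output units at this layer.
    output[..., UNITS] = output[..., UNITS] * ALPHA
    return output

# Hooking layer 6 is an arbitrary illustrative choice.
handle = model.transformer.h[6].mlp.register_forward_hook(scale_units)
ids = tok("Rome is in", return_tensors="pt")
with torch.no_grad():
    next_id = model(**ids).logits[0, -1].argmax()
print(tok.decode(next_id))  # check whether " France" flips to " Italy"
handle.remove()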
Implicit Personalization (IP) is a phenomenon of language models inferring a user's background from the implicit cues in the input prompts and tailoring the response based on this inference. While previous work ha...
ISBN (print): 9798891760615
Recent studies have shown that many natural language understanding and reasoning datasets contain statistical cues that can be exploited by NLP models, resulting in an overestimation of their capabilities. Existing methods, such as "hypothesis-only" tests and CheckList, are limited in identifying these cues and evaluating model weaknesses. We introduce ICQ (I-See-Cue), a lightweight, general statistical profiling framework that automatically identifies potential biases in multiple-choice NLU datasets without requiring additional test cases. ICQ assesses the extent to which models exploit these biases through black-box testing, addressing the limitations of current methods. In this work, we conduct a comprehensive evaluation of statistical biases in 10 popular NLU datasets and 4 models, confirming prior findings, revealing new insights, and offering an online demonstration system to encourage users to assess their own datasets and models. Furthermore, we present a case study on investigating ChatGPT's bias, providing valuable recommendations for practical applications.
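As a rough illustration of what such statistical profiling can look like, the toy probe below scores how strongly individual answer-option tokens correlate with the gold label; ICQ's actual scoring and black-box testing are more involved, and the data layout here is an assumption.

from collections import Counter

def cue_strengths(examples, min_count=20):
    # examples: list of (options, gold_index) pairs -- an assumed layout.
    seen, correct = Counter(), Counter()
    for options, gold in examples:
        for i, option in enumerate(options):
            for token in set(option.lower().split()):
                seen[token] += 1
                if i == gold:
                    correct[token] += 1
    # Rate at which each frequent token appears in the gold option; tokens
    # whose rate far exceeds the 1/len(options) chance level are cues.
    return {t: correct[t] / seen[t] for t in seen if seen[t] >= min_count}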
ISBN (print): 9798891761643
The recent emergence of Medical Large Vision-Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, limited retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references, interfering with the model's generation. Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the calibrated selection of the number of retrieved contexts. Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model, balancing its dependence on inherent knowledge and retrieved contexts for generation. We demonstrate the effectiveness of RULE on medical VQA and report generation tasks across three datasets, achieving an average improvement of 47.4% in factual accuracy. We publicly release our benchmark and code at https://***/richard-peng-xia/RULE.
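A minimal sketch of the calibration idea behind RULE's first component, under the assumption that it resembles standard risk control: on a held-out calibration split, choose the smallest number of retrieved contexts whose empirical factual-error rate stays below a target risk level. The function name and exact procedure are illustrative, not the paper's.

def calibrate_num_contexts(error_rate_at_k, alpha=0.10):
    # error_rate_at_k: {k: empirical factual-error rate with k contexts},
    # measured on calibration data. alpha is the target risk level.
    admissible = [k for k, err in error_rate_at_k.items() if err <= alpha]
    # Prefer the smallest admissible k to limit irrelevant references;
    # fall back to the best-scoring k if none meets the target.
    if admissible:
        return min(admissible)
    return min(error_rate_at_k, key=error_rate_at_k.get)

print(calibrate_num_contexts({1: 0.18, 3: 0.09, 5: 0.12}))  # -> 3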
Large language models (LLMs) appear to bias their survey answers toward certain values. Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are they? To answer, we first define value ...
This paper presents a way of enhancing the reliability of Large Multi-modal Models (LMMs) in addressing hallucination, where the models generate cross-modal inconsistent responses. Without additional training, we prop...
Existing works examining Vision-Language Models (VLMs) for social biases predominantly focus on a limited set of documented bias associations, such as gender↔profession or race↔crime. This narrow scope often overlooks...
ISBN (print): 9798891760882
Code switching (CS) is a very common phenomenon in written and spoken communication but one that is handled poorly by many natural language processing (NLP) applications. With the application of building CS corpora in mind, we explore CS language identification (LID). We make the task more realistic by scaling it to more languages and considering models with simpler architectures for faster inference. We also reformulate the task as a sentence-level multi-label tagging problem to make it more tractable. Having defined the task, we investigate three reasonable models for this task and define metrics which better reflect desired performance. We present empirical evidence that no current approach is adequate and finally provide recommendations for future work in this area.
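A hedged sketch of the multi-label framing with a deliberately simple architecture (character n-gram features plus one binary classifier per language); the toy data and scikit-learn pipeline are illustrative, not the paper's three models.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

sents = ["I love you mucho", "das ist great", "bonjour tout le monde"]
langs = [["en", "es"], ["de", "en"], ["fr"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(langs)  # one indicator column per language
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(sents, Y)
# Each sentence can receive any subset of language tags.
print(mlb.inverse_transform(clf.predict(["je t'aime so much"])))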
With an increase in complexity and severity, it is becoming harder to identify and mitigate vulnerabilities. Although traditional tools remain useful, machine learning models are being adopted to expand efforts. To help explore methods of vulnerability detection, we present an empirical study on the effectiveness of text-based machine learning models by utilizing 344 open-source projects, 2,182 vulnerabilities and 38 vulnerability types. With the availability of vulnerabilities being presented in forms such as code snippets, we construct a methodology based on extracted source code functions and create equal pairings. We conduct experiments using seven machine learning models, five natural language processing techniques and three data processing methods. First, we present results based on full context function pairings. Next, we introduce condensed functions and conduct a statistical analysis to determine if there is a significant difference between the models, techniques, or methods. Based on these results, we answer research questions regarding model prediction for testing within and across projects and vulnerability types. Our results show that condensed functions with fewer features may achieve better prediction results when testing within projects rather than across them. Overall, we conclude that text-based machine learning models are not effective in detecting vulnerabilities within or across projects and vulnerability types.
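A hedged sketch of the text-based setup such a study evaluates, assuming the common recipe of treating function source as text: pair a vulnerable function with its patched counterpart and train a bag-of-tokens classifier. The snippets and model choice are illustrative, not the paper's exact pipeline.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Equal pairing: one vulnerable function, its patched counterpart.
funcs = [
    "char buf[8]; strcpy(buf, input);",      # vulnerable (label 1)
    "char buf[8]; strncpy(buf, input, 7);",  # patched    (label 0)
]
labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(token_pattern=r"\w+"),
                    RandomForestClassifier(n_estimators=100))
clf.fit(funcs, labels)
print(clf.predict(["strcpy(dst, src);"]))  # classify an unseen function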
Large Language Models (LLMs) are increasingly ubiquitous, yet their ability to retain and reason about temporal information remains limited, hindering their application in real-world scenarios where understanding the ...