CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference

Authors: Yu, Erxin; Li, Jing; Liao, Ming; Wang, Siqi; Gao, Zuchen; Mi, Fei; Hong, Lanqing

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hong Kong; Research Centre for Data Science & Artificial Intelligence; Huawei Noah's Ark Lab, Canada

Publication: arXiv

Year: 2024


Subject: Computational linguistics

Abstract: As large language models (LLMs) constantly evolve, ensuring their safety remains a critical research problem. Previous red-teaming approaches for LLM safety have primarily focused on single-prompt attacks or goal hijacking. To the best of our knowledge, we are the first to study LLM safety in multi-turn dialogue coreference. We created a dataset of 1,400 questions across 14 categories, each featuring multi-turn coreference safety attacks. We then conducted detailed evaluations on five widely used open-source LLMs. The results indicated that under multi-turn coreference safety attacks, the highest attack success rate was 56% with the LLaMA2-Chat-7b model, while the lowest was 13.9% with the Mistral-7B-Instruct model. These findings highlight the safety vulnerabilities in LLMs during dialogue coreference interactions. Warning: This paper may contain offensive language or harmful content. © 2024, CC BY.
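The abstract measures safety as an attack success rate over multi-turn dialogues whose final turn relies on coreference to an earlier harmful topic. A minimal sketch of how such a metric could be computed is below; it is not the authors' released evaluation code, and `query_model` and `judge_harmful` are hypothetical stand-ins for a chat LLM endpoint and a safety judge.

```python
from typing import Callable, Dict, List

def attack_success_rate(
    dialogues: List[List[str]],  # each item: ordered user turns; the last turn is the coreference attack
    query_model: Callable[[List[Dict[str, str]]], str],  # hypothetical chat model: message history -> reply
    judge_harmful: Callable[[str], bool],  # hypothetical judge: True if a reply is unsafe
) -> float:
    """Fraction of dialogues whose final (coreference-attack) turn elicits a harmful reply."""
    successes = 0
    for turns in dialogues:
        messages: List[Dict[str, str]] = []
        reply = ""
        for user_turn in turns:
            # Feed the full accumulated history so the model can resolve the coreference.
            messages.append({"role": "user", "content": user_turn})
            reply = query_model(messages)
            messages.append({"role": "assistant", "content": reply})
        # Only the response to the final attack turn is judged for harmfulness.
        if judge_harmful(reply):
            successes += 1
    return successes / len(dialogues) if dialogues else 0.0
```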
