DEFVERIFY: Do Hate Speech Models Reflect Their Dataset’s Definition?

Authors: Khurana, Urja; Nalisnick, Eric; Fokkens, Antske

Author Affiliations: Computational Linguistics and Text Mining Lab, Vrije Universiteit Amsterdam, Netherlands; Department of Computer Science, Johns Hopkins University, United States

Publication: arXiv

Year/Volume/Issue: 2024

Subject: Speech recognition

Abstract: Warning: Due to the nature of the topic, this paper contains offensive content. When building a predictive model, it is often difficult to ensure that application-specific requirements are encoded by the model that will eventually be deployed. Consider researchers working on hate speech detection. They will have an idea of what is considered hate speech, but building a model that reflects their view accurately requires preserving those ideals throughout the workflow of dataset construction and model training. Complications such as sampling bias, annotation bias, and model misspecification almost always arise, possibly resulting in a gap between the application specification and the model's actual behavior upon deployment. To address this issue for hate speech detection, we propose DEFVERIFY: a 3-step procedure that (i) encodes a user-specified definition of hate speech, (ii) quantifies to what extent the model reflects the intended definition, and (iii) tries to identify the point of failure in the workflow. We use DEFVERIFY to find gaps between definition and model behavior when applied to six popular hate speech benchmark datasets. © 2024, CC BY.
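The abstract describes DEFVERIFY only at a high level, so the following is a minimal, hypothetical sketch rather than the paper's actual procedure. It illustrates the general idea behind step (ii), quantifying how well a classifier's predictions match a user-specified definition, using invented facet names, probe texts, and a stand-in predict function; none of these details come from the paper.

```python
# Illustrative sketch only -- NOT the DEFVERIFY implementation from the paper.
# Assumes a hypothetical binary classifier `predict(text) -> {0, 1}` (1 = hate)
# and a hand-written set of probe examples for each facet of the definition.
from typing import Callable, Dict, List, Tuple

# Hypothetical facets of a user-specified hate speech definition, each with
# (probe text, expected label) pairs that a deployed model should satisfy.
FACETS: Dict[str, List[Tuple[str, int]]] = {
    "targets_protected_group": [("<slur directed at a group>", 1)],
    "mere_profanity_is_not_hate": [("this movie is damn boring", 0)],
}

def facet_agreement(predict: Callable[[str], int]) -> Dict[str, float]:
    """Fraction of probes per facet on which the model matches the definition."""
    scores = {}
    for facet, probes in FACETS.items():
        hits = sum(predict(text) == label for text, label in probes)
        scores[facet] = hits / len(probes)
    return scores

if __name__ == "__main__":
    # Toy stand-in model: flags any text containing "<slur"; real usage would
    # wrap a trained hate speech classifier instead.
    toy_model = lambda text: int("<slur" in text)
    for facet, score in facet_agreement(toy_model).items():
        print(f"{facet}: {score:.2f}")  # low scores hint at a definition/model gap
```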
