Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram

Authors: Min, Aehong; Wang, Xuan; Correia, Rion Brattig; Rozum, Jordan; Miller, Wendy R.; Rocha, Luis M.

Author affiliations: Donald Bren School of Information & Computer Sciences, University of California, Irvine, CA, United States; Luddy School of Informatics, Computing & Engineering, Indiana University, Bloomington, IN, United States; Instituto Gulbenkian de Ciência, Oeiras, Portugal; Dept. of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, United States; School of Nursing, Indiana University, Indianapolis, IN, United States

Publication: arXiv

Year/Volume/Issue: 2024

Subject: Eigenvalues and eigenfunctions

Abstract:

Objective — To (1) identify health-related terms used in social media posts that do not precisely match the health-related meaning of terms in a biomedical dictionary, (2) decide which terms need to be removed in order to improve the quality of the dictionary for biomedical text-mining tasks, (3) evaluate the effect of removing imprecise terms on such tasks, and (4) discuss how human-centered annotation complements automated annotation in social media mining for biomedical purposes.

Materials and Methods — We used a dictionary built from biomedical terminology extracted from sources such as DrugBank, MedDRA, MedlinePlus, and TCMGeneDIT to tag more than 8 million Instagram posts by users who mentioned an epilepsy-relevant drug at least once between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false positives. Frequent terms with a high false-positive rate were removed from the dictionary. To study the effect of removing those terms, we constructed knowledge networks using the refined and the original dictionaries and performed an eigenvector-centrality analysis on both networks. OpenAI's GPT series models were compared against human annotation.

Results — Analysis of the estimated false-positive rates of the annotated terms revealed 8 ambiguous terms (plus synonyms) used in Instagram posts, which were removed from the original dictionary. We show that the refined dictionary thus produced leads to a significantly different ranking of important terms, as measured by their eigenvector centrality in the knowledge networks. Furthermore, the most important terms obtained after refinement are of greater medical relevance. In addition, we show that OpenAI's GPT series models fare worse than human annotators in this task.

Discussion — Dictionaries built from traditional clinical terminology are not tailored for social media language and can bias results when used in biomedical inference …
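The Materials and Methods above describe building knowledge networks from dictionary-tagged posts and comparing term rankings by eigenvector centrality before and after refinement. Below is a minimal sketch of that kind of comparison, assuming networkx; the helper build_knowledge_network, the example posts, and the toy term lists are illustrative stand-ins, not the authors' actual dictionary or pipeline.

```python
import networkx as nx

def build_knowledge_network(posts, dictionary):
    """Co-occurrence ("knowledge") network: nodes are dictionary terms;
    two terms are linked, with a count weight, when they match the same post."""
    G = nx.Graph()
    for post in posts:
        matched = [term for term in sorted(dictionary) if term in post.lower()]
        for i, u in enumerate(matched):
            for v in matched[i + 1:]:
                weight = G.get_edge_data(u, v, default={}).get("weight", 0)
                G.add_edge(u, v, weight=weight + 1)
    return G

# Toy stand-ins for Instagram posts and dictionary terms (hypothetical).
posts = [
    "my keppra dose was changed after another seizure",
    "feeling dizzy and tired on lamictal",
    "seizure free for a year but still tired some days",
]
original_dict = {"keppra", "lamictal", "seizure", "dizzy", "tired", "free"}
refined_dict = original_dict - {"free"}  # drop an ambiguous, high false-positive term

for label, vocab in [("original", original_dict), ("refined", refined_dict)]:
    G = build_knowledge_network(posts, vocab)
    centrality = nx.eigenvector_centrality(G, weight="weight", max_iter=1000)
    ranking = sorted(centrality, key=centrality.get, reverse=True)
    print(f"{label:>8}: {ranking}")
```

Removing an ambiguous term (the stand-in "free" here) changes the co-occurrence structure of the network, which is what can shift the eigenvector-centrality ranking of the remaining terms.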
