Author affiliations: University College London, Institute of Health Informatics, United Kingdom; Queen Mary University of London, School of Electronic Engineering and Computer Science, London, United Kingdom; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, United States; Advanced Computing in Health Sciences, Computational Sciences and Engineering Division, Oak Ridge National Laboratory, United States; Division of Neurology, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, United States; Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, United States; Department of Psychiatry, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
Publication: arXiv
Year/Volume/Issue: 2024
Core indexing:
Subject: Pediatrics
Abstract:
Introduction: Healthcare analytics and Artificial Intelligence (AI) hold transformative potential, yet AI models often inherit biases from their training data, which can exacerbate healthcare disparities, particularly among minority groups. While efforts have primarily targeted bias in structured data, mental health care depends heavily on unstructured data such as clinical notes, where bias and data sparsity introduce unique challenges. This study aims to detect and mitigate linguistic differences related to non-biological differences in the training data of AI models designed to assist in pediatric mental health screening. Our objectives are: (1) to assess the presence of bias by evaluating outcome parity across sex subgroups, (2) to identify bias sources through textual distribution analysis, and (3) to develop and evaluate a de-biasing method for mental health text data.
Methods: We examined classification parity across demographic groups, identifying biases through analysis of linguistic patterns in clinical notes. Using interpretability techniques, we assessed how gendered language influences model predictions. We then applied a data-centric de-biasing method focused on neutralizing biased terms while retaining only the salient clinical information. This methodology was tested on a model for automatic anxiety detection in pediatric patients, a crucial application given the rise in youth anxiety after COVID-19.
Results: Our findings show a systematic under-diagnosis of female adolescent patients, with 4% lower accuracy and a 9% higher False Negative Rate (FNR) compared to male patients, likely due to disparities in information density and linguistic differences in patient notes. Notes for male patients were on average 500 words longer, and linguistic similarity metrics indicated distinct word distributions between genders. Implementing our de-biasing approach reduced this diagnostic bias by up to 27%, demonstrating the approach's effectiveness in enhancing equity across demographic groups.
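As an illustration of the outcome-parity check described in the Methods, the sketch below computes per-sex accuracy and False Negative Rate for a binary classifier. The DataFrame layout and column names (sex, y_true, y_pred) are assumptions made for this example, not details taken from the paper.

```python
# Minimal sketch: subgroup parity check for a binary anxiety classifier.
# Assumes a DataFrame with columns sex, y_true, y_pred (illustrative names).
import pandas as pd

def subgroup_parity(df: pd.DataFrame) -> pd.DataFrame:
    """Compute accuracy and False Negative Rate (FNR) for each sex subgroup."""
    rows = []
    for sex, g in df.groupby("sex"):
        tp = ((g.y_true == 1) & (g.y_pred == 1)).sum()   # true positives
        fn = ((g.y_true == 1) & (g.y_pred == 0)).sum()   # missed positives
        acc = (g.y_true == g.y_pred).mean()
        fnr = fn / (tp + fn) if (tp + fn) else float("nan")
        rows.append({"sex": sex, "n": len(g), "accuracy": acc, "FNR": fnr})
    return pd.DataFrame(rows)

# Usage (hypothetical data): subgroup_parity(notes_df)
# A gap in accuracy or FNR between subgroups signals the kind of outcome
# disparity the study reports (4% lower accuracy, 9% higher FNR for females).
```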
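The data-centric de-biasing step could, in outline, look like the following sketch, which replaces gendered terms in a clinical note with neutral placeholders before training. The term list and placeholder tokens are illustrative assumptions; the paper's actual vocabulary and pipeline are not reproduced here.

```python
# Minimal sketch of gendered-term neutralization in clinical note text.
# GENDERED_TERMS and the placeholder tokens are hypothetical examples.
import re

GENDERED_TERMS = {
    "he": "[PT]", "she": "[PT]", "him": "[PT]", "her": "[PT]", "his": "[PT]",
    "boy": "[CHILD]", "girl": "[CHILD]", "male": "[SEX]", "female": "[SEX]",
}
# Longest terms first so alternation never matches a shorter prefix.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(GENDERED_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def neutralize(note: str) -> str:
    """Replace gendered tokens with neutral placeholders, case-insensitively."""
    return _PATTERN.sub(lambda m: GENDERED_TERMS[m.group(0).lower()], note)

print(neutralize("She reports her anxiety worsened; the girl avoids school."))
# -> "[PT] reports [PT] anxiety worsened; the [CHILD] avoids school."
```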