Background: Physicians spend hours writing patient notes, which are rich in information but difficult for computers to analyze because of their unstructured format. GPT-4 has reshaped our ability to process text, yet it is unknown how well this model handles medical notes. This project aims to compare GPT-4's annotation of medical notes with that of experienced physicians, in three languages and at multiple institutions across several countries.

Methods: This study included eight sites from four countries: the United States, Colombia, Singapore, and Italy. Each site contributed seven de-identified notes (admission, progress, or consult) from hospitalized patients. GPT-4 assessed each note by answering 14 questions covering demographic information, clinical judgments, data quality, and the patient's eligibility for a hypothetical study enrollment. For validation, two physicians from each site independently evaluated GPT-4's responses.

Findings: Overall, 56 medical notes written in English, Italian, and Spanish were analyzed, and GPT-4 generated a total of 784 responses. Both physicians agreed with GPT-4's response 79% of the time (622/784, 95% CI 76-82%). Only one of the two physicians agreed 10% of the time (82/784, 95% CI 8-13%), and neither physician agreed 10% of the time (80/784, 95% CI 8-13%). Both physicians agreed with GPT-4 more often for notes written in Spanish and Italian than for notes written in English, with agreement rates of 88% (86/98, 95% CI 79-93%), 84% (82/98, 95% CI 75-90%), and 77% (454/588, 95% CI 74-80%), respectively. Hallucinations were rare (1%; 10/784, 95% CI 0-2%). GPT-4 correctly selected patients for a hypothetical study enrollment based on three criteria 90% of the time (95% CI 81-98%).

Interpretation: The findings indicate that GPT-4's annotations showed a high rate of agreement with physicians across all three languages. We also demonstrate GPT-4's potential to assist in selecting patients for studies.

Funding: None. Declarati
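The three-way agreement breakdown in the Findings (both, one, or neither physician agreeing) can be recomputed from the reported counts. The sketch below is a minimal illustration, not the authors' analysis code: the helper names `classify`, `summarize`, and `wilson_ci` are hypothetical, the per-response judgments are rebuilt from the published totals rather than taken from real data, and the Wilson score interval is an assumed method, because the abstract does not say how its 95% CIs were computed.

```python
import math
from collections import Counter

# Study design from the abstract: 8 sites x 7 notes x 14 questions.
N_SITES, NOTES_PER_SITE, QUESTIONS = 8, 7, 14
N_RESPONSES = N_SITES * NOTES_PER_SITE * QUESTIONS  # 784 GPT-4 responses


def wilson_ci(k, n, z=1.96):
    """95% Wilson score interval for k successes out of n (an assumed CI method)."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def classify(rater1_agrees, rater2_agrees):
    """Map two physicians' yes/no judgments of one GPT-4 response to a category."""
    return {2: "both", 1: "one", 0: "neither"}[int(rater1_agrees) + int(rater2_agrees)]


def summarize(judgments):
    """Return {category: (count, proportion, (ci_low, ci_high))} over all responses."""
    counts = Counter(classify(a, b) for a, b in judgments)
    n = len(judgments)
    return {
        cat: (counts[cat], counts[cat] / n, wilson_ci(counts[cat], n))
        for cat in ("both", "one", "neither")
    }


if __name__ == "__main__":
    # Hypothetical per-response judgments rebuilt only from the reported totals
    # (622 both, 82 one, 80 neither); which physician dissented is arbitrary here.
    judgments = [(True, True)] * 622 + [(True, False)] * 82 + [(False, False)] * 80
    assert len(judgments) == N_RESPONSES
    for cat, (k, p, (lo, hi)) in summarize(judgments).items():
        print(f"{cat:8s} {k:4d}/{len(judgments)}  {p:.0%}  (95% CI {lo:.0%}-{hi:.0%})")
```

Under these assumptions the point estimates match the abstract (79%, 10%, 10%), but an interval bound can differ by about a percentage point from the reported one; for example, the Wilson interval for 82/784 rounds to 9-13%, where the abstract reports 8-13%.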