Author affiliations: The Oxford Suzhou Centre for Advanced Research, Suzhou 215123, China; The Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford OX1 2JD, United Kingdom; The National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; The School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
Publication: SSRN
Year, volume, issue: 2023
Core indexing:
Subject: Semantics
Abstract: Electronic health records (EHR) contain a comprehensive history of patients' diagnostic, procedural, and prescription data, expressed as medical concepts coded in diverse standards such as ICD-10-CM, ICD-10-PCS, and NDC. These medical concepts possess complex semantics, e.g., causal, mutually exclusive, and complication relationships between diagnosis concepts, which are implicitly embedded in EHR data. The semantics of medical concepts summarise medical context, which plays an important role in improving clinical tasks in the healthcare domain. Many EHR mining researchers use natural language processing (NLP) models to learn representations of medical concepts. However, there has been no systematic comparison of how well these models represent medical concepts. In this study, we evaluated four NLP models, Latent Dirichlet Allocation (LDA), Word2Vec, GloVe, and BERT, for medical concept representation learning in the pre-training setting, using two large public EHR datasets. We found that GloVe performed best among the four models. Overall, medical concept representation learning models should incorporate global co-occurrence and subtle modelling to accurately capture the relationships between code embeddings. © 2023, The Authors. All rights reserved.