Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records

Authors: Li, Cheng; Rana, Santu; Phung, Dinh; Venkatesh, Svetha

Affiliation: Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic 3217, Australia

Publication: KNOWLEDGE-BASED SYSTEMS

Year, volume, issue: 2016, Vol. 99, No. 0

Pages: 168-182

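Subject classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]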

Keywords: Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction

Abstract: Electronic Medical Records (EMRs) have become a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of the medical records of hospitalized patients. Each medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected through the ICD-10 tree structure, which encodes semantic relationships between codes. We use a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference for the wddCRF is derived using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets: PolyVascular disease and Acute Myocardial Infarction. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. Besides these, the prediction for procedure codes based on the C
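
The abstract's idea of feeding ICD-10 tree distances through a decay function can be illustrated with a minimal sketch. It is not the authors' wddCRF: the abstract does not specify the distance definition or the decay function, so the three-level hierarchy in `icd_path` (first letter, three-character category, full code), the exponential decay, and the parameters `alpha` and `scale` are all assumptions made for illustration. The link prior follows the generic distance dependent Chinese restaurant process form, in which an item links to itself with weight `alpha` and to another item with weight given by the decayed distance.

```python
import math


def icd_path(code):
    """Return the ancestors of an ICD-10 code under a simplified,
    hypothetical hierarchy: first letter -> 3-char category -> full code.
    (Real ICD-10 chapters span letter ranges; this is only a sketch.)"""
    code = code.replace(".", "").upper()
    levels = [code[:1], code[:3]]
    if len(code) > 3:
        levels.append(code)
    return levels


def tree_distance(a, b):
    """Number of edges between two codes in the simplified tree."""
    pa, pb = icd_path(a), icd_path(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def exp_decay(distance, scale=1.0):
    """Exponential decay (an assumed choice): closer codes get larger weight."""
    return math.exp(-distance / scale)


def link_prior(codes, i, alpha=1.0, scale=1.0):
    """Prior probability that code i links to each code j.
    A self-link (j == i) has weight alpha; other links use the decayed distance."""
    weights = [
        alpha if j == i else exp_decay(tree_distance(codes[i], codes[j]), scale)
        for j in range(len(codes))
    ]
    total = sum(weights)
    return [w / total for w in weights]


codes = ["I21.0", "I21.1", "E11.9"]
probs = link_prior(codes, 0)
```

In this toy example the sibling code `I21.1` receives a higher link probability from `I21.0` than the unrelated code `E11.9`, which is the behaviour a distance-based prior is meant to encourage.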
