Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBM...
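For reference, a standard binary RBM on vector data can be sketched as below. This is a generic textbook RBM trained with one step of contrastive divergence (CD-1), not the multimode model the abstract goes on to describe. The class name, learning rate, hidden-layer size, and the toy 4x5 binary matrices are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BernoulliRBM:
    """Minimal binary RBM on vector data, trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        # p(h_k = 1 | v) = sigmoid(c_k + sum_i v_i W_ik)
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigmoid(b_i + sum_k W_ik h_k)
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a batch v0 of shape (batch, n_visible)."""
        ph0 = self.hidden_probs(v0)                              # positive phase
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)    # sample hidden units
        pv1 = self.visible_probs(h0)                             # reconstruction
        ph1 = self.hidden_probs(pv1)                             # negative phase
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += lr * (v0 - pv1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)
        return float(((v0 - pv1) ** 2).mean())  # mean squared reconstruction error


# Toy matrix data: 32 binary 4x5 matrices, flattened to 20-dimensional vectors.
rng = np.random.default_rng(1)
matrices = (rng.random((32, 4, 5)) < 0.3).astype(float)
vectors = matrices.reshape(len(matrices), -1)
rbm = BernoulliRBM(n_visible=20, n_hidden=8)
errs = [rbm.cd1_step(vectors) for _ in range(50)]
```

Note the `reshape` call: a standard RBM only sees the 20 entries of each 4x5 matrix as a flat vector, so the model itself carries no notion of which row or column an entry came from.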
Electronic Medical Records (EMR) are increasingly used for risk prediction. EMR analysis is complicated by missing entries, for two reasons: the "primary reason for admission" is included in the EMR, but the co-morbidities (other chronic diseases) are left uncoded; and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of these data: unlike many other datasets, EMR data are sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values, and use the new representation to predict key hospital events. To fill in missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (AMI; 2652 admissions). Our results show that the AUC for 3-month admission prediction improves significantly, from 0.741 to 0.786 for the Cancer data and from 0.678 to 0.724 for the AMI data. We also extend the proposed method to a supervised model that predicts multiple related risk outcomes (e.g. emergency presentations and hospital admissions over 3-, 6- and 12-month periods) in an integrated framework. For this model, the AUC averaged over outcomes improves significantly, from 0.768 to 0.806 for the Cancer data and from 0.685 to 0.748 for the AMI data.
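A minimal sketch of the fill-in idea follows, assuming an objective the abstract does not spell out: a masked squared reconstruction error plus an L1 penalty on two nonnegative low-rank factors, minimized by block coordinate descent that alternates one proximal gradient step on each factor. The function names, the penalty form, the mask convention, and the toy two-group matrix are hypothetical; this is not the authors' exact model.

```python
import numpy as np


def objective(X, U, V, lam, M):
    """Masked squared error plus L1 penalty (U, V >= 0, so their sums are L1 norms)."""
    return 0.5 * np.sum((M * (X - U @ V)) ** 2) + lam * (U.sum() + V.sum())


def sparse_lowrank_bcd(X, rank, lam=0.1, mask=None, n_iter=200, seed=0):
    """Factor X (features x patients) as U @ V with U, V >= 0.

    Assumed objective (illustrative):
        0.5 * ||mask * (X - U V)||_F^2 + lam * (|U|_1 + |V|_1)
    Entries with mask == 0 are treated as unknown and ignored by the fit.
    """
    rng = np.random.default_rng(seed)
    M = np.ones_like(X, dtype=float) if mask is None else mask.astype(float)
    U = rng.random((X.shape[0], rank))
    V = rng.random((rank, X.shape[1]))
    hist = [objective(X, U, V, lam, M)]
    for _ in range(n_iter):
        # Block 1: proximal gradient step on U with V fixed.
        R = M * (U @ V - X)                           # masked residual
        L = np.linalg.norm(V @ V.T, 2) + 1e-12        # Lipschitz constant of the U-gradient
        U = np.maximum(U - (R @ V.T + lam) / L, 0.0)  # prox of L1 + nonnegativity
        # Block 2: proximal gradient step on V with U fixed.
        R = M * (U @ V - X)
        L = np.linalg.norm(U.T @ U, 2) + 1e-12
        V = np.maximum(V - (U.T @ R + lam) / L, 0.0)
        hist.append(objective(X, U, V, lam, M))
    return U, V, hist


# Toy feature-patient matrix: two hypothetical groups of co-occurring conditions.
rng = np.random.default_rng(0)
true_U = np.zeros((6, 2))
true_U[:3, 0] = 1.0
true_U[3:, 1] = 1.0
true_V = (rng.random((2, 40)) < 0.5).astype(float)
X = true_U @ true_V
X_obs = X.copy()
X_obs[0, np.flatnonzero(X[0])[:3]] = 0.0  # simulate a few uncoded comorbidities
U, V, hist = sparse_lowrank_bcd(X_obs, rank=2, lam=0.05)
filled = U @ V  # "filled-in" representation of the feature-patient matrix
```

Entries of `filled` that are positive where `X_obs` has a zero are candidate imputed conditions. With step size 1/L, where L is the Lipschitz constant of each block's gradient, a block update cannot increase the objective, so `hist` is non-increasing; the L1 penalty pushes many factor entries to exactly zero, which tends to keep the product sparse.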
Assessing prognostic risk is crucial to clinical care, and critically dependent on both diagnosis and medical interventions. Current methods use this augmented information to build a single prediction rule. But this m...
Medical outcomes are inexorably linked to patient illness and clinical interventions. Interventions change the course of disease, crucially determining outcome. Traditional outcome prediction models build a single cla...
The goal of data clustering is to partition data points into groups to optimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is...
We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of multiple patients (documents), and each patient has many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes to discover semantically coherent disease topics. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wd-dCRF using MCMC. We evaluate the model on a real-world medical dataset of about 1000 patients with polyvascular disease. Compared with the hierarchical Dirichlet process (HDP), a popular topic analysis tool, our model discovers topics that are superior under both qualitative and quantitative measures.
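The abstract does not give its tree-distance functions, so the sketch below substitutes a simple, commonly used choice: the number of edges between two codes via their lowest common ancestor. It then plugs that distance into the prior of a standard distance-dependent CRP (ddCRP), where word i links to another word j with weight f(d_ij) = exp(-d_ij / decay), or to itself with weight alpha. The function names, the decay function, and the toy ICD-10-style hierarchy are assumptions, not the wd-dCRF as published.

```python
import math


def ancestors(code, parent):
    """Return [code, parent(code), ..., root] using a child -> parent map."""
    path = [code]
    while code in parent:
        code = parent[code]
        path.append(code)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path a -> lowest common ancestor -> b."""
    depth_in_a = {node: i for i, node in enumerate(ancestors(a, parent))}
    for j, node in enumerate(ancestors(b, parent)):
        if node in depth_in_a:  # first shared node is the lowest common ancestor
            return depth_in_a[node] + j
    raise ValueError("codes are not in the same tree")


def ddcrp_link_probs(i, words, parent, alpha=1.0, decay=1.0):
    """Prior probability that word i links to each word j under a ddCRP
    with exponential decay; j == i is a self-link (starts a new table)."""
    weights = []
    for j, w in enumerate(words):
        if j == i:
            weights.append(alpha)
        else:
            weights.append(math.exp(-tree_distance(words[i], w, parent) / decay))
    total = sum(weights)
    return [x / total for x in weights]


# Toy, simplified ICD-10-style hierarchy (hypothetical, for illustration only).
parent = {
    "I21": "I20-I25", "I25": "I20-I25", "I20-I25": "I00-I99",
    "E11": "E10-E14", "E10-E14": "E00-E90",
    "I00-I99": "ROOT", "E00-E90": "ROOT",
}
probs = ddcrp_link_probs(0, ["I21", "I25", "E11"], parent)
```

With the defaults, I21 (distance 2 from I25) is far more likely to link to I25 than to E11 (distance 6). In a ddCRP, words joined by links sit at the same table, which is one way such a prior can pull related codes into the same topic.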
The Hierarchical Dirichlet Process (HDP) model is an important tool for topic analysis. Inference can be performed through a Gibbs sampler using the auxiliary variable method. We propose a split-merge procedure to augm...
Efficient management of chronic diseases is critical in modern health care. We consider diabetes mellitus, and our ongoing goal is to examine how machine learning can deliver information for clinical efficiency. The c...
The success of any machine learning system depends critically on effective representations of data. In many cases, especially those in vision, it is desirable that a representation scheme uncovers the parts-based, add...
The performance of image retrieval depends critically on the semantic representation and the distance function used to estimate the similarity of two images. A good representation should integrate multiple visual and ...