The recent adoption of Electronic Health Records (EHRs) by healthcare providers has introduced an important source of data that provides detailed and highly specific insights into patient phenotypes over large cohorts. These datasets, in combination with machine learning and statistical approaches, generate new opportunities for research and clinical care. However, many methods require the patient representations to be in structured formats, while the information in the EHR is often locked in unstructured text designed for human readability. In this work, we develop the methodology to automatically extract clinical features from clinical narratives in large EHR corpora without the need for prior knowledge. We consider medical terms and sentences appearing in clinical narratives as atomic information units. We propose an efficient clustering strategy suitable for the analysis of large text corpora and utilize the clusters to represent information about the patient compactly. Additionally, we define the sentences on ontologic and natural language vocabularies to automatically detect pertinent combinations of concepts present in the corpus, even when an ontology is not available. To demonstrate the utility of our approach, we perform an association study of clinical features with somatic mutation profiles from 4,007 cancer patients and their tumors. We apply the proposed algorithm to a dataset consisting of ~65 thousand documents with a total of ~3.2 million sentences. After correcting for cancer type and other confounding factors, we identify a total of 340 significant statistical associations between the presence of somatic mutations and clinical features. We annotated these associations according to their novelty and we report several known associations. We also propose 37 plausible, testable hypotheses for associations where the underlying biological mechanism does not appear to be known. These results illustrate that the automated discovery of clinical features
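As a rough illustration of the association step described above, the following sketch tests whether a binary clinical feature and the presence of a somatic mutation co-occur more often than chance in a 2x2 contingency table. It uses a two-sided Fisher exact test, which is a simplification swapped in for illustration: the abstract states that the actual study corrects for cancer type and other confounding factors, which this sketch does not do. The function name and the example counts are hypothetical.

```python
from math import comb  # requires Python 3.8+


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: mutation present / absent. Columns: feature present / absent.
    NOTE: no confounder correction is applied here (illustrative assumption).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 3 mutated patients all have the feature, 3 unmutated do not.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)
```

With hundreds of mutation and feature pairs tested, the resulting p-values would additionally need a multiple-testing correction before any association is called significant.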
Intensive care clinicians are presented with large quantities of patient information and measurements from a multitude of monitoring systems. The limited ability of humans to process such complex information hinders p...
Background: Restricted Boltzmann machines (RBMs) are endowed with the universal power of modeling (binary) joint distributions. Meanwhile, as a result of their confining network structure, training RBMs confronts fewer difficulties when dealing with approximation and inference issues. But little work has been done to fully exploit the capacity of these models to analyze cancer data, e.g., cancer genomic, transcriptomic, proteomic and epigenomic data. On the other hand, in the cancer data analysis task, the number of features/predictors is usually much larger than the sample size, which is known as the "p >> N" problem and is also ubiquitous in other bioinformatics and computational biology fields. The "p >> N" problem puts the bias-variance trade-off in a more crucial place when designing statistical learning methods. However, to date, few RBM models have been specifically designed to address this issue. Methods: We propose a novel RBM model, called elastic restricted Boltzmann machines (eRBMs), which incorporates the elastic regularization term into the likelihood function to balance the model complexity and sensitivity. Building on the classic contrastive divergence (CD) algorithm, we develop the elastic contrastive divergence (eCD) algorithm, which can train eRBMs efficiently. Results: We obtain several theoretical results on the rationality and properties of our model. We further evaluate the power of our model on a challenging task: predicting dichotomized survival time using the molecular profiling of tumors. The test results show that the prediction performance of eRBMs is much superior to that of the state-of-the-art methods. Conclusions: The proposed eRBMs are capable of dealing with the "p >> N" problems and have superior modeling performance over traditional methods. Our novel model is a promising method for future cancer data analysis.
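To make the idea of adding an elastic regularization term to contrastive divergence more concrete, here is a minimal sketch of a single CD-1 update for a binary RBM with a penalty on the weights. The exact form of the penalty is not given in the abstract; this sketch assumes the common elastic-net form lam * (alpha * |w| + (1 - alpha) / 2 * w^2), and all names (`ecd1_step`, `lam`, `alpha`, `lr`) are hypothetical. It should not be read as the authors' eCD algorithm.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def elastic_penalty_grad(w, lam, alpha):
    # (Sub)gradient of lam * (alpha * |w| + (1 - alpha) / 2 * w**2); assumed penalty form.
    sign = (w > 0) - (w < 0)
    return lam * (alpha * sign + (1.0 - alpha) * w)


def sample_hidden(v, W, c, rng):
    # Hidden activation probabilities and a binary sample given visible vector v.
    probs = [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]
    return probs, [1.0 if rng.random() < p else 0.0 for p in probs]


def sample_visible(h, W, b, rng):
    # Visible activation probabilities and a binary sample given hidden vector h.
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(b))]
    return probs, [1.0 if rng.random() < p else 0.0 for p in probs]


def ecd1_step(v0, W, b, c, rng, lr=0.1, lam=0.01, alpha=0.5):
    """One in-place CD-1 update with an elastic-net penalty on W only."""
    ph0, h0 = sample_hidden(v0, W, c, rng)   # positive phase
    _, v1 = sample_visible(h0, W, b, rng)    # one Gibbs step
    ph1, _ = sample_hidden(v1, W, c, rng)    # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            cd_grad = v0[i] * ph0[j] - v1[i] * ph1[j]
            W[i][j] += lr * (cd_grad - elastic_penalty_grad(W[i][j], lam, alpha))
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])


# Toy usage: 3 visible units, 2 hidden units, one binary training vector.
rng = random.Random(0)
W = [[0.0, 0.0] for _ in range(3)]
b, c = [0.0] * 3, [0.0] * 2
ecd1_step([1.0, 0.0, 1.0], W, b, c, rng)
```

In this form, `alpha` trades off the sparsity-inducing L1 term against the shrinkage of the L2 term, which is one plausible way to control model complexity when the number of features far exceeds the sample size.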
Pancreatic adenocarcinoma presents as a spectrum of a highly aggressive disease in patients. The basis of this disease heterogeneity has proved difficult to resolve due to poor tumor cellularity and extensive genomic instability. To address this, a dataset of whole genomes and transcriptomes was generated from purified epithelium of primary and metastatic tumors. Transcriptome analysis demonstrated that molecular subtypes are a product of a gene expression continuum driven by a mixture of intratumoral subpopulations, which was confirmed by single-cell analysis. Integrated whole-genome analysis uncovered that molecular subtypes are linked to specific copy number aberrations in genes such as mutant KRAS and GATA6. By mapping tumor genetic histories, tetraploidization emerged as a key mutational process behind these events. Taken together, these data support the premise that the constellation of genomic aberrations in the tumor gives rise to the molecular subtype, and that disease heterogeneity is due to ongoing genomic instability during progression.
This paper studies non-parametric time-series approach to electric load in national holiday seasons based on historical hourly data in state electric company of Indonesia consisting of historical data of the Northern ...
We propose a new type of generative model of high-dimensional data that learns a manifold geometry of the data, rather than density, and can generate points evenly along this manifold. This is in contrast to existing ...