The recent adoption of Electronic Health Records (EHRs) by healthcare providers has introduced an important source of data that provides detailed and highly specific insights into patient phenotypes over large cohorts. These datasets, in combination with machine learning and statistical approaches, generate new opportunities for research and clinical care. However, many methods require the patient representations to be in structured formats, while the information in the EHR is often locked in unstructured text designed for human readability. In this work, we develop a methodology to automatically extract clinical features from clinical narratives in large EHR corpora without the need for prior knowledge. We consider medical terms and sentences appearing in clinical narratives as atomic information units. We propose an efficient clustering strategy suitable for the analysis of large text corpora and utilize the clusters to represent information about the patient compactly. Additionally, we define the sentences on ontologic and natural language vocabularies to automatically detect pertinent combinations of concepts present in the corpus, even when an ontology is not available. To demonstrate the utility of our approach, we perform an association study of clinical features with somatic mutation profiles from 4,007 cancer patients and their tumors. We apply the proposed algorithm to a dataset consisting of approximately 65 thousand documents with a total of approximately 3.2 million sentences. After correcting for cancer type and other confounding factors, we identify a total of 340 significant statistical associations between the presence of somatic mutations and clinical features. We annotated these associations according to their novelty and we report several known associations. We also propose 37 plausible, testable hypotheses for associations where the underlying biological mechanism does not appear to be known. These results illustrate that the automated discovery of clinical features
Our understanding of how chromosomes structurally organize and dynamically interact has been revolutionized through the lens of long-chain polymer physics. Major protein contributors to chromosome structure and dynamics are condensin and cohesin, which stochastically generate loops within and between chains and entrap proximal strands of sister chromatids. In this paper, we explore the ability of transient, protein-mediated, gene-gene crosslinks to induce clusters of genes, and thereby dynamic architecture, within the highly repeated ribosomal DNA that comprises the nucleolus of budding yeast. We implement three approaches: live cell microscopy; computational modeling of the full genome during G1 in budding yeast, exploring four decades of timescales for transient crosslinks between 5 kbp domains (genes) in the nucleolus on Chromosome XII; and temporal network models with automated community (cluster) detection algorithms applied to the full range of 4D modeling datasets. The data analysis tools detect and track gene clusters, their size, number, persistence time, and their plasticity (deformation). Of biological significance, our analysis reveals an optimal mean crosslink lifetime that promotes pairwise and cluster gene interactions through "flexible" clustering. In this state, large gene clusters self-assemble yet frequently interact (merge and separate), marked by gene exchanges between clusters, which in turn maximizes global gene interactions in the nucleolus. This regime stands between two limiting cases, each with far fewer global gene interactions: with shorter crosslink lifetimes, "rigid" clustering emerges with clusters that interact infrequently; with longer crosslink lifetimes, there is a dissolution of clusters. These observations are compared with imaging experiments on a normal yeast strain and two condensin-modified mutant cell strains. We apply the same image analysis pipeline to the experimental and simulated datasets, providing support for the modeling pred
Space biology research aims to understand fundamental effects of spaceflight on organisms, develop foundational knowledge to support deep space exploration, and ultimately bioengineer spacecraft and habitats to stabil...
Human space exploration beyond low Earth orbit will involve missions of significant distance and duration. To effectively mitigate myriad space health hazards, paradigm shifts in data and space health systems are nece...