The rapid advance of gene sequencing technologies has produced an unprecedented rate of discovery of genome variation in humans. A growing number of authoritative clinical repositories archive gene variants and disease phenotypes, yet many more gene variants still lack clear annotation or disease association. To date, gene-specific predictors have received very limited coverage in the literature. This study evaluates "gene-specific" predictor models based on a naive Bayesian classifier for 20 gene-disease datasets, containing 3986 variants with clinically characterized patient conditions. The utility of gene-specific prediction is then compared with "all-gene" generalized prediction and with existing popular predictors. Gene-specific computational prediction models derived from clinically curated gene variant disease datasets often outperform established generalized algorithms for novel and uncertain gene variants.
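The contrast between gene-specific and all-gene prediction can be illustrated with a minimal sketch. It assumes categorical variant features and Laplace smoothing; the class `CategoricalNaiveBayes`, the helper `train_gene_specific`, and the feature layout are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict
from math import log


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace smoothing (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; an assumption, not a value from the study

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # counts[label][feature_index][value] = number of training rows
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.classes}
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def log_scores(self, row):
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            score = log(self.class_counts[c] / total)  # log prior
            for i, value in enumerate(row):
                numerator = self.counts[c][i][value] + self.alpha
                denominator = self.class_counts[c] + self.alpha * len(self.values[i])
                score += log(numerator / denominator)  # smoothed log likelihood
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


def train_gene_specific(variants):
    """Fit one classifier per gene from (gene, features, label) triples."""
    by_gene = defaultdict(lambda: ([], []))
    for gene, features, label in variants:
        by_gene[gene][0].append(features)
        by_gene[gene][1].append(label)
    return {gene: CategoricalNaiveBayes().fit(rows, labels)
            for gene, (rows, labels) in by_gene.items()}
```

An "all-gene" model in this sketch would simply be one `CategoricalNaiveBayes` fitted on the pooled rows from every gene, so the two approaches differ only in how the training data are partitioned.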
Objective: The Cross-Institutional Clinical Translational Research project explored a federated query tool and examined how this tool can facilitate clinical trial cohort discovery by managing access to aggregate patient data located within unaffiliated academic medical centers. Methods: The project adapted software from the Informatics for Integrating Biology and the Bedside (i2b2) program to connect three Clinical Translational Research Award sites: University of Washington, Seattle; University of California, Davis; and University of California, San Francisco. The project developed an iterative spiral software development model to support the implementation and coordination of this multisite data resource. Results: By standardizing technical infrastructures, policies, and semantics, the project enabled federated querying of deidentified clinical datasets stored in separate institutional environments and identified barriers to engaging users for measuring utility. Discussion: The authors discuss the iterative development and evaluation phases of the project and highlight the challenges identified and the lessons learned. Conclusion: The common system architecture and translational processes provide high-level (aggregate) deidentified access to a large patient population (>5 million patients), and represent a novel and extensible resource. Enhancing the network for more focused disease areas will require research-driven partnerships represented across all partner sites.
Objective: We investigated the common-disease relevant information obtained from sequencing compared with that reported from genotyping arrays. Materials and methods: Using 187 publicly available individual human genomes, we constructed genomic disease risk summaries based on 55 common diseases with reported gene-disease associations in the research literature using two different risk models, one based on the product of likelihood ratios and the other on the allelic variant with the maximum associated disease risk. We also constructed risk profiles based on the single nucleotide polymorphisms (SNPs) of these individuals that could be measured or imputed from two common genotyping array platforms. Results: We show that the model risk predictions derived from sequencing differ substantially from those obtained from the SNPs measured on commercially available genotyping arrays for several different non-monogenic diseases, although high density genotyping arrays give identical results for many diseases. Conclusions: Our approach may be used to compare the ability of different platforms to probe known genetic risks disease by disease.
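The two risk models named in the methods can be sketched as follows. This is an illustrative reading, assuming each risk variant carried by an individual contributes one likelihood ratio for the disease; the function names and the conversion to a post-test probability are assumptions, not the authors' implementation.

```python
from math import prod


def product_model(likelihood_ratios):
    """Combine per-variant likelihood ratios by multiplication.

    Treats the variants as independent; an empty list gives 1.0 (no change in risk).
    """
    return prod(likelihood_ratios)


def max_risk_model(likelihood_ratios):
    """Use only the single variant with the largest likelihood ratio."""
    return max(likelihood_ratios, default=1.0)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability and a likelihood ratio into a post-test probability."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)
```

Because the product model accumulates evidence from every measured variant while the maximum model keeps only one, the two can rank the same individual differently when a platform measures more or fewer of the relevant variants.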
Objective: As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigorous benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge. Design: The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using high-dimensional bags of features, derived both from the text itself and from external sources: UMLS, cTAKES, and Medline. Measurements: Performance was measured per subtask, using micro-averaged F-scores, calculated by comparing system annotations with ground-truth annotations on a test set. Results: The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second). Conclusion: For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks.
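The micro-averaged F-score used for measurement pools true positives, false positives, and false negatives across all classes before computing precision and recall. A minimal sketch, assuming exact-match scoring and annotations represented as hashable tuples (the tuple layout and the function name `micro_f_score` are hypothetical, not the challenge's official scorer):

```python
def micro_f_score(gold, predicted):
    """Micro-averaged precision, recall, and F-score over pooled annotations.

    gold, predicted: sets of hashable annotations, for example
    (document_id, start, end, label) tuples. An annotation counts as
    correct only if it matches a gold annotation exactly.
    """
    true_positives = len(gold & predicted)
    false_positives = len(predicted - gold)
    false_negatives = len(gold - predicted)

    predicted_total = true_positives + false_positives
    gold_total = true_positives + false_negatives
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / gold_total if gold_total else 0.0

    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

Pooling counts in this way weights frequent classes more heavily than rare ones, in contrast to a macro average, which would compute an F-score per class and then average them.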
Nearly a decade since the completion of the first draft of the human genome, the biomedical community is positioned to usher in a new era of scientific inquiry that links fundamental biological insights with clinical knowledge. Accordingly, holistic approaches are needed to develop and assess hypotheses that incorporate genotypic, phenotypic, and environmental knowledge. This perspective presents translational bioinformatics as a discipline that builds on the successes of bioinformatics and health informatics for the study of complex diseases. The early successes of translational bioinformatics indicate the potential to achieve the promise of the Human Genome Project: deeper insights into the genetic underpinnings of disease and progress toward the development of a new generation of therapies.