Objective This paper presents a study of methods for medical literature retrieval for case queries, in which the goal is to retrieve literature articles similar to a given patient case. In particular, it focuses on analyzing the performance of state-of-the-art general retrieval methods and improving them with medical thesauri and physician feedback. Materials and Methods The Kullback-Leibler divergence retrieval model with Dirichlet smoothing serves as the state-of-the-art general retrieval method. Pseudo-relevance feedback and term weighting methods that leverage the MeSH and UMLS thesauri are proposed. Evaluation is performed on a test collection recently created for the ImageCLEF medical case retrieval challenge. Results Experimental results show that a well-tuned state-of-the-art general retrieval model achieves a mean average precision (MAP) of 0.2754, and that the proposed methods improve this by over 40%, to 0.3980. Discussion The results on the ImageCLEF test collection, currently the best collection available for the task, are encouraging. There are, however, limitations due to the small size of the evaluation set. The analysis shows that the methods need further refinement before they can be truly useful in a clinical setting. Conclusion Medical case-based literature retrieval is a critical search application that presents a number of unique challenges. This analysis shows that state-of-the-art general retrieval models are reasonably good for the task, but their performance can be significantly improved by developing new task-specific retrieval models that incorporate medical thesauri and physician feedback.
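As an illustration of the baseline model named above, the following is a minimal sketch of KL-divergence scoring with Dirichlet smoothing. It assumes a maximum-likelihood query model, in which case ranking by -KL(theta_Q || theta_D) is equivalent to Dirichlet-smoothed query likelihood, because the dropped query-entropy term is the same for every document. The function name `kl_dirichlet_score`, the default mu = 2000 and the toy documents are illustrative assumptions, and the MeSH/UMLS-based feedback and term weighting are not shown.

```python
import math
from collections import Counter


def kl_dirichlet_score(query_terms, doc_terms, collection_counts, collection_len, mu=2000.0):
    """Score a document by -KL(theta_Q || theta_D), dropping the query-entropy term.

    Assumes a maximum-likelihood query model theta_Q, so the result is
    rank-equivalent to Dirichlet-smoothed query likelihood.
    """
    q = Counter(query_terms)
    q_len = sum(q.values())
    d = Counter(doc_terms)
    d_len = len(doc_terms)
    score = 0.0
    for term, q_count in q.items():
        p_coll = collection_counts.get(term, 0) / collection_len
        if p_coll == 0.0:
            # Term absent from the whole collection: it would give every
            # document log(0), so skipping it leaves the ranking unchanged.
            continue
        # Dirichlet-smoothed document model p(w | theta_D)
        p_doc = (d[term] + mu * p_coll) / (d_len + mu)
        score += (q_count / q_len) * math.log(p_doc)
    return score


# Toy collection (hypothetical documents, not from the ImageCLEF data)
docs = {
    "d1": "low potassium after starting spironolactone".split(),
    "d2": "femur fracture seen on plain radiograph".split(),
}
collection = Counter(t for terms in docs.values() for t in terms)
coll_len = sum(collection.values())

query = "low potassium".split()
ranking = sorted(
    docs,
    key=lambda k: kl_dirichlet_score(query, docs[k], collection, coll_len, mu=10.0),
    reverse=True,
)
print(ranking)  # ['d1', 'd2']
```

The smoothing parameter mu controls how strongly short documents are pulled toward the collection model; since the abstract stresses a "well-tuned" baseline, mu would normally be tuned on held-out queries rather than fixed.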
The rapid advance of gene sequencing technologies has produced an unprecedented rate of discovery of genome variation in humans. A growing number of authoritative clinical repositories archive gene variants and disease phenotypes, yet many more gene variants currently lack clear annotation or disease association. To date, gene-specific predictors have received very limited coverage in the literature. Here, "gene-specific" predictor models based on a naive Bayesian classifier are evaluated on 20 gene-disease datasets containing 3986 variants with clinically characterized patient conditions. The utility of gene-specific prediction is then compared with that of "all-gene" generalized prediction and with existing popular predictors. Gene-specific computational prediction models derived from clinically curated gene-variant disease datasets often outperform established generalized algorithms for novel and uncertain gene variants.
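To make the gene-specific versus all-gene contrast concrete, here is a minimal sketch of a categorical naive Bayes classifier with Laplace smoothing, trained either once per gene or once on the pooled data. The abstract does not state which variant features were used, so the single "domain" feature, the gene names `GENE_A`/`GENE_B` and the labels "P" (pathogenic) and "B" (benign) are hypothetical. The toy data only shows how pooling can blur opposite per-gene patterns; it does not reproduce the paper's results.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # counts[c][j][v]: training rows of class c whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.class_counts}
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)  # log prior P(c)
            for j, v in enumerate(row):
                k = len(self.values[j])  # distinct values seen for feature j
                score += math.log((self.counts[c][j][v] + self.alpha) / (n_c + self.alpha * k))
            if score > best_score:
                best, best_score = c, score
        return best


def train_gene_specific(variants):
    """One model per gene; variants are (gene, features, label) tuples."""
    grouped = {}
    for gene, feats, label in variants:
        X, y = grouped.setdefault(gene, ([], []))
        X.append(feats)
        y.append(label)
    return {gene: CategoricalNaiveBayes().fit(X, y) for gene, (X, y) in grouped.items()}


def train_all_gene(variants):
    """A single pooled model across all genes."""
    return CategoricalNaiveBayes().fit([f for _, f, _ in variants], [lab for _, _, lab in variants])


# Hypothetical genes with opposite domain/pathogenicity patterns
variants = [
    ("GENE_A", ("kinase",), "P"), ("GENE_A", ("kinase",), "P"), ("GENE_A", ("linker",), "B"),
    ("GENE_B", ("linker",), "P"), ("GENE_B", ("linker",), "P"), ("GENE_B", ("kinase",), "B"),
]
per_gene = train_gene_specific(variants)
pooled = train_all_gene(variants)
print(per_gene["GENE_B"].predict(("kinase",)))  # B
print(pooled.predict(("kinase",)))  # P
```

Here the pooled model labels a GENE_B kinase-domain variant pathogenic because GENE_A dominates the "kinase" counts, while the GENE_B-only model follows that gene's own pattern.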
Objective To demonstrate that a large, heterogeneous clinical database can reveal fine temporal patterns in clinical associations; to illustrate several types of associations; and to ascertain the value of exploiting time. Materials and methods Lagged linear correlation was calculated between seven clinical laboratory values and 30 clinical concepts extracted from resident signout notes in a 22-year, 3-million-patient database of electronic health records. Time points were interpolated, and each patient's data were normalized to reduce inter-patient effects. Results The method revealed several types of associations with detailed temporal patterns. Definitional associations included low blood potassium preceding 'hypokalemia.' Low potassium preceding the drug spironolactone, with high potassium following spironolactone, exemplified intentional and physiologic associations, respectively. Counterintuitive results, such as diseases appearing to follow their effects, may be due to the workflow of healthcare, in which clinical findings precede the clinician's diagnosis of a disease even though the disease actually preceded the findings. Fully exploiting time by interpolating time points produced less noisy results. Discussion Electronic health records are not direct reflections of the patient state but rather reflections of the healthcare process and the recording process. With proper techniques and understanding, and with proper incorporation of time, interpretable associations can be derived from a large clinical database. Conclusion A large, heterogeneous clinical database can reveal clinical associations; time is an important feature; and the results must be interpreted with care.
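A minimal sketch of lagged linear correlation between one laboratory value and one note-derived concept follows. The abstract does not give the exact interpolation or normalization, so this assumes linear interpolation onto a daily grid, a per-patient z-score of the lab values, a 0/1 indicator for days on which the concept is mentioned, and Pearson correlation pooled across patients. The helper names and the single synthetic patient are hypothetical. A positive lag means the lab value precedes the concept.

```python
import numpy as np


def interpolate_daily(times, values):
    """Linearly interpolate irregularly timed lab values onto an integer-day grid."""
    grid = np.arange(times[0], times[-1] + 1)
    return grid, np.interp(grid, times, values)


def concept_indicator(grid, concept_days):
    """1.0 on days the concept appears in a note, 0.0 otherwise."""
    return np.isin(grid, concept_days).astype(float)


def zscore(x):
    """Per-patient standardization (an assumed form of the paper's normalization)."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else x - x.mean()


def lagged_correlation(patients, lags):
    """Pearson r between lab(t) and concept(t + lag), pooled over patients.

    patients: list of (lab_series, concept_series) on a shared daily grid per patient.
    A positive lag means the lab value precedes the concept.
    """
    result = {}
    for lag in lags:
        xs, ys = [], []
        for lab, concept in patients:
            lab = zscore(lab)
            n = len(lab)
            if lag >= 0:
                xs.append(lab[:n - lag])
                ys.append(concept[lag:])
            else:
                xs.append(lab[-lag:])
                ys.append(concept[:n + lag])
        result[lag] = float(np.corrcoef(np.concatenate(xs), np.concatenate(ys))[0, 1])
    return result


# Hypothetical patient: potassium dips on day 5; 'hypokalemia' is noted on day 7.
grid, potassium = interpolate_daily([0, 3, 5, 8, 12], [4.2, 4.0, 2.9, 3.8, 4.1])
hypokalemia = concept_indicator(grid, [7])
r = lagged_correlation([(potassium, hypokalemia)], lags=range(-3, 4))
best_lag = min(r, key=r.get)
print(best_lag)  # 2: the most negative correlation is at a two-day lag
```

The most negative correlation at a positive lag mirrors the definitional pattern described above: the low lab value comes first and the concept is recorded later. With real data, the pooled correlation would be computed over many patients and every lab/concept pair.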