BACKGROUND: Extracting clinical entities from unstructured medical documents is critical for improving clinical decision support and documentation workflows. This study examines the performance of various encoder and decoder models trained for Named Entity Recognition (NER) of clinical parameters in pathology and radiology reports, highlighting the applicability of Large Language Models (LLMs) for this task.
METHODS: Three NER methods were evaluated: (1) flat NER using transformer-based models, (2) nested NER with a multi-task learning setup, and (3) instruction-based NER utilizing LLMs. A dataset of 2013 pathology reports and 413 radiology reports, annotated by medical students, was used for training and testing.
RESULTS: The performance of encoder-based NER models (flat and nested) was superior to that of LLM-based approaches. The best-performing flat NER models achieved F1-scores of 0.87-0.88 on pathology reports and up to 0.78 on radiology reports, while nested NER models performed slightly lower. In contrast, multiple LLMs, despite achieving high precision, yielded significantly lower F1-scores (ranging from 0.18 to 0.30) due to poor recall. A contributing factor appears to be that these LLMs produce fewer but more accurate entities, suggesting they become overly conservative when generating outputs.
CONCLUSION: LLMs in their current form are unsuitable for comprehensive entity extraction tasks in clinical domains, particularly when faced with a high number of entity types per document, though instructing them to return more entities in subsequent refinements may improve recall. Additionally, their computational overhead does not provide proportional performance gains. Encoder-based NER models, particularly those pre-trained on biomedical data, remain the preferred choice for extracting information from unstructured medical documents.
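The relationship between high precision, poor recall, and a low F1-score reported above can be illustrated with a minimal sketch. It assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the precision and recall values are hypothetical and chosen only for illustration, not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values (not from the study): a conservative extractor that
# returns few but mostly correct entities, versus a balanced extractor.
conservative = f1_score(precision=0.90, recall=0.15)
balanced = f1_score(precision=0.85, recall=0.85)

print(f"conservative F1 = {conservative:.2f}")
print(f"balanced F1     = {balanced:.2f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a model that misses most entities scores a low F1 even when nearly everything it does return is correct.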
Anthropogenic emissions of black carbon (BC) aerosols are generally thought to warm the climate. However, the magnitude of this warming remains highly uncertain due to limited knowledge of BC sources; optical properties; and atmospheric processes such as transport, removal, and cloud interactions. Here, we assess and constrain estimates of the historical warming influence of BC using recent observations and emission inventories. Based on simulations from four climate models, we show that the current global mean surface temperature change from anthropogenic BC due to aerosol-radiation interaction spans a factor of three—from +0.02 ± 0.02 K to +0.06 ± 0.05 K. Rapid atmospheric adjustments reduce the instantaneous radiative forcing by nearly 50% (multi-model mean), substantially lowering the net warming. Yet, recent satellite constraints suggest a stronger effect, highlighting the need for a more comprehensive reassessment of BC’s climate influence.
Summary: Despite the global investment in One Health disease surveillance, it remains difficult and costly to identify and monitor the wildlife reservoirs of novel zoonotic viruses. Statistical models can guide sampling target prioritisation, but the predictions from any given model might be highly uncertain; moreover, systematic model validation is rare, and the drivers of model performance are consequently under-documented. Here, we use the bat hosts of betacoronaviruses as a case study for the data-driven process of comparing and validating predictive models of probable reservoir hosts. In early 2020, we generated an ensemble of eight statistical models that predicted host–virus associations and developed priority sampling recommendations for potential bat reservoirs of betacoronaviruses and bridge hosts for SARS-CoV-2. During a time frame of more than a year, we tracked the discovery of 47 new bat hosts of betacoronaviruses, validated the initial predictions, and dynamically updated our analytical pipeline. We found that ecological trait-based models performed well at predicting these novel hosts, whereas network methods consistently performed approximately as well as or worse than expected at random. These findings illustrate the importance of ensemble modelling as a buffer against mixed-model quality and highlight the value of including host ecology in predictive models. Our revised models showed an improved performance compared with the initial ensemble, and predicted more than 400 bat species globally that could be undetected betacoronavirus hosts. We show, through systematic validation, that machine learning models can help to optimise wildlife sampling for undiscovered viruses and illustrate how such approaches are best implemented through a dynamic process of prediction, data collection, validation, and updating.
The health care sector can benefit considerably from developments in digital technology. Consequently, eHealth applications are rapidly increasing in number and sophistication. For successful development and implement...
Self-driving labs (SDLs) combine fully automated experiments with artificial intelligence (AI) that decides the next set of experiments. Taken to their ultimate expression, SDLs could usher a new paradigm of scientifi...
Background: Systems Medicine is a novel approach to medicine, that is, an interdisciplinary field that considers the human body as a system, composed of multiple parts and of complex relationships at multiple levels, ...
Background: Diabetes is one of the leading causes of death and disability worldwide, and affects people regardless of country, age group, or sex. Using the most recent evidentiary and analytical framework from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD), we produced location-specific, age-specific, and sex-specific estimates of diabetes prevalence and burden from 1990 to 2021, the proportion of type 1 and type 2 diabetes in 2021, the proportion of the type 2 diabetes burden attributable to selected risk factors, and projections of diabetes prevalence through 2050. Methods: Estimates of diabetes prevalence and burden were computed in 204 countries and territories, across 25 age groups, for males and females separately and combined; these estimates comprised lost years of healthy life, measured in disability-adjusted life-years (DALYs; defined as the sum of years of life lost [YLLs] and years lived with disability [YLDs]). We used the Cause of Death Ensemble model (CODEm) approach to estimate deaths due to diabetes, incorporating 25 666 location-years of data from vital registration and verbal autopsy reports in separate total (including both type 1 and type 2 diabetes) and type-specific models. Other forms of diabetes, including gestational and monogenic diabetes, were not explicitly modelled. Total and type 1 diabetes prevalence was estimated by use of a Bayesian meta-regression modelling tool, DisMod-MR 2.1, to analyse 1527 location-years of data from the scientific literature, survey microdata, and insurance claims; type 2 diabetes estimates were computed by subtracting type 1 diabetes from total estimates. Mortality and prevalence estimates, along with standard life expectancy and disability weights, were used to calculate YLLs, YLDs, and DALYs. When appropriate, we extrapolated estimates to a hypothetical population with a standardised age structure to allow comparison in populations with different age structures. We used the comparative r
OBJECTIVES: This study aimed to investigate the association between sleep problems and suicidal behaviors as well as healthcare utilization in Canadian adults with chronic diseases, while also examining the mediating role of mental illness.
METHODS: Data were drawn from the 2015-16 cycle of the Canadian Community Health Survey, specifically from Ontario, Manitoba, and Saskatchewan, the provinces that included the optional sleep module. A total of 22,700 participants aged ≥ 18 years and diagnosed with at least one chronic disease were included in the analysis. Sleep problems were defined as extreme sleep durations (either < 5 or ≥ 10 h) and insomnia. Mental illness was classified as a self-reported mood or anxiety disorder.
RESULTS: Participants with extreme sleep durations (compared to 7 to < 8 h) and those with insomnia (compared to no insomnia) showed a higher prevalence of suicidal ideation, suicidal plans, and increased healthcare utilization. After adjusting for multiple covariates, both extreme sleep durations and insomnia remained significantly associated with increased odds of suicidal ideation, suicidal plans, and healthcare utilization. Mediation analyses indicated that mental illness partially mediated these associations.
CONCLUSIONS: Both extreme sleep durations and insomnia were independently associated with higher odds of suicidal behaviors and increased healthcare utilization in adults with chronic diseases, with mental illness playing a partial mediating role in these relationships.