Since 2010, the international JEM-EUSO (Joint Exploratory Missions for Extreme Universe Space Observatory) Collaboration has been developing an ambitious program with the support of major International and National Sp...
Measuring sentiment in social media text has become an important practice in studying emotions at the macroscopic level. However, this approach can suffer from methodological issues like sampling biases and measuremen...
Uncertainty estimation is crucial for the reliability of safety-critical human and artificial intelligence (AI) interaction systems, particularly in the domain of healthcare engineering. However, a robust and general ...
Uncertainty estimation is crucial for the reliability of safety-critical human and artificial intelligence interaction systems, particularly in the healthcare engineering domain. However, a general method for quantify...
Background: Chat Generative Pre-trained Transformer (ChatGPT) is a 175-billion-parameter natural language processing model that can generate conversation-style responses to user input. Objective: This study aimed to evaluate the performance of ChatGPT on questions within the scope of the United States Medical Licensing Examination Step 1 and Step 2 exams, as well as to analyze responses for user interpretability. Methods: We used 2 sets of multiple-choice questions to evaluate ChatGPT’s performance, each with questions pertaining to Step 1 and Step 2. The first set was derived from AMBOSS, a commonly used question bank for medical students, which also provides statistics on question difficulty and the performance on an exam relative to the user base. The second set was the National Board of Medical Examiners (NBME) free 120 questions. ChatGPT’s performance was compared to 2 other large language models, GPT-3 and InstructGPT. The text output of each ChatGPT response was evaluated across 3 qualitative metrics: logical justification of the answer selected, presence of information internal to the question, and presence of information external to the question. Results: Of the 4 data sets, AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2, ChatGPT achieved accuracies of 44% (44/100), 42% (42/100), 64.4% (56/87), and 57.8% (59/102), respectively. ChatGPT outperformed InstructGPT by 8.15% on average across all data sets, and GPT-3 performed similarly to random chance. The model demonstrated a significant decrease in performance as question difficulty increased (P=.01) within the AMBOSS-Step1 data set. We found that logical justification for ChatGPT’s answer selection was present in 100% of outputs of the NBME data sets. Internal information to the question was present in 96.8% (183/189) of all questions. The presence of information external to the question was 44.5% and 27% lower for incorrect answers relative to correct answers on the NBME-Free-Step1 (P<.001
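The evaluation reduces to marking each model's letter choice against the official key and reporting per-set accuracy. A minimal sketch of that scoring step is below; the question IDs, answer key, and responses are hypothetical placeholders, not the AMBOSS or NBME data.

```python
# Minimal sketch: score multiple-choice responses against an answer key and
# report per-set accuracy. Question IDs, key, and responses are hypothetical.
def score_responses(responses, answer_key):
    """Return (n_correct, n_total, accuracy) for one question set."""
    n_correct = sum(1 for qid, choice in responses.items()
                    if answer_key.get(qid) == choice)
    n_total = len(responses)
    return n_correct, n_total, (n_correct / n_total if n_total else 0.0)

answer_key = {"q1": "C", "q2": "A", "q3": "E"}        # hypothetical key
model_responses = {"q1": "C", "q2": "B", "q3": "E"}   # hypothetical model output

correct, total, acc = score_responses(model_responses, answer_key)
print(f"{correct}/{total} correct ({acc:.1%})")       # e.g. 2/3 correct (66.7%)
```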
Objectives: In electronic health record (EHR)-based association studies, phenotyping algorithms efficiently classify patient clinical outcomes into binary categories but are susceptible to misclassification errors. The gold standard, manual chart review, involves clinicians determining the true disease status based on their assessment of health records. These clinician-labeled phenotypes are labor-intensive and typically limited to a small subset of patients, potentially introducing a third "undecided" category when phenotypes are indeterminate. We aim to effectively integrate the algorithm-derived and chart-reviewed outcomes when both are available in EHR-based association studies. Materials and Methods: We propose an augmented estimation method that combines the binary algorithm-derived phenotypes for the entire cohort with the trinary chart-reviewed phenotypes for a small, selected subset. Additionally, a cost-effective outcome-dependent sampling strategy is used to address rare disease scenarios. The proposed trinary chart-reviewed phenotype integrated cost-effective augmented estimation (TriCA) was evaluated through a wide range of simulation settings and real-world applications in Alzheimer's disease and related dementias using EHR data from the OneFlorida+ Clinical Research Network. Results: Compared to estimation based on random sampling, our augmented method improved mean square error by up to 43.5% in simulation studies; compared to estimation using trinary chart-reviewed phenotypes only, our method improved efficiency by up to 32.1% in the real-world data application. Discussion: Our simulation studies and real-world application demonstrate that, compared to existing methods, the proposed method provides unbiased estimates with higher statistical efficiency. Conclusion: The proposed method effectively combined binary algorithm-derived phenotypes for the whole cohort with trinary chart-reviewed outcomes for a limited validation set, making it applicable to
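The abstract does not spell out the form of the TriCA estimator, so the sketch below only illustrates the broader idea it builds on: using a small chart-reviewed validation subset to correct estimates based on error-prone algorithm-derived phenotypes for the full cohort. It applies a simple Rogan-Gladen misclassification correction to simulated data, not the paper's augmented estimator, and all names and numbers are hypothetical.

```python
# Illustrative sketch only: correct an algorithm-derived prevalence estimate
# using a small chart-reviewed validation subset. This is NOT the TriCA
# estimator from the paper; the data and accuracy levels are simulated.
import numpy as np

rng = np.random.default_rng(0)

n = 5000                                     # full cohort size (hypothetical)
true_y = rng.binomial(1, 0.10, n)            # unobserved true disease status
algo_y = np.where(rng.random(n) < 0.9,       # algorithm phenotype, ~90% accurate
                  true_y, 1 - true_y)

# Chart review is available only on a small validated subset.
val_idx = rng.choice(n, size=300, replace=False)
chart_y = true_y[val_idx]                    # treat chart review as gold standard
algo_val = algo_y[val_idx]

# Estimate the algorithm's sensitivity/specificity on the validated subset...
sens = np.mean(algo_val[chart_y == 1] == 1)
spec = np.mean(algo_val[chart_y == 0] == 0)

# ...and correct the naive prevalence with the Rogan-Gladen adjustment.
p_algo = algo_y.mean()
p_corrected = (p_algo + spec - 1) / (sens + spec - 1)
print(f"naive={p_algo:.3f}, corrected={p_corrected:.3f}, true={true_y.mean():.3f}")
```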
Recently, computational drug repurposing has emerged as a promising method for identifying new interventions for diseases. This study predicts novel drugs for Alzheimer's disease (AD) through link prediction on our developed biomedical knowledge graph. We constructed a comprehensive knowledge graph containing AD concepts and various potential interventions, called ADInt, by integrating a dietary supplement (DS) domain knowledge graph, SuppKG, with semantic triples from the SemMedDB database. Four knowledge graph embedding models (TransE, RotatE, DistMult and ComplEx) and two graph convolutional network models (R-GCN and CompGCN) were compared to learn the representation of ADInt. R-GCN outperformed the other models when evaluated on the time slice test set and the clinical trial test set, and was used to generate the score tables for the link prediction task. Based on the link prediction results, we proposed candidate drugs for AD. In conclusion, we presented a novel methodology to extend an existing knowledge graph and discover novel drugs for AD. Our method can potentially be applied to other clinical problems, such as discovering drug adverse reactions and drug-drug interactions.
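As a concrete illustration of the link prediction step, the sketch below scores candidate tail entities for a query triple with a TransE-style translational score, -||h + r - t||. The embeddings are random toy vectors and the entity and relation names are illustrative; they are not taken from ADInt or the trained R-GCN model.

```python
# Minimal sketch of TransE-style link prediction: rank candidate entities for a
# (?, relation, tail) query by the translational score -||h + r - t||.
import numpy as np

rng = np.random.default_rng(42)
dim = 16
entities = ["alzheimers_disease", "drug_a", "drug_b", "supplement_c"]  # illustrative
relations = ["treats"]                                                 # illustrative

ent_emb = {e: rng.normal(size=dim) for e in entities}   # toy embeddings, untrained
rel_emb = {r: rng.normal(size=dim) for r in relations}

def transe_score(head, relation, tail):
    """Higher score = more plausible triple under TransE (negative L2 distance)."""
    return -np.linalg.norm(ent_emb[head] + rel_emb[relation] - ent_emb[tail])

# Rank candidate interventions for the query (?, treats, alzheimers_disease).
candidates = ["drug_a", "drug_b", "supplement_c"]
ranked = sorted(candidates,
                key=lambda c: transe_score(c, "treats", "alzheimers_disease"),
                reverse=True)
print(ranked)
```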
Background Platform trials can evaluate the efficacy of several experimental treatments compared to a control. The number of experimental treatments is not fixed, as arms may be added or removed as the trial progresses. Platform trials are more efficient than independent parallel group trials because they use shared control groups. However, for a treatment entering the trial at a later time point, the control group is divided into concurrent controls, consisting of patients randomised to control while that treatment arm is in the platform, and non-concurrent controls, consisting of patients randomised before the arm entered. Using non-concurrent controls in addition to concurrent controls can improve the trial's efficiency by increasing power and reducing the required sample size, but can introduce bias due to time trends. Methods We focus on a platform trial with two treatment arms and a common control arm. Assuming that the second treatment arm is added at a later time, we assess the robustness of recently proposed model-based approaches that adjust for time trends when utilizing non-concurrent controls. In particular, we consider approaches where time trends are modeled either as linear in time or as a step function, with steps at the time points where treatments enter or leave the platform trial. For trials with continuous or binary outcomes, we investigate the type 1 error rate and power of testing the efficacy of the newly added arm, as well as the bias and root mean squared error of treatment effect estimates, under a range of scenarios. In addition to scenarios where time trends are equal across arms, we investigate settings with different time trends or time trends that are not additive on the scale of the model. Results A step function model, fitted on data from all treatment arms, gives increased power while controlling the type 1 error, as long as the time trends are equal for the different arms and additive on the model scale. This holds even if the shape of the time trend deviates from a s
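A minimal sketch of the step-function adjustment on simulated data is shown below: the outcome is regressed on arm indicators plus a fixed effect for each period defined by the arm entry times, so the estimate for the later-entering arm can borrow information from non-concurrent controls. The column names, effect sizes, and use of statsmodels are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch: step-function time-trend adjustment in a two-arm
# platform trial with a shared control. Simulated continuous outcomes.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

rows = []
# Period 1: control and arm1 only; period 2: arm2 enters the platform.
for period, arms in [(1, ["control", "arm1"]), (2, ["control", "arm1", "arm2"])]:
    for arm in arms:
        n = 100                                          # patients per arm-period
        time_trend = 0.5 * (period == 2)                 # equal, additive time trend
        effect = {"control": 0.0, "arm1": 0.3, "arm2": 0.4}[arm]
        y = effect + time_trend + rng.normal(0, 1, n)
        rows.append(pd.DataFrame({"y": y, "arm": arm, "period": period}))

df = pd.concat(rows, ignore_index=True)

# Step-function model: arm effects plus a separate intercept per period, so
# non-concurrent controls from period 1 contribute to the arm2 estimate.
fit = smf.ols("y ~ C(arm, Treatment('control')) + C(period)", data=df).fit()
print(fit.params)
```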
Objective: Our study aimed to construct an exhaustive Complementary and Integrative Health (CIH) Lexicon (CIHLex) to better represent the often underrepresented physical and psychological CIH approaches in standard te...