Background: Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research ...
Background: The normalization of institution names is of great importance for literature retrieval, statistics on academic achievements, and evaluation of the competitiveness of research institutions. Differences in authors' writing habits and spelling mistakes lead to many variant names for the same institution, which affects the analysis of publication data. With the development of deep learning models and the increasing maturity of natural language processing methods, training a deep learning-based institution name normalization model can increase the accuracy of institution name normalization at the semantic level. Objective: This study aimed to train a deep learning-based model for institution name normalization based on the feature fusion of affiliation data from multisource literature. The model would normalize institution name variants with the help of authority files and achieve high normalization accuracy after several rounds of training and optimization. Methods: In this study, an institution name normalization-oriented model was trained based on bidirectional encoder representations from transformers (BERT) and other deep learning models, comprising an institution classification model, an institutional hierarchical relation extraction model, and an institution matching and merging model. The model was trained to learn institutional features automatically through pretraining and fine-tuning, and institution names were extracted from the affiliation data of 3 databases (Dimensions, Web of Science, and Scopus) to complete the normalization process. Results: The trained model was found to perform at least 3 functions. First, it could identify an institution name that is consistent with the authority files and associate the name with the files through the unique institution ID. Second, it could identify nonstandard institution name variants, such as singular forms, plural changes, and abbreviations, and update the authority files. Third, it ...
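As a minimal sketch of what the matching and merging step described above could look like, the code below fine-tunes a BERT cross-encoder to score whether a raw affiliation string and an authority-file entry name the same institution. The checkpoint, label scheme, and toy pairs are illustrative assumptions, not the authors' published setup.

```python
# Hypothetical sketch of the "institution matching" step: a BERT cross-encoder
# scores whether an affiliation string refers to the same institution as an
# authority-file entry. Checkpoint, labels, and pairs are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"  # assumption; the paper does not name a checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Toy variant/authority pairs: label 1 = same institution, 0 = different.
pairs = [
    ("Dept. of Physics, MIT", "Massachusetts Institute of Technology", 1),
    ("Univ of Oxford", "University of Oxford", 1),
    ("Stanford Univ.", "University of Cambridge", 0),
]

enc = tokenizer(
    [p[0] for p in pairs], [p[1] for p in pairs],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([p[2] for p in pairs])

# One illustrative fine-tuning step; real training would iterate over batches.
model.train()
out = model(**enc, labels=labels)
out.loss.backward()
torch.optim.AdamW(model.parameters(), lr=2e-5).step()

# At inference time, the softmax over the two logits gives a match probability
# that can drive merging a variant into the authority file under its ID.
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(**enc).logits, dim=-1)[:, 1]
print(probs)
```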
Background: Deep learning offers great benefits in classification tasks such as medical imaging diagnostics or stock trading, especially when compared with human-level performance, and can be a viable option for classifying distinct levels within community-engaged research (CEnR). CEnR is a collaborative approach between academics and community partners with the aim of conducting research that is relevant to community needs while incorporating diverse forms of expertise. In the field of deep learning and artificial intelligence (AI), training multiple models to obtain the highest validation accuracy is common practice; however, such a model can overfit to that specific data set and fail to generalize well to a real-world population, which creates issues of bias and potentially dangerous algorithmic decisions. Consequently, if we plan on automating human decision-making, there is a need to create techniques and exhaustive evaluative processes for these powerful, unexplainable models to ensure that we do not incorporate and blindly trust poor AI models to make real-world decisions. Objective: We aimed to conduct an evaluation study to see whether our most accurate transformer-based models derived from previous studies could emulate our own classification spectrum for tracking CEnR studies, as well as whether the use of calibrated confidence scores was warranted. Methods: We compared the results from 3 domain experts, who classified a sample of 45 studies derived from our university's institutional review board database, with those from 3 previously trained transformer-based models, and investigated whether calibrated confidence scores can be a viable technique for using AI in a support role for complex decision-making. Results: Our findings reveal that certain models exhibit an overestimation of their performance through high confidence scores, despite not achieving the highest validation accuracy. Conclusions: Future studies should be conducted with larger sample ...
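The abstract does not say how the confidence scores were calibrated; temperature scaling (Guo et al., 2017) is one standard technique, sketched below on synthetic logits standing in for a model's validation outputs.

```python
# Minimal sketch of temperature scaling: learn a single temperature T that
# minimizes the negative log-likelihood on held-out data, then divide logits
# by T before softmax. Logits and labels below are synthetic stand-ins.
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Learn a scalar temperature T by minimizing NLL on validation data."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays > 0
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
    nll = torch.nn.CrossEntropyLoss()

    def closure():
        opt.zero_grad()
        loss = nll(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return float(log_t.exp())

# Synthetic, overconfident validation logits for a 3-class CEnR-style task.
torch.manual_seed(0)
logits = torch.randn(200, 3) * 5.0
labels = torch.randint(0, 3, (200,))

T = fit_temperature(logits, labels)
confidences = torch.softmax(logits / T, dim=-1).max(dim=-1).values
print(f"temperature={T:.2f}, mean confidence={confidences.mean():.2f}")
```

A temperature above 1 flattens the softmax, pulling down exactly the kind of inflated confidence scores the study observed in its underperforming models.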
Background: In this paper, we present an automated method for article classification, leveraging the power of large language models (LLMs). Objective: The aim of this study is to evaluate the applicability of various LLMs based on the textual content of scientific ophthalmology papers. Methods: We developed a model based on natural language processing techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we used zero-shot learning LLMs and compared Bidirectional and Auto-Regressive Transformers (BART) and its variants with bidirectional encoder representations from transformers (BERT) and its variants, such as DistilBERT, SciBERT, PubMedBERT, and BioBERT. To evaluate the LLMs, we compiled a data set (retinal diseases [RenD]) of 1000 ocular disease-related articles, which were expertly annotated by a panel of 6 specialists into 19 distinct categories. In addition to classifying the articles, we also analyzed the different classified groups to find patterns and trends in the field. Results: The classification results demonstrate the effectiveness of LLMs in categorizing a large number of ophthalmology papers without human intervention. The model achieved a mean accuracy of 0.86 and a mean F1-score of 0.85 on the RenD data set. Conclusions: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application in the domain of ophthalmology showcases its potential for knowledge organization and retrieval. We performed a trend analysis that enables researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering as well as in the identification of emerging scientific trends within different disciplines. Moreover, the extendibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines.
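For readers unfamiliar with the zero-shot route the study compares, the sketch below shows the common pattern of scoring an abstract against candidate category names with an NLI-trained BART checkpoint. The checkpoint and the three example labels are assumptions for illustration, not the paper's 19 RenD categories.

```python
# Hedged sketch of zero-shot article classification: an NLI-trained BART
# model scores a paper's text against candidate labels with no fine-tuning.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

abstract = ("We report outcomes of anti-VEGF therapy in patients with "
            "neovascular age-related macular degeneration.")
candidate_labels = ["retinal diseases", "glaucoma", "cataract"]  # toy labels

result = classifier(abstract, candidate_labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```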
Objective: To develop and apply a natural language processing (NLP)-based approach to analyze public sentiments on social media and their geographic pattern in the United States toward coronavirus disease 2019 (COVID-19) vaccination. We also aim to provide insights to facilitate the understanding of public attitudes and concerns regarding COVID-19 vaccination. Methods: We collected tweets posted by residents of the United States after the dissemination of the COVID-19 vaccine. We performed sentiment analysis based on bidirectional encoder representations from transformers (BERT) and qualitative content analysis. Time series models were leveraged to describe sentiment trends, and key topics were analyzed longitudinally and geospatially. Results: A total of 3 198 686 tweets related to COVID-19 vaccination were extracted from January 2021 to February 2022. Of these, 2 358 783 tweets were identified as containing clear opinions, among which 824 755 (35.0%) expressed negative opinions toward vaccination while 1 534 028 (65.0%) demonstrated positive opinions. The accuracy of the BERT model was 79.67%. The key hashtag-based topics included Pfizer, breaking, wearamask, and smartnews. The sentiment toward vaccination showed manifest variability across states. Key barriers to vaccination included mistrust, hesitancy, safety concerns, misinformation, and ... Conclusions: We found that opinions toward COVID-19 vaccination varied across places and over time. This study demonstrates the potential of an analytical pipeline that integrates NLP-enabled modeling, time series, and geospatial analyses of social media data. Such analyses could enable real-time assessment, at scale, of public confidence and trust in COVID-19 vaccination, help address the concerns of vaccine skeptics, and provide support for developing tailored policies and communication strategies to maximize uptake.
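As an illustration of the kind of NLP-enabled pipeline described, the hedged sketch below applies an off-the-shelf BERT-family sentiment classifier to toy tweets and aggregates the share of positive posts by month, the unit a time series model would consume. The checkpoint and example tweets are placeholders, not the authors' fine-tuned model or 3.2-million-tweet corpus.

```python
# Illustrative sentiment pipeline: classify tweets, then aggregate the
# positive share per month for downstream time series / geospatial analysis.
from collections import defaultdict
from transformers import pipeline

sentiment = pipeline("sentiment-analysis",
                     model="distilbert-base-uncased-finetuned-sst-2-english")

tweets = [  # (month, text) placeholders standing in for the real corpus
    ("2021-01", "Just got my first dose, feeling hopeful!"),
    ("2021-01", "I don't trust this vaccine at all."),
    ("2021-02", "Fully vaccinated and grateful."),
]

monthly = defaultdict(list)
for month, text in tweets:
    label = sentiment(text)[0]["label"]  # "POSITIVE" or "NEGATIVE"
    monthly[month].append(label == "POSITIVE")

# Share of positive tweets per month, the input a trend model would take.
for month, flags in sorted(monthly.items()):
    print(month, f"{sum(flags) / len(flags):.0%} positive")
```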
This article presents a novel approach to citation network construction from Jewish Responsa literature based on the automatic extraction of references from texts. Jewish Responsa literature contains thousands of answers to questions related to Jewish law (Halachah), spanning over 1,300 years, by authors from all over the world. This literature is abundant with references, but because of their high lexical and format variability, their automatic identification and extraction is very challenging. In this article we present a novel, multi-layered approach that splits the reference extraction task into two main subtasks: i) identification of reference boundaries; ii) identification of a reference's internal components. We experimented with several machine learning models: a CRF (conditional random field) model, a BERT (bidirectional encoder representations from transformers) model, and a combined approach, BERT-CRF. Additionally, we examined the influence of the training corpus on model accuracy by comparing the performance of models trained on Modern Hebrew vs. Rabbinic Hebrew. We found that the best results were achieved by a BERT-CRF model trained on Rabbinic Hebrew. The constructed network can be utilized to build various tools for analyzing trends and influences in the Jewish Halachic corpus, such as the most influential authors, the authors' sources of authority, and their evolution over time and place.
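A hedged sketch of the first subtask, reference boundary identification, framed as BIO token classification with a BERT encoder. The authors' best system adds a CRF layer on top of BERT and is trained on Rabbinic Hebrew; the multilingual checkpoint, label set, and English example below are placeholders for readability.

```python
# Sketch of reference boundary identification as BIO token tagging: after
# fine-tuning, contiguous B-REF/I-REF tags mark each citation span, which a
# second model then parses into its internal components.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-REF", "I-REF"]  # outside / begin / inside a reference span
MODEL = "bert-base-multilingual-cased"  # assumption; not the authors' model

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL, num_labels=len(LABELS)
)

text = "As ruled in Shulchan Aruch, Orach Chaim 128, the priests ascend."
enc = tokenizer(text, return_tensors="pt")

model.eval()
with torch.no_grad():
    pred_ids = model(**enc).logits.argmax(dim=-1)[0]

# Untrained weights give arbitrary tags here; the loop just shows the
# token-to-label alignment the fine-tuned model would produce.
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
for tok, pid in zip(tokens, pred_ids):
    print(tok, LABELS[pid])
```

In the paper's combined BERT-CRF variant, the per-token logits feed a CRF layer that enforces valid tag transitions (for example, no I-REF directly after O), which is what makes boundary detection robust to the high format variability noted above.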