Background: Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research ...
Background: The normalization of institution names is of great importance for literature retrieval, statistics on academic achievements, and evaluation of the competitiveness of research institutions. Differences in authors' writing habits and spelling mistakes lead to many variant names for the same institution, which affects the analysis of publication data. With the development of deep learning models and the increasing maturity of natural language processing methods, training a deep learning-based institution name normalization model can increase the accuracy of institution name normalization at the semantic level. Objective: This study aimed to train a deep learning-based model for institution name normalization based on the feature fusion of affiliation data from multisource literature. The model would normalize institution name variants with the help of authority files and achieve high normalization accuracy after several rounds of training and optimization. Methods: In this study, an institution name normalization-oriented model was trained based on bidirectional encoder representations from transformers (BERT) and other deep learning models, comprising an institution classification model, an institutional hierarchical relation extraction model, and an institution matching and merging model. The model was trained to learn institutional features automatically through pretraining and fine-tuning, and institution names were extracted from the affiliation data of 3 databases (Dimensions, Web of Science, and Scopus) to complete the normalization process. Results: The trained model was found to perform at least 3 functions. First, it could identify an institution name that is consistent with the authority files and associate the name with the files through the unique institution ID. Second, it could identify nonstandard institution name variants, such as singular forms, plural changes, and abbreviations, and update the authority files. Third, it ...
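As a minimal sketch of what the matching and merging step described above could look like, the code below fine-tunes a BERT cross-encoder to score whether a raw affiliation string and an authority-file entry name the same institution. The checkpoint, label scheme, and toy pairs are illustrative assumptions, not the authors' published setup.

```python
# Hypothetical sketch of the "institution matching" step: a BERT cross-encoder
# scores whether an affiliation string refers to the same institution as an
# authority-file entry. Checkpoint, labels, and pairs are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"  # assumption; the paper does not name a checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Toy variant/authority pairs: label 1 = same institution, 0 = different.
pairs = [
    ("Dept. of Physics, MIT", "Massachusetts Institute of Technology", 1),
    ("Univ of Oxford", "University of Oxford", 1),
    ("Stanford Univ.", "University of Cambridge", 0),
]

enc = tokenizer(
    [p[0] for p in pairs], [p[1] for p in pairs],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([p[2] for p in pairs])

# One illustrative fine-tuning step; real training would iterate over batches.
model.train()
out = model(**enc, labels=labels)
out.loss.backward()
torch.optim.AdamW(model.parameters(), lr=2e-5).step()

# At inference time, the softmax over the two logits gives a match probability
# that can drive merging a variant into the authority file under its ID.
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(**enc).logits, dim=-1)[:, 1]
print(probs)
```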
Background: Deep learning offers great benefits in classification tasks such as medical imaging diagnostics or stock trading, especially when compared with human-level performance, and can be a viable option for classifying distinct levels within community-engaged research (CEnR). CEnR is a collaborative approach between academics and community partners with the aim of conducting research that is relevant to community needs while incorporating diverse forms of expertise. In the field of deep learning and artificial intelligence (AI), training multiple models to obtain the highest validation accuracy is common practice; however, such a model can overfit to that specific data set and fail to generalize well to a real-world population, which creates issues of bias and potentially dangerous algorithmic decisions. Consequently, if we plan on automating human decision-making, there is a need to create techniques and exhaustive evaluative processes for these powerful, unexplainable models to ensure that we do not incorporate and blindly trust poor AI models to make real-world decisions. Objective: We aimed to conduct an evaluation study to see whether our most accurate transformer-based models derived from previous studies could emulate our own classification spectrum for tracking CEnR studies, as well as whether the use of calibrated confidence scores was warranted. Methods: We compared the results from 3 domain experts, who classified a sample of 45 studies derived from our university's institutional review board database, with those from 3 previously trained transformer-based models, and investigated whether calibrated confidence scores can be a viable technique for using AI in a support role for complex decision-making. Results: Our findings reveal that certain models exhibit an overestimation of their performance through high confidence scores, despite not achieving the highest validation accuracy. Conclusions: Future studies should be conducted with larger sample ...
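The abstract does not say how the confidence scores were calibrated; temperature scaling (Guo et al., 2017) is one standard technique, sketched below on synthetic logits standing in for a model's validation outputs.

```python
# Minimal sketch of temperature scaling: learn a single temperature T that
# minimizes the negative log-likelihood on held-out data, then divide logits
# by T before softmax. Logits and labels below are synthetic stand-ins.
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Learn a scalar temperature T by minimizing NLL on validation data."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays > 0
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
    nll = torch.nn.CrossEntropyLoss()

    def closure():
        opt.zero_grad()
        loss = nll(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return float(log_t.exp())

# Synthetic, overconfident validation logits for a 3-class CEnR-style task.
torch.manual_seed(0)
logits = torch.randn(200, 3) * 5.0
labels = torch.randint(0, 3, (200,))

T = fit_temperature(logits, labels)
confidences = torch.softmax(logits / T, dim=-1).max(dim=-1).values
print(f"temperature={T:.2f}, mean confidence={confidences.mean():.2f}")
```

A temperature above 1 flattens the softmax, pulling down exactly the kind of inflated confidence scores the study observed in its underperforming models.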
Background: In this paper, we present an automated method for article classification, leveraging the power of large language models (LLMs). Objective: The aim of this study is to evaluate the applicability of various LLMs based on the textual content of scientific ophthalmology papers. Methods: We developed a model based on natural language processing techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we used zero-shot learning LLMs and compared Bidirectional and Auto-Regressive Transformers (BART) and its variants with bidirectional encoder representations from transformers (BERT) and its variants, such as DistilBERT, SciBERT, PubMedBERT, and BioBERT. To evaluate the LLMs, we compiled a data set (retinal diseases [RenD]) of 1000 ocular disease-related articles, which were expertly annotated by a panel of 6 specialists into 19 distinct categories. In addition to classifying the articles, we also analyzed the different classified groups to find patterns and trends in the field. Results: The classification results demonstrate the effectiveness of LLMs in categorizing a large number of ophthalmology papers without human intervention. The model achieved a mean accuracy of 0.86 and a mean F1-score of 0.85 on the RenD data set. Conclusions: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application in the domain of ophthalmology showcases its potential for knowledge organization and retrieval. We performed a trend analysis that enables researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering as well as in the identification of emerging scientific trends within different disciplines. Moreover, the extendibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines.
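For readers unfamiliar with the zero-shot route the study compares, the sketch below shows the common pattern of scoring an abstract against candidate category names with an NLI-trained BART checkpoint. The checkpoint and the three example labels are assumptions for illustration, not the paper's 19 RenD categories.

```python
# Hedged sketch of zero-shot article classification: an NLI-trained BART
# model scores a paper's text against candidate labels with no fine-tuning.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

abstract = ("We report outcomes of anti-VEGF therapy in patients with "
            "neovascular age-related macular degeneration.")
candidate_labels = ["retinal diseases", "glaucoma", "cataract"]  # toy labels

result = classifier(abstract, candidate_labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```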
Objective: To develop and apply a natural language processing (NLP)-based approach to analyze public sentiments on social media and their geographic pattern in the United States toward coronavirus disease 2019 (COVID-19) vaccination. We also aim to provide insights to facilitate the understanding of public attitudes and concerns regarding COVID-19 vaccination. Methods: We collected tweets posted by residents of the United States after the dissemination of the COVID-19 vaccine. We performed sentiment analysis based on bidirectional encoder representations from transformers (BERT) and qualitative content analysis. Time series models were leveraged to describe sentiment trends, and key topics were analyzed longitudinally and geospatially. Results: A total of 3 198 686 tweets related to COVID-19 vaccination were extracted from January 2021 to February 2022. Of these, 2 358 783 tweets were identified as containing clear opinions, among which 824 755 (35.0%) expressed negative opinions toward vaccination while 1 534 028 (65.0%) demonstrated positive opinions. The accuracy of the BERT model was 79.67%. The key hashtag-based topics included Pfizer, breaking, wearamask, and smartnews. The sentiment toward vaccination showed manifest variability across states. Key barriers to vaccination included mistrust, hesitancy, safety concerns, misinformation, and ... Conclusions: We found that opinions toward COVID-19 vaccination varied across places and over time. This study demonstrates the potential of an analytical pipeline that integrates NLP-enabled modeling, time series, and geospatial analyses of social media data. Such analyses could enable real-time assessment, at scale, of public confidence and trust in COVID-19 vaccination, help address the concerns of vaccine skeptics, and provide support for developing tailored policies and communication strategies to maximize uptake.
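As an illustration of the kind of NLP-enabled pipeline described, the hedged sketch below applies an off-the-shelf BERT-family sentiment classifier to toy tweets and aggregates the share of positive posts by month, the unit a time series model would consume. The checkpoint and example tweets are placeholders, not the authors' fine-tuned model or 3.2-million-tweet corpus.

```python
# Illustrative sentiment pipeline: classify tweets, then aggregate the
# positive share per month for downstream time series / geospatial analysis.
from collections import defaultdict
from transformers import pipeline

sentiment = pipeline("sentiment-analysis",
                     model="distilbert-base-uncased-finetuned-sst-2-english")

tweets = [  # (month, text) placeholders standing in for the real corpus
    ("2021-01", "Just got my first dose, feeling hopeful!"),
    ("2021-01", "I don't trust this vaccine at all."),
    ("2021-02", "Fully vaccinated and grateful."),
]

monthly = defaultdict(list)
for month, text in tweets:
    label = sentiment(text)[0]["label"]  # "POSITIVE" or "NEGATIVE"
    monthly[month].append(label == "POSITIVE")

# Share of positive tweets per month, the input a trend model would take.
for month, flags in sorted(monthly.items()):
    print(month, f"{sum(flags) / len(flags):.0%} positive")
```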
This article presents a novel approach to citation network construction from Jewish Responsa literature based on the automatic extraction of references from texts. Jewish Responsa literature contains thousands of answers to questions related to Jewish law (Halachah), spanning over 1,300 years, by authors from all over the world. This literature is abundant with references, but because of their high lexical and format variability, their automatic identification and extraction is very challenging. In this article we present a novel, multi-layered approach that splits the reference extraction task into two main subtasks: i) identification of reference boundaries; ii) identification of a reference's internal components. We experimented with several machine learning models: a CRF (conditional random field) model, a BERT (bidirectional encoder representations from transformers) model, and a combined approach, BERT-CRF. Additionally, we examined the influence of the training corpus on model accuracy by comparing the performance of models trained on Modern Hebrew vs. Rabbinic Hebrew. We found that the best results were achieved by a BERT-CRF model trained on Rabbinic Hebrew. The constructed network can be utilized to build various tools for analyzing trends and influences in the Jewish Halachic corpus, such as the most influential authors, the authors' sources of authority, and their evolution over time and place.
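A hedged sketch of the first subtask, reference boundary identification, framed as BIO token classification with a BERT encoder. The authors' best system adds a CRF layer on top of BERT and is trained on Rabbinic Hebrew; the multilingual checkpoint, label set, and English example below are placeholders for readability.

```python
# Sketch of reference boundary identification as BIO token tagging: after
# fine-tuning, contiguous B-REF/I-REF tags mark each citation span, which a
# second model then parses into its internal components.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-REF", "I-REF"]  # outside / begin / inside a reference span
MODEL = "bert-base-multilingual-cased"  # assumption; not the authors' model

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL, num_labels=len(LABELS)
)

text = "As ruled in Shulchan Aruch, Orach Chaim 128, the priests ascend."
enc = tokenizer(text, return_tensors="pt")

model.eval()
with torch.no_grad():
    pred_ids = model(**enc).logits.argmax(dim=-1)[0]

# Untrained weights give arbitrary tags here; the loop just shows the
# token-to-label alignment the fine-tuned model would produce.
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
for tok, pid in zip(tokens, pred_ids):
    print(tok, LABELS[pid])
```

In the paper's combined BERT-CRF variant, the per-token logits feed a CRF layer that enforces valid tag transitions (for example, no I-REF directly after O), which is what makes boundary detection robust to the high format variability noted above.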