ISBN: (Print) 9781665440899
We propose to jointly detect and classify emergency events using a multi-class text classifier, a typical deep learning architecture with transformer modules that employs bidirectional encoder representations from transformers (BERT). Deep learning requires a large amount of labeled data to work. Semi-supervised learning (SSL) addresses this by exploiting massive unlabeled data to improve the performance of supervised deep learning. As an effective SSL variant, unsupervised data augmentation (UDA) focuses on data augmentation techniques to improve the performance of deep learning. We present an enhanced version of UDA (EUDA) that mixes in additional data augmentation strategies and applies a problem-related prefilter. Our EUDA targets emergency event detection and classification. Because emergency events always involve time and location elements, text can be filtered on this semantic feature. We therefore propose semantic-feature-aided enhanced unsupervised data augmentation to solve the problem at hand. Empirical studies on a dataset prepared for the task validate that the proposed EUDA achieves significantly better performance than supervised learning with a limited amount of labeled data. Experiments on a further text classification task confirm that EUDA improves the performance of the BERT neural network.
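As a rough illustration of the semantic prefilter idea (the abstract gives no implementation details, so the patterns below are invented stand-ins): texts are kept only if they contain both a time expression and a location cue.

```python
import re

# Hypothetical prefilter in the spirit of EUDA's semantic filtering:
# keep only texts mentioning both a time and a location element.
# The patterns are illustrative, not the paper's actual rules.
TIME_PAT = re.compile(r"\b(\d{1,2}:\d{2}|\d{4}-\d{2}-\d{2}|today|yesterday|tonight)\b", re.I)
LOC_PAT = re.compile(r"\b(street|avenue|road|bridge|station|district|city)\b", re.I)

def prefilter(texts):
    """Return only texts containing both a time expression and a location cue."""
    return [t for t in texts if TIME_PAT.search(t) and LOC_PAT.search(t)]

texts = [
    "Fire reported at 14:30 near Elm Street, units dispatched.",
    "I love this song so much!",
    "Flooding on the bridge yesterday blocked traffic.",
]
print(prefilter(texts))  # keeps the first and third texts
```

Unlabeled texts passing such a filter would then feed the UDA consistency-training step.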
Background: In this paper, we present an automated method for article classification, leveraging the power of large language models (LLMs). Objective: The aim of this study is to evaluate the applicability of various LLMs based on the textual content of scientific ophthalmology papers. Methods: We developed a model based on natural language processing techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we used zero-shot-learning LLMs and compared bidirectional and auto-regressive transformers (BART) and its variants with bidirectional encoder representations from transformers (BERT) and its variants, such as distilBERT, SciBERT, PubmedBERT, and BioBERT. To evaluate the LLMs, we compiled a dataset (retinal diseases [RenD]) of 1000 ocular disease-related articles, which were expertly annotated by a panel of 6 specialists into 19 distinct categories. In addition to the classification of articles, we also performed analysis on the different classified groups to find patterns and trends in the field. Results: The classification results demonstrate the effectiveness of LLMs in categorizing a large number of ophthalmology papers without human intervention. The model achieved a mean accuracy of 0.86 and a mean F1-score of 0.85 on the RenD dataset. Conclusions: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application in the domain of ophthalmology showcases its potential for knowledge organization and retrieval. We performed a trend analysis that enables researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering, as well as in identifying emerging scientific trends within different disciplines. Moreover, the extendibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines.
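The reported mean accuracy and F1-score can be computed from per-article predictions in the usual way; a minimal sketch with invented labels:

```python
from collections import defaultdict

# Sketch of accuracy and macro-averaged F1, the kind of metrics
# reported for the RenD benchmark. The labels below are made up.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    f1s = []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["retina", "cornea", "retina", "glaucoma"]
y_pred = ["retina", "retina", "retina", "glaucoma"]
print(accuracy(y_true, y_pred))  # 0.75
print(macro_f1(y_true, y_pred))  # 0.6 (up to float rounding)
```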
Background: Drug-target interactions (DTIs) are critical for drug repurposing and for elucidating drug mechanisms, and they are manually curated by large databases such as ChEMBL, BindingDB, DrugBank and DrugTargetCommons. However, the number of curated articles likely constitutes only a fraction of all the articles that contain experimentally determined DTIs. Finding such articles and extracting the experimental information is a challenging task, and there is a pressing need for systematic approaches to assist the curation of DTIs. To this end, we applied bidirectional encoder representations from transformers (BERT) to identify such articles. Because DTI data depend intimately on the type of assay used to generate them, we also aimed to incorporate functions to predict the assay format. Results: Our novel method identified 0.6 million articles (along with drug and protein information) that were not previously included in public DTI databases. Using 10-fold cross-validation, we obtained ~99% accuracy for identifying articles containing quantitative drug-target profiles. The micro-averaged F1 for assay-format prediction is 88%, which leaves room for improvement in future studies. Conclusion: The BERT model in this study is robust, and the proposed pipeline can be used to identify previously overlooked articles containing quantitative DTIs. Overall, our method provides a significant advancement in machine-assisted DTI extraction and curation. We expect it to be a useful addition to drug mechanism discovery and repurposing.
Background: The ubiquitous presence of short extrachromosomal circular DNAs (eccDNAs) in eukaryotic cells has perplexed generations of biologists. Their widespread origins in the genome, lacking apparent specificity, led some studies to conclude that their formation is random or near-random. Despite this, the search for specificity in short eccDNA formation continues, with a recent surge of interest in biomarker development. Results: To shed new light on the conflicting views on the randomness of short eccDNAs, here we present DeepCircle, a bioinformatics framework incorporating convolution- and attention-based neural networks to assess their predictability. Short human eccDNAs from different datasets indeed have low similarity in genomic locations, but DeepCircle successfully learned shared DNA sequence features to make accurate cross-dataset predictions (accuracy: convolution-based models 79.65% ± 4.7%, attention-based models 83.31% ± 4.18%). Conclusions: The excellent performance of our models shows that the intrinsic predictability of eccDNAs is encoded in the sequences across tissue origins. Our work demonstrates how a perceived lack of specificity in genomics data can be reassessed with deep learning models to uncover unexpected similarity.
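A minimal sketch of the convolutional building block such a model rests on, with an invented 3-mer filter (not DeepCircle's actual architecture): one-hot encode a DNA window, then slide the filter across it.

```python
import numpy as np

# Illustration of the convolution step in a sequence model: one-hot
# DNA encoding plus a single sliding motif filter. The filter values
# are a toy stand-in, not learned weights.
BASES = "ACGT"

def one_hot(seq):
    m = np.zeros((len(seq), 4))
    for i, b in enumerate(seq):
        m[i, BASES.index(b)] = 1.0
    return m

def conv1d(x, kernel):
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel) for i in range(len(x) - k + 1)])

x = one_hot("ACGTAC")
kernel = one_hot("GTA")  # a 3-mer "motif" filter
scores = conv1d(x, kernel)
print(scores)  # peaks at the position where GTA aligns exactly
```

A trained model stacks many such filters, with the learned motifs shared across datasets.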
Background: Emerging research has highlighted the potential of virtual reality (VR) as a tool for training health care students and professionals in care skills for individuals with Alzheimer disease and related dementias (ADRD). However, there is limited research on the use of VR to engage the general public in raising awareness about ADRD. Objective: This research aimed to examine the impact of the VR video “A Walk-Through Dementia” on YouTube users by analyzing their posts. Methods: We collected 12,754 comments from the VR video series “A Walk-Through Dementia,” which simulates the everyday challenges faced by individuals with ADRD, providing viewers with an immersive experience of the condition. Topic modeling was conducted to gauge viewer opinions and reactions to the videos. A pretrained bidirectional encoder representations from transformers (BERT) model was used to transform the YouTube comments into high-dimensional vector embeddings, allowing for systematic identification and detailed analysis of the principal topics and their thematic structures within the dataset. Results: We identified the top 300 most frequent words in the dataset and categorized them into nouns, verbs, and adjectives or adverbs using a part-of-speech tagging model fine-tuned for accurate tagging tasks. The topic modeling process identified 8 initial topics based on the most frequent words. After manually reviewing the 8 topics and the content of the comments, we synthesized them into 5 themes. The predominant theme, represented in 2917 comments, centered on users’ personal experiences with the impact of ADRD on patients and caregivers. The remaining themes were categorized into 4 main areas: positive reactions to the VR videos, challenges faced by individuals with ADRD, the role of caregivers, and learning from the VR videos. Conclusions: Using topic modeling, this study demonstrated that VR applications serve as engaging and experiential learning tools, offering the public a
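The clustering step behind embedding-based topic modeling can be sketched as follows; real comments would be encoded with a BERT model, whereas the tiny 2-D vectors here are made-up stand-ins.

```python
import numpy as np

# Sketch of k-means over comment embeddings, the grouping step that
# precedes manual topic review. Real embeddings come from BERT;
# these vectors are invented to keep the example self-contained.
def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

X = np.array([[0.0, 0.1], [0.1, 0.0],   # e.g. "caregiver" comments
              [5.0, 5.1], [5.1, 4.9]])  # e.g. "VR praise" comments
labels = kmeans(X, 2)
print(labels)  # the two comment groups receive distinct cluster labels
```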
Objective: This study aimed to construct an intelligent prescription-generating (IPG) model based on deep-learning natural language processing (NLP) technology for multiple prescriptions in Chinese medicine. Materials and Methods: We selected the Treatise on Febrile Diseases and the Synopsis of the Golden Chamber as basic datasets with EDA data augmentation, and the Yellow Emperor’s Canon of Internal Medicine, the Classic of the Miraculous Pivot, and the Classic on Medical Problems as supplementary datasets for fine-tuning. We selected the word-embedding model based on the Imperial Collection of Four, the bidirectional encoder representations from transformers (BERT) model based on the Chinese Wikipedia, and the robustly optimized BERT approach (RoBERTa) model based on the Chinese Wikipedia and a general corpus. In addition, the BERT model was fine-tuned using the supplementary datasets to generate a Traditional Chinese Medicine-BERT model. Multiple IPG models were constructed based on the pretraining strategy and experiments were conducted. Precision, recall, and F1-score were used to assess model performance. Based on the trained models, we extracted and visualized the semantic features of some typical texts from the Treatise on Febrile Diseases and investigated them. Results: Among all the trained models, the RoBERTa-large model performed the best, with a test-set precision of 92.22%, recall of 86.71%, and F1-score of 89.38%, and 10-fold cross-validation precision of 94.5% ± 2.5%, recall of 90.47% ± 4.1%, and F1-score of 92.38% ± 2.8%. The semantic feature extraction results based on this model showed that the model was intelligently stratified according to different meanings, such that the within-layer patterns showed the associations of symptom–symptom, disease–symptom, and symptom–punctuation, while the between-layer patterns showed a progressive or dynamic symptom and disease transmission. Conclusions: Deep-learning-based NLP technology significantly improves the performance of IPG models.
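Two of the standard EDA (easy data augmentation) operations the study mentions, random swap and random deletion, can be sketched as follows; the herb tokens are invented examples, and classical Chinese text would be character-tokenized instead.

```python
import random

# Sketch of two EDA operations used to augment small training sets:
# random swap and random deletion. Tokens here are illustrative.
def random_swap(tokens, n, rng):
    tokens = tokens[:]
    for _ in range(n):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p, rng):
    kept = [t for t in tokens if rng.random() > p]
    return kept or [rng.choice(tokens)]  # never return an empty sample

rng = random.Random(42)
tokens = "cinnamon twig peony ginger jujube licorice".split()
print(random_swap(tokens, 1, rng))
print(random_deletion(tokens, 0.3, rng))
```

Each augmented variant keeps the original prescription label, multiplying the effective training data.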
An automated question-answering system allows students to learn as an integral part of digitized learning. The system responds to queries with text. We also include a knowledge graph, which makes the model considerably more engaging and improves learners’ understanding. The features of knowledge entity extraction, information point evaluation and analysis, knowledge graph construction from unstructured text, and knowledge entity integration are all explored. The question-answering paradigm we suggest in this study uses knowledge graphs and BERT (bidirectional encoder representations from transformers) to provide diverse learners with quick feedback on the subject. To facilitate non-native learners’ understanding, we also include English-to-Hindi translation. As a result, the system can make access to learning, and continued learning, much easier for educators and learners alike.
Since the turn of the century, as millions of users' opinions have become available on the web, sentiment analysis has become one of the most fruitful research fields in Natural Language Processing (NLP). Research on sentiment analysis has covered a wide range of domains such as economy, polity, and medicine, among others. In the pharmaceutical field, automatic analysis of online user reviews makes it possible to process large numbers of opinions and to obtain relevant information about the effectiveness and side effects of drugs, which could be used to improve pharmacovigilance systems. Throughout the years, approaches for sentiment analysis have progressed from simple rules to advanced machine learning techniques such as deep learning, which has become an emerging technology in many NLP tasks. Sentiment analysis is not oblivious to this success, and several systems based on deep learning have recently demonstrated their superiority over former methods, achieving state-of-the-art results on standard sentiment analysis datasets. However, prior work shows that very few attempts have been made to apply deep learning to sentiment analysis of drug reviews. We present a benchmark comparison of various deep learning architectures such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) recurrent neural networks. We propose several combinations of these models and also study the effect of different pre-trained word embedding models. As transformers have revolutionized the NLP field, achieving state-of-the-art results for many NLP tasks, we also explore bidirectional encoder representations from transformers (BERT) with a Bi-LSTM for the sentiment analysis of drug reviews. Our experiments show that using BERT obtains the best results, but with a very high training time. On the other hand, CNN achieves acceptable results while requiring less training time.
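A minimal sketch of the CNN text-classifier forward pass benchmarked in such comparisons, with random stand-in embeddings and a single trigram filter; the score itself is meaningless, the point is the convolution and max-over-time pooling steps.

```python
import numpy as np

# Sketch of a CNN-for-text forward pass: embed tokens, convolve one
# trigram filter over the sentence, max-pool over time, then score.
# All weights are random stand-ins, not a trained model.
rng = np.random.default_rng(0)
sentence = "the drug worked with no side effects".split()
vocab = {w: i for i, w in enumerate(sentence)}
E = rng.normal(size=(len(vocab), 8))   # word embeddings (7 x 8)
W = rng.normal(size=(3, 8))            # one trigram convolution filter
v = rng.normal()                       # output weight

X = E[[vocab[t] for t in sentence]]    # (7, 8) sentence matrix
feats = [np.tanh((X[i:i + 3] * W).sum()) for i in range(len(X) - 2)]
pooled = max(feats)                    # max-over-time pooling
score = 1.0 / (1.0 + np.exp(-v * pooled))  # sigmoid sentiment score
print(round(float(score), 3))
```

Real models use many filters of several widths and train the weights end to end.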
Traditional named entity recognition methods mainly explore the application of hand-crafted features. Currently, with the popularity of deep learning, neural networks have been introduced to capture deep features for named entity recognition. However, most existing methods target only modern corpora. Named entity recognition in ancient literature is challenging because the names in it have evolved over time. In this paper, we attempt to recognise entities by exploring the characteristics of characters and strokes. The enhanced character embedding model, named ECEM, is proposed on the basis of bidirectional encoder representations from transformers and strokes. First, ECEM generates semantic vectors dynamically according to the context of the words. Second, the proposed algorithm introduces morphological-level information of Chinese words. Finally, the enhanced character embedding is fed into a bidirectional long short-term memory–conditional random field (BiLSTM-CRF) model for training. To explore the effect of the proposed algorithm, experiments are carried out on both ancient literature and a modern corpus. The results indicate that our algorithm is very effective and powerful compared with traditional ones.
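The decoding step of the CRF layer at the end of such a tagger is Viterbi search; a sketch with toy B/I/O emission and transition scores (not the paper's trained parameters):

```python
import numpy as np

# Sketch of Viterbi decoding in a CRF tagging layer: given per-character
# emission scores and a tag-transition matrix (toy numbers), recover
# the highest-scoring tag path.
TAGS = ["B", "I", "O"]

def viterbi(emissions, transitions):
    n, k = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [TAGS[i] for i in reversed(path)]

trans = np.array([[-1.0, 2.0, 0.0],    # from B: favour B->I
                  [-1.0, 1.0, 0.0],
                  [ 1.0, -9.0, 0.5]])  # O->I strongly penalised
emis = np.array([[2.0, 0.0, 0.5],
                 [0.0, 1.5, 0.2],
                 [0.1, 0.0, 2.0]])
print(viterbi(emis, trans))  # ['B', 'I', 'O']
```

The transition matrix is what lets the CRF forbid illegal sequences such as O followed by I.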
Background: The current COVID-19 crisis underscores the importance of preprints, as they allow for rapid communication of research results without the delay of review. To fully integrate this type of publication into library information systems, we developed preVIEW: a publicly available, central search engine for COVID-19-related preprints, which clearly distinguishes this source from peer-reviewed publications. The relationship between a preprint version and its corresponding journal version should be stored as metadata in both versions so that duplicates can be easily identified and information overload for researchers is reduced. Objective: In this work, we investigated the extent to which the relationship information between preprint and corresponding journal publication is present in the published metadata, how it can be further completed, and how it can be used in preVIEW to identify already republished preprints and filter those duplicates in search results. Methods: We first analyzed the information content available at the preprint servers themselves and the information that can be retrieved via Crossref. Moreover, we developed the algorithm Pre2Pub to find the corresponding reviewed article for each preprint. We integrated the results of those different resources into our search engine preVIEW, presented the information in the result set overview, and added filter options accordingly. Results: Preprints have found their place in publication workflows; however, the link from a preprint to its corresponding journal publication is not completely covered in the metadata of the preprint servers or in Crossref. Our algorithm Pre2Pub is able to find approximately 16% more related journal articles, with a precision of 99.27%. We also integrate this information in a transparent way within preVIEW so that researchers can use it in their search. Conclusions: Relationships between a preprint version and its journal version are valuable information that can help research
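One plausible core for a Pre2Pub-style matcher (the paper does not spell out its algorithm here, so this is only an assumption) is normalized fuzzy title matching between a preprint and candidate journal records:

```python
from difflib import SequenceMatcher
import re

# Hypothetical sketch of preprint-to-journal linking: normalize titles,
# then link to the most similar journal title above a threshold.
# The titles below are invented examples.
def norm(title):
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def best_match(preprint_title, journal_titles, threshold=0.9):
    scored = [(SequenceMatcher(None, norm(preprint_title), norm(t)).ratio(), t)
              for t in journal_titles]
    score, title = max(scored)
    return title if score >= threshold else None

journal = [
    "A survey of antiviral compounds against SARS-CoV-2",
    "Antiviral compounds against SARS-CoV-2: a survey.",
]
print(best_match("Antiviral Compounds Against SARS-CoV-2: A Survey", journal))
```

A production pipeline would additionally compare authors and DOIs, and consult Crossref relation metadata where available.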