Purpose: To compare the performance of 3 phenotyping methods in identifying diabetic retinopathy (DR) and related clinical conditions. Design: Three phenotyping methods were used to identify clinical conditions including unspecified DR, nonproliferative DR (NPDR) (mild, moderate, severe), consolidated NPDR (unspecified DR or any NPDR), proliferative DR, diabetic macular edema (DME), vitreous hemorrhage, retinal detachment (RD) (tractional RD or combined tractional and rhegmatogenous RD), and neovascular glaucoma (NVG). The first method used only International Classification of Diseases, 10th Revision (ICD-10) diagnosis codes (ICD-10 Lookup System). The other 2 methods used a natural language processing (NLP) framework based on bidirectional encoder representations from transformers with a dense multilayer perceptron output layer. The NLP framework was applied either to the free text of provider notes (Text-Only NLP System) or to both free text and ICD-10 diagnosis codes (Text-and-International Classification of Diseases [ICD] NLP System). Subjects: Adults aged ≥18 years with diabetes mellitus seen at the Wilmer Eye Institute. Methods: We compared the performance of the 3 phenotyping methods in identifying the DR-related conditions against gold-standard chart review. We also compared the estimated disease prevalence using each method. Main Outcome Measures: Performance of each method was reported as the macro F1 score. Agreement between the methods was calculated using the kappa statistic. Prevalence estimates were also calculated for each method. Results: A total of 91,097 patients and 692,486 office visits were included in the study. Compared with the gold standard, the Text-and-ICD NLP System had the highest F1 score for most clinical conditions (range, 0.39-0.64). Agreement between the ICD-10 Lookup System and Text-Only NLP System varied (kappa, 0.21-0.81). The prevalence of DR and related conditions ranged from 1.1% for NVG to 17.9% for DME (using the Text-and-ICD NLP System).
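The two evaluation measures this study reports, the macro F1 score and the kappa statistic, can both be computed directly from paired label lists. A minimal pure-Python sketch (the example labels below are illustrative, not the study's data):

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two methods, corrected for chance."""
    n = len(a)
    po = sum(1 for x, y in zip(a, b) if x == y) / n      # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[c] * cb[c] for c in set(a) | set(b)) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)
```

Macro averaging weights each condition equally, which matters here because rare conditions such as NVG (1.1% prevalence) would otherwise be swamped by common ones.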
With the rapid increase of Arabic content on the web comes an increased need for short, accurate answers to queries. Machine question answering has emerged as an important field for progress in natural language processing (NLP) techniques. Machine learning performance surpasses that of humans in some areas of NLP and text analysis, especially with large amounts of data. This research makes two main contributions. First, we propose the Tawasul Arabic question similarity (TAQS) system with four Arabic semantic question-similarity models using deep learning techniques. Second, we curated and used an Arabic customer-service question-similarity dataset of 44,404 question-answer pairs, called "Tawasul." For TAQS, we first use transfer learning to extract contextualized bidirectional encoder representations from transformers (BERT) embeddings with bidirectional long short-term memory (BiLSTM) in two different ways. Specifically, we propose two architectures: the BERT contextual representation with BiLSTM (BERT-BiLSTM) and the hybrid transfer BERT contextual representation with BiLSTM (HT-BERT-BiLSTM), where the hybrid transfer representation combines two transfer learning techniques. Second, we fine-tuned two versions of BERT for the Arabic language (AraBERT). The results show that HT-BERT-BiLSTM with the features of Layer 12 reaches an accuracy of 94.45% on the Tawasul dataset, while fine-tuned AraBERTv2 and AraBERTv0.2 achieve 93.10% and 93.90% accuracy, respectively. Our proposed TAQS model surpasses the performance of the state-of-the-art BiLSTM with SkipGram by a gain of 43.19% in accuracy.
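At its core, semantic question similarity reduces to comparing two sentence embeddings. As a simplified stand-in for the paper's BERT-BiLSTM pipeline (the pooling, cosine scoring, and 0.8 threshold below are generic assumptions, not the TAQS method), a pair can be scored like this:

```python
import math

def mean_pool(token_vectors):
    """Average per-token embeddings into a single sentence vector."""
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / len(token_vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similar(q1_vecs, q2_vecs, threshold=0.8):
    """Label a question pair as semantically similar if pooled cosine
    similarity exceeds the threshold (threshold is illustrative)."""
    return cosine(mean_pool(q1_vecs), mean_pool(q2_vecs)) >= threshold
```

In the actual system, a trained classifier head replaces the fixed threshold, and the token vectors come from BERT layers rather than being supplied directly.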
Text classification is an important research area in natural language processing (NLP) that has received considerable scholarly attention in recent years. However, real Chinese online news is characterized by long texts, large amounts of information, and complex structure, which reduces the accuracy of Chinese long-text classification. To improve the accuracy of long-text classification of Chinese news, we propose a BERT-based local feature convolutional network (LFCN) model comprising four novel modules. First, to address the limit that bidirectional encoder representations from transformers (BERT) places on the maximum input sequence length, we propose a method named Dynamic LEAD-n (DLn) to extract short texts from the long text, based on the traditional LEAD digest algorithm. In the Text-Text Encoder (TTE) module, we use the BERT pretrained language model to produce sentence-level feature vector representations of a news text and to capture global features, using the attention mechanism to identify correlated words in the text. We then propose a CNN-based local feature convolution (LFC) module to capture local features in the text, such as key phrases. Finally, the feature vectors generated by the different operations over several different periods are fused and used to predict the category of a news text. Experimental results show that the new method further improves the accuracy of long-text classification of Chinese news.
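The LEAD-style truncation that DLn builds on can be sketched in a few lines; this is a hedged simplification (splitting on a period and counting whitespace tokens are assumptions — real BERT subword tokenization and Chinese sentence segmentation differ, and the paper's dynamic variant chooses n per document):

```python
def lead_n(text, n, max_tokens=512, sep="."):
    """Keep the first n sentences of a long document, then cap the excerpt
    at a crude token budget so it fits BERT's maximum input length."""
    sentences = [s for s in text.split(sep) if s.strip()]
    excerpt = sep.join(sentences[:n])
    return " ".join(excerpt.split()[:max_tokens])
```

The idea is that news articles front-load their key content, so a leading excerpt preserves most of the signal that the downstream TTE and LFC modules need.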
In deep reinforcement learning, sampling inefficiency is addressed by mimicking human learning, which leverages past experiences stored in the hippocampus. Integrating this idea, the proposed approach utilizes a task...
Objectives: This study sought to explore the use of novel natural language processing (NLP) methods for classifying unstructured, qualitative textual data from interviews of patients with cancer to identify patient-reported symptoms and impacts on quality of life. Methods: We tested the ability of 4 NLP models to accurately classify text from interview transcripts as "symptom," "quality of life impact," and "other." Interview data sets from patients with hepatocellular carcinoma (HCC) (n = 25), biliary tract cancer (BTC) (n = 23), and gastric cancer (n = 24) were used. Models were cross-validated with transcript subsets designated for training, validation, and testing. Multiclass classification performance of the 4 models was evaluated at paragraph and sentence level using the HCC testing data set and analyzed by the one-versus-rest technique quantified by the receiver operating characteristic area under the curve (ROC AUC) score. Results: NLP models accurately classified multiclass text from patient interviews. The bidirectional encoder representations from transformers model generally outperformed all other models at paragraph and sentence level. The highest predictive performance of the bidirectional encoder representations from transformers model was observed using the HCC data set to train and the BTC data set to test (mean ROC AUC, 0.940 [SD 0.028]), with similarly high predictive performance using balanced and imbalanced training data sets from the BTC and gastric cancer populations. Conclusions: NLP models were accurate in predicting multiclass classification of text from interviews of patients with cancer, with most surpassing 0.9 ROC AUC at paragraph level. NLP may be a useful tool for scaling up the processing of patient interviews in clinical studies and thus could serve to facilitate patient input into drug development and improve patient care.
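The one-versus-rest ROC AUC used above treats each class in turn as the positive class and averages the resulting binary AUCs. A minimal sketch of both steps (the rank-based AUC below is the standard definition, with ties counted as half; the class names and scores are illustrative):

```python
def roc_auc(labels, scores):
    """Binary AUC = P(a random positive outscores a random negative),
    counting score ties as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(labels, score_rows, classes):
    """Macro average of per-class AUCs from a multiclass score matrix,
    where score_rows[i][j] is the score of example i for class j."""
    aucs = []
    for j, c in enumerate(classes):
        binary = [1 if l == c else 0 for l in labels]
        aucs.append(roc_auc(binary, [row[j] for row in score_rows]))
    return sum(aucs) / len(classes)
```

A value of 0.5 corresponds to chance ranking, so the reported 0.940 mean indicates the model orders "symptom" vs. non-symptom text (and likewise for the other classes) almost perfectly.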
There is minimal restriction on users' speech in cyberspace. The Internet provides a space where people can freely present their speech, putting a utopian sense of freedom of speech into practice. However, the appearance of hate speech is a significant side effect of online freedom of speech. Some users use hate speech to attack others, making the targets uncomfortable. The proliferation of hate speech poses severe challenges to cyber society. Users may hope that social media platforms and online communities counter hate speech. However, hate speech detection is still a developing technology that requires system developers to create methods that detect unacceptable hate speech while maintaining the online freedom-of-speech environment. Although some literature has focused on the problem, no fully satisfactory detection approach has yet been proposed. The current study proposes an approach to build a political hate speech lexicon and train artificial intelligence classifiers to detect hate speech. Our academic and practical contributions include the collection of a Chinese hate speech dataset, the creation of a Chinese hate speech lexicon, and the development of both a deep-learning-based and a lexicon-based approach to detect Chinese hate speech. Although we focus on Chinese hate speech detection, our proposed hate speech detection system and lexicon development approach can also be applied to other languages.
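A lexicon-based detector of the kind described reduces to weighted term matching against a threshold. A hedged sketch (the lexicon entries, weights, and threshold here are placeholders, not the paper's Chinese lexicon):

```python
def lexicon_score(text, lexicon):
    """Sum the weights of all lexicon terms that occur in the text."""
    return sum(weight for term, weight in lexicon.items() if term in text)

def is_hate_speech(text, lexicon, threshold=1.0):
    """Flag text whose cumulative lexicon score reaches the threshold."""
    return lexicon_score(text, lexicon) >= threshold

# Placeholder lexicon for illustration only; a real one holds curated
# terms with weights reflecting severity.
demo_lexicon = {"termA": 0.7, "termB": 0.6}
```

Substring matching suits Chinese, which has no whitespace word boundaries; the paper's deep-learning branch complements this by catching hateful phrasing that no fixed lexicon anticipates.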
Continuous sign language recognition (CSLR) is a challenging task involving various signal processing techniques to infer the sequences of glosses performed by signers. Existing approaches in CSLR typically use multiple input modalities, such as the raw video data and the extracted hand images, to improve recognition accuracy. However, the large differences between modalities make it difficult to define an integrative framework that effectively exchanges and combines the knowledge obtained from different modalities so that they complement each other, improving the framework's robustness against gesture variations and background noise in CSLR. To address this issue, we propose a novel cross-attention deep learning framework named CA-SignBERT. This framework utilizes multiple bidirectional encoder representations from transformers (BERT) models to analyze the information from different modalities. Among these BERT models, we introduce a special cross-attention mechanism to ensure efficient inter-modality knowledge exchange. In addition, an innovative weight control module is proposed to dynamically hybridize their outputs. Experimental results reveal that the CA-SignBERT framework attains state-of-the-art performance on four benchmark CSLR datasets.
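Cross-attention of the kind CA-SignBERT uses follows the standard scaled dot-product form: queries come from one modality while keys and values come from another. A minimal single-head sketch (dimensions and inputs are illustrative; the paper's mechanism sits between full BERT streams):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Each query vector (modality A) attends over key/value vectors
    (modality B), returning one fused vector per query."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out
```

Because the output for each video-stream query is a weighted mix of hand-stream values (and vice versa in the opposite direction), each modality's representation is enriched with the other's evidence before the weight control module combines the streams.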
This research applies a pre-trained bidirectional encoder representations from transformers (BERT) handwriting recognition model to predict foreign Korean-language learners' writing scores. A corpus of 586 answers to midterm and final exams written by foreign learners at the Intermediate 1 level was acquired and used for pre-training, resulting in consistent performance even with small datasets. The test data were pre-processed, the model was fine-tuned, and the results were produced as score predictions; the difference between each prediction and the actual score was then calculated. An accuracy of 95.8% was demonstrated, indicating that the prediction results were strong overall; hence, the tool is suitable for the automatic scoring of Korean written test answers, including those with grammatical errors, written by foreigners. These results are particularly meaningful in that the data included written-language text produced by foreign learners, not native speakers.
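Automatic-scoring accuracy of this kind is typically reported as the share of predictions falling within a tolerance of the human-assigned score. A hedged sketch of that evaluation step (the 1-point tolerance and example scores are assumptions, not the paper's criterion):

```python
def scoring_accuracy(predicted, actual, tolerance=1.0):
    """Fraction of answers whose predicted score lies within `tolerance`
    points of the human-assigned score."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(actual)
```

Reporting a tolerance band rather than exact matches reflects that human raters themselves disagree by a point or two on essay scores.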
Elucidating the mechanisms of compound-protein interactions (CPIs) plays an essential role in drug discovery and development. Many computational efforts have been made to accelerate the development of this field. However, current predictive performance is still not satisfactory, and existing methods consider only protein and compound features, ignoring their interactive information. In this study, we propose a multi-view deep learning method named MDL-CPI for CPI prediction. To sufficiently extract discriminative information, we introduce a hybrid architecture that leverages BERT (bidirectional encoder representations from transformers) and a CNN (convolutional neural network) to extract protein features from a sequential perspective, uses a GNN (graph neural network) to extract compound features from a structural perspective, and generates a unified feature space by using an AE2 (autoencoder in autoencoder networks) network to learn the interactive information between the BERT-CNN and graph embeddings. Comparative results on benchmark datasets show that our proposed method outperforms existing CPI prediction methods, demonstrating the strong predictive ability of our model. Importantly, we demonstrate that the learned interactive information between compounds and proteins is critical to improving predictive performance. We release our source code and dataset at: https://***/Longwt123/MDL-CPI.
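The multi-view idea — one encoder per view, with the resulting embeddings fused and scored — can be sketched without the deep components. Here plain concatenation plus a linear sigmoid scorer stands in for the paper's BERT-CNN, GNN, and AE2 modules (all vectors and weights are illustrative placeholders):

```python
import math

def fuse(protein_vec, compound_vec):
    """Join the sequence-view (protein) and structure-view (compound)
    embeddings into one feature vector. AE2 instead learns a shared
    space capturing their interaction; concatenation is the baseline."""
    return protein_vec + compound_vec

def interaction_prob(fused, weights, bias=0.0):
    """Linear scorer with a sigmoid, a stand-in for the learned classifier
    that maps the fused representation to an interaction probability."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

The abstract's key claim is precisely that this naive fusion is insufficient: modeling the interactive information between the two views (via AE2) is what lifts predictive performance.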