Spam email is a serious concern: with the growing number of internet users, it can be used to steal personal information and cause huge financial losses. Therefore, the demand for accurate spam filtering in email spam detection has grown. In existing techniques, it is difficult to capture the intricate relationships between words in an email using standard word-embedding techniques, and learning-rate tuning is one of the greatest challenges of stochastic optimization. To overcome these difficulties, the proposed framework performs diverse ensemble-based email spam classification by incorporating multiple word embeddings with the Continuous Coin Betting (COCOB) optimizer. Word2Vec produces the first set of 200-dimensional word embeddings, GloVe produces a second 200-dimensional set, and bidirectional encoder representations from transformers (BERT) produces 768-dimensional embeddings. The generated embeddings are then classified by a diverse ensemble whose base-level classifiers are Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and bidirectional GRUs (Bi-GRU), with an LSTM meta-classifier trained using the COCOB optimizer. Experiments conducted on three benchmark email datasets show that the proposed system outperforms existing approaches with a low false positive rate.
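As a hedged illustration of the learning-rate-free idea behind COCOB, the sketch below implements the per-coordinate COCOB-Backprop update (as described by Orabona and Tommasi) on a one-dimensional toy objective; the toy loss f(w) = (w - 3)^2 and all variable names are our own illustrative choices, not the paper's spam-classification setup.

```python
# Sketch of the COCOB-Backprop update on a 1-D toy problem.
# Loss f(w) = (w - 3)**2 is an illustrative stand-in for the real objective.

def cocob_minimize(grad, w0=0.0, steps=2000, alpha=100.0):
    """Minimize a 1-D function given its gradient, with no learning rate."""
    w = w0
    L = 1e-8      # largest gradient magnitude seen so far
    G = 0.0       # sum of gradient magnitudes
    R = 0.0       # accumulated "reward" of the coin-betting strategy
    theta = 0.0   # sum of negative gradients
    for _ in range(steps):
        g = grad(w)
        L = max(L, abs(g))
        G += abs(g)
        R = max(R - g * (w - w0), 0.0)           # reward of betting against g
        theta -= g
        beta = theta / (L * max(G + L, alpha * L))
        w = w0 + beta * (L + R)                  # bet a fraction of the wealth
    return w

grad = lambda w: 2.0 * (w - 3.0)                 # gradient of (w - 3)^2
w = cocob_minimize(grad)
print(w)                                         # approaches the minimizer 3
```

No step size is ever specified: the per-coordinate "wealth" bookkeeping replaces learning-rate tuning, which is the property the abstract motivates.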
Background: Drug-target interactions (DTIs) are critical for drug repurposing and for elucidating drug mechanisms, and are manually curated by large databases such as ChEMBL, BindingDB, DrugBank and DrugTargetCommons. However, the curated articles likely constitute only a fraction of all the articles that contain experimentally determined DTIs. Finding such articles and extracting the experimental information is a challenging task, and there is a pressing need for systematic approaches to assist the curation of DTIs. To this end, we applied bidirectional encoder representations from transformers (BERT) to identify such articles. Because DTI data depend intimately on the type of assay used to generate them, we also aimed to incorporate functions to predict the assay format. Results: Our novel method identified 0.6 million articles (along with drug and protein information) that were not previously included in public DTI databases. Using 10-fold cross-validation, we obtained ~99% accuracy for identifying articles containing quantitative drug-target profiles. The micro-averaged F1 for assay-format prediction is 88%, which leaves room for improvement in future studies. Conclusion: The BERT model in this study is robust, and the proposed pipeline can be used to identify previously overlooked articles containing quantitative DTIs. Overall, our method provides a significant advance in machine-assisted DTI extraction and curation. We expect it to be a useful addition to drug mechanism discovery and repurposing.
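For context on the reported metrics, the snippet below shows one standard way micro-averaged F1 can be computed from pooled true/false positive counts; note that for a single-label multi-class task such as assay-format prediction, micro-F1 coincides with plain accuracy. The labels are invented for illustration.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over a single-label multi-class task."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if p == c and t != c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy assay-format labels (made up for illustration):
y_true = ["binding", "functional", "ADMET", "binding"]
y_pred = ["binding", "functional", "binding", "binding"]
print(micro_f1(y_true, y_pred))  # 0.75, equal to accuracy here
```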
Finding a suitable hotel based on a user's needs and budget is a complex decision-making process. Nowadays, the availability of an ample amount of online customer reviews helps us in this regard. This fact opens a promising research direction in the field of tourism, the hotel recommendation system, which also helps improve consumers' information processing. Real-world reviews may showcase different sentiments of customers towards a hotel, and each review can be categorized by aspects such as cleanliness, value, service, etc. Keeping these facts in mind, in the present work we propose a hotel recommendation system that combines sentiment analysis of hotel reviews with aspect-based review categorization, operating on queries given by a user. Furthermore, we provide a new, rich and diverse dataset of online hotel reviews crawled from ***. We follow a systematic approach that first uses an ensemble of binary bidirectional encoder representations from transformers (BERT) classifiers, with three phases for positive-negative, neutral-negative, and neutral-positive sentiments, merged using a weight-assigning protocol. We then feed the pre-trained word embeddings generated by the BERT models, along with other textual features such as word vectors generated by Word2vec, TF-IDF of frequent words, subjectivity scores, etc., to a Random Forest classifier. After that, we group the reviews into different categories using an approach that combines fuzzy logic and cosine similarity. Finally, we build a recommender system from the aforementioned frameworks. Our model achieves a macro F1-score of 84% and a test accuracy of 92.36% in the classification of sentiment polarities. The categorized reviews also form compact clusters. The results are quite promising and much better than those of state-of-the-art models. The relevant codes a
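The three-phase binary ensemble could be merged along the following lines. This is only a sketch under our own assumptions: stand-in probability outputs replace the real BERT models, and a simple normalized weighted vote stands in for the paper's weight-assigning protocol, whose exact form is not detailed in the abstract.

```python
def merge_pairwise(scores, weights):
    """Combine three pairwise binary classifiers into one 3-class decision.

    scores maps a class pair to P(first class wins); weights reflect how
    much each pairwise model is trusted (e.g., its validation accuracy).
    """
    classes = ["positive", "negative", "neutral"]
    votes = {c: 0.0 for c in classes}
    for (a, b), p in scores.items():
        w = weights[(a, b)]
        votes[a] += w * p
        votes[b] += w * (1.0 - p)
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()}

# Stand-in outputs of the three binary phases (illustrative numbers):
scores = {("positive", "negative"): 0.9,
          ("neutral", "negative"): 0.6,
          ("neutral", "positive"): 0.3}
weights = {("positive", "negative"): 1.0,
           ("neutral", "negative"): 0.8,
           ("neutral", "positive"): 0.9}
merged = merge_pairwise(scores, weights)
print(max(merged, key=merged.get))  # "positive" for these numbers
```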
In the era of prevalent online commerce, online reviews significantly influence purchasing decisions. Unfortunately, this has also led to the emergence of fake reviews, which can deceive consumers and undermine trust in online platforms. Our study addresses this issue by developing DenyBERT, deep-learning-based software that enhances the bidirectional encoder representations from transformers (BERT) framework with Deep and Light Transformation (DeLighT) and Knowledge Distillation (KD) techniques. These innovations not only reduce computational demands but also improve the model's accuracy in identifying fake reviews, making it well suited for real-world applications. Notably, DenyBERT requires only 16.01M parameters, significantly fewer than predecessors such as BERT and TinyBERT, yet achieves a robust accuracy of 96.12% and an F1-score of 96.47%. This efficiency makes it particularly suited for deployment on devices with limited processing capabilities. The software, developed in Python, features a flexible input mechanism that allows reviews to be analyzed directly from websites via URL or via manual input of review paragraphs. Our findings indicate that DenyBERT outperforms existing models in both speed and accuracy, making it a powerful tool for combating fake reviews in real-time scenarios. This advancement not only enhances user trust in online review systems but also supports e-commerce platforms in maintaining a fair and transparent market environment.
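As a sketch of the knowledge-distillation component, the snippet below computes the classic soft-target loss T^2 * KL(teacher || student) on raw logits using only the standard library; the temperature and logit values are illustrative, and DenyBERT's actual training recipe may combine this with other terms.

```python
import math

def softmax(logits, T=1.0):
    exps = [math.exp(x / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Soft-target KD loss: T^2 * KL(teacher || student) at temperature T."""
    p = softmax(teacher_logits, T)   # softened teacher distribution
    q = softmax(student_logits, T)   # softened student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

teacher = [2.0, 0.5]   # logits for (genuine, fake); numbers are invented
student = [1.0, 0.8]
print(distill_loss(teacher, student))  # > 0; shrinks as the student matches
```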
Continuous sign language recognition (CSLR) is a very challenging task in intelligent systems, since it must produce real-time responses while performing computationally intensive video analytics and language modeling. Previous studies mainly adopt hidden Markov models or recurrent neural networks with a limited capability to model specific sign languages, and their accuracy can drop significantly when recognizing signs performed by different signers with non-standard gestures or non-uniform speeds. In this work, we develop a deep learning framework named SignBERT, integrating bidirectional encoder representations from transformers (BERT) with the residual neural network (ResNet), to model the underlying sign languages and extract spatial features for CSLR. We further propose a multimodal version of SignBERT, which combines the input of hand images with an intelligent feature alignment, to minimize the distance between the probability distributions of the recognition results generated by the BERT model and by the hand images. Experimental results indicate that, compared to alternative approaches for CSLR, our method achieves better accuracy with a significantly lower word error rate on three challenging continuous sign language datasets.
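Since the evaluation reports word error rate (WER), the helper below shows the standard edit-distance definition of WER used in CSLR evaluation; it is a generic utility sketch, not code from SignBERT.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table of edit distances between prefixes.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("I WANT COFFEE NOW", "I WANT TEA NOW"))  # 0.25 (one substitution)
```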
An automated question-answering system allows students to learn as an integral part of digitized learning. The system responds to queries with text. We also include a knowledge graph, which makes the model considerably more engaging and improves learners' understanding. The features of knowledge entity extraction, information point evaluation and analysis, knowledge graph construction from unstructured text, and knowledge entity integration are all explored. The question-answering paradigm we propose in this study uses knowledge graphs and BERT (bidirectional encoder representations from transformers) to provide diverse learners with quick feedback on the subject. To facilitate non-native learners' understanding, we also include English-to-Hindi translation. As a result, the system can be highly beneficial to educators in providing access to learning and supporting continued learning.
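A minimal sketch of the knowledge-graph lookup behind such a question-answering system might look as follows; the triples and the matching rule are invented for illustration and are far simpler than entity extraction from unstructured text.

```python
# Toy knowledge graph stored as (subject, relation, object) triples.
TRIPLES = [
    ("photosynthesis", "occurs_in", "chloroplast"),
    ("photosynthesis", "produces", "oxygen"),
    ("mitochondrion", "produces", "ATP"),
]

def answer(entity, relation):
    """Return all objects linked to an entity by a relation."""
    return [o for s, r, o in TRIPLES if s == entity and r == relation]

print(answer("photosynthesis", "produces"))  # ['oxygen']
```

In a real pipeline, BERT would map a natural-language question to the (entity, relation) pair before this lookup runs.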
The dissemination of extremist ideas and causes online has intensified over the last decade. Extremist organizations use social media to gain publicity and new recruits, often with little interference from network providers. New techniques are being developed to identify extremist content, ensuring it can be promptly removed and its authors blocked from network access. However, most techniques are only compatible with the English language, despite the fact that extremist propaganda is frequently shared in other languages, including Arabic. Since the most effective methods for automated linguistic analysis use deep learning and require large, high-quality datasets, creating specialised data samples containing examples of extremist communication is an essential step toward a practical solution. In this paper, we present a dataset compiled for this purpose and discuss the classification methods that can be used for extremism detection. The manually annotated Arabic Twitter dataset consists of 89,816 tweets published between 2011 and 2021. Following annotation guidelines, three expert annotators labelled each tweet as extremist or non-extremist. Exploratory data analysis was performed to understand the dataset's features. Classification algorithms were applied to the dataset, including logistic regression, support vector machine, multinomial naive Bayes, random forest, and BERT. Among the traditional machine learning models, a support vector machine with term frequency-inverse document frequency features achieved the highest accuracy (0.9729). However, BERT outperformed the traditional models with an accuracy of 0.9749. This dataset is expected to enhance the accuracy of Arabic online extremism classification in future research, and so we have made it publicly available.
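The strongest traditional baseline pairs an SVM with TF-IDF features. The snippet below sketches one common TF-IDF weighting (raw term count times log(N/df)) on a toy two-document corpus using only the standard library; note that library implementations such as sklearn's use slightly different smoothing by default.

```python
import math

def tfidf(corpus):
    """Map each document to {term: tf * log(N / df)} weights."""
    N = len(corpus)
    docs = [doc.split() for doc in corpus]
    df = {}
    for words in docs:
        for term in set(words):
            df[term] = df.get(term, 0) + 1
    return [{t: words.count(t) * math.log(N / df[t]) for t in set(words)}
            for words in docs]

corpus = ["extremist propaganda tweet", "harmless everyday tweet"]
weights = tfidf(corpus)
# "tweet" appears in both documents, so its weight is log(2/2) = 0,
# while document-specific terms get positive weight.
print(weights[0]["tweet"], weights[0]["extremist"])
```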
Purpose: Visual acuity (VA) is a critical component of the eye examination but is often documented in electronic health records (EHRs) only as unstructured free-text notes, making it challenging to use in research. This study aimed to improve on existing rule-based algorithms by developing and evaluating deep learning models that perform named entity recognition of different types of VA measurements and their lateralities from free-text ophthalmology notes: VA for each of the right and left eyes, with and without glasses correction, and with and without pinhole. Design: Cross-sectional study. Subjects: A total of 319,756 clinical notes with documented VA measurements from approximately 90,000 patients were included. Methods: The notes were split into train, validation, and test sets. Bidirectional encoder representations from transformers (BERT) models were fine-tuned to identify VA measurements from the progress notes; they included BERT models pretrained on biomedical literature (BioBERT), critical care EHR notes (ClinicalBERT), both (BlueBERT), and a lighter version of BERT with 40% fewer parameters (DistilBERT). A baseline rule-based algorithm was created to recognize the same VA entities for comparison against the BERT models. Main Outcome Measures: Model performance was evaluated on a held-out test set using micro-averaged precision, recall, and F1 score for all entities. Results: On the human-annotated subset, BlueBERT achieved the best micro-averaged F1 score (F1 = 0.92), followed by ClinicalBERT (F1 = 0.91), DistilBERT (F1 = 0.90), BioBERT (F1 = 0.84), and the baseline model (F1 = 0.83). Common errors included labeling VA in sections outside the examination portion of the note, difficulty labeling the current VA alongside a series of past VAs, and missing non-numeric VAs. Conclusions: This study demonstrates that deep learning models are capable of identifying VA measurements from free-text ophthalmology notes with high precision and recall, achieving significant perfor
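The rule-based baseline the BERT models are compared against can be approximated with regular expressions. Below is a hedged toy version that handles only Snellen-style fractions with OD/OS laterality markers and an optional condition flag; real notes contain many more formats, which is exactly why the rule-based approach trails the learned models.

```python
import re

# Toy patterns: laterality marker (OD = right eye, OS = left eye),
# an optional condition flag ("cc" = with correction, "sc" = without,
# "ph" = pinhole), then a Snellen fraction.
VA_PATTERN = re.compile(
    r"\b(?P<eye>OD|OS)\s*(?P<cond>cc|sc|ph)?\s*:?\s*(?P<va>20/\d{1,3})",
    re.IGNORECASE,
)

def extract_va(note):
    """Return (eye, condition, acuity) tuples found in a free-text note."""
    return [(m.group("eye").upper(),
             (m.group("cond") or "unspecified").lower(),
             m.group("va"))
            for m in VA_PATTERN.finditer(note)]

note = "Exam: OD sc 20/40, OS sc 20/25; pinhole OD ph 20/30"
print(extract_va(note))
```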
Upholding a secure and inclusive digital environment is severely hindered by hate speech and inappropriate content on the internet. A novel approach that combines a Convolutional Neural Network (CNN) with a GRU and with bidirectional encoder representations from transformers (BERT) is proposed for enhancing the identification of offensive content, particularly hate speech. The method exploits the strengths of both the CNN-GRU and BERT models to capture the complex linguistic patterns and contextual information present in hate speech. The proposed model first uses the CNN-GRU to extract local and sequential features from textual data, allowing effective representation learning of offensive language. Subsequently, BERT, an advanced transformer-based model, is employed to capture contextualized representations of the text, enhancing the understanding of fine-grained linguistic nuances and the cultural contexts associated with hate speech. The BERT model is fine-tuned using the Hugging Face Transformers library. Tests are executed on publicly available hate speech identification datasets to show how well the method identifies inappropriate content. By assisting the ongoing efforts to prevent the dissemination of hate speech and undesirable language online, the proposed framework promotes a more inclusive and secure digital environment. The method is implemented in Python and achieves a competitive performance of 98% compared to existing approaches (LSTM and RNN, CNN, LSTM, and GBAT), showcasing its potential for real-world applications in combating online hate speech. Furthermore, it provides insights into the interpretability of the model's predictions, highlighting the key linguistic and contextual factors influencing offensive-language detection. The study contributes to advancing hate speech detection by integrating CNN-GRU and BERT models, giving a robust solution for enhancing offensive-content identification on online platforms.
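For readers unfamiliar with the GRU half of the hybrid, the snippet below implements a single scalar GRU cell step from the standard update-gate/reset-gate equations; the weights are illustrative numbers, not trained parameters, and biases are omitted for brevity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, W):
    """One GRU step for scalar input x and hidden state h.

    W holds six scalar weights (input and recurrent) for the update
    gate z, the reset gate r, and the candidate state.
    """
    z = sigmoid(W["wz"] * x + W["uz"] * h)                # update gate
    r = sigmoid(W["wr"] * x + W["ur"] * h)                # reset gate
    h_tilde = math.tanh(W["wh"] * x + W["uh"] * (r * h))  # candidate state
    return (1.0 - z) * h + z * h_tilde                    # blend old and new

W = {"wz": 0.5, "uz": 0.1, "wr": 0.4, "ur": 0.2, "wh": 0.9, "uh": 0.3}
h = 0.0
for x in [1.0, -0.5, 2.0]:   # a toy input sequence
    h = gru_step(x, h, W)
print(h)                     # hidden state stays within (-1, 1) by construction
```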
As the amount of content created on social media constantly increases, more and more opinions and sentiments are expressed by people on various subjects. In this respect, sentiment analysis and opinion mining techniques can be valuable for the automatic analysis of huge textual corpora (comments, reviews, tweets, etc.). Despite the advances in text mining algorithms, deep learning techniques, and text representation models, the results in such tasks are very good only for a few high-density languages (e.g., English) that possess large training corpora and rich linguistic resources; there is still considerable room for improvement in lower-density languages. In this direction, the current work employs various language models for representing social media texts, together with text classifiers, for detecting the polarity of opinions expressed on social media in the Greek language. The experimental results on a related dataset collected by the authors are promising, since various classifiers based on the language models (naive Bayes, random forests, support vector machines, logistic regression, deep feed-forward neural networks) outperform those based on word or sentence embeddings (word2vec, GloVe), achieving a classification accuracy of more than 80%. Additionally, a new language model for Greek social media has been trained on the aforementioned dataset, showing that language models based on domain-specific corpora can improve the performance of generic language models by a margin of 2%. Finally, the resulting models are made freely available to the research community.
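The word-embedding baselines (word2vec, GloVe) typically represent a text by averaging its word vectors before classification. The sketch below shows that mean-pooling step with a nearest-centroid polarity decision on made-up 2-D vectors; real embeddings have hundreds of dimensions and the classifiers in the study are far more capable.

```python
def mean_pool(words, vectors):
    """Average the vectors of known words into one text embedding."""
    vecs = [vectors[w] for w in words if w in vectors]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def nearest_centroid(embedding, centroids):
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(embedding, centroids[label]))

# Made-up 2-D "word vectors" standing in for word2vec/GloVe output:
vectors = {"great": [1.0, 0.9], "awful": [-1.0, -0.8],
           "service": [0.1, 0.0], "food": [0.0, 0.1]}
centroids = {"positive": [0.8, 0.8], "negative": [-0.8, -0.8]}

emb = mean_pool("great food".split(), vectors)
print(nearest_centroid(emb, centroids))  # "positive"
```

Mean-pooling discards word order and context, which is one reason contextual language models outperform these embeddings in the reported experiments.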