In Q2 2022, educational apps were the second most popular category on the Google Play Store, accounting for 10.47% of the apps available worldwide. This work explores the application of five BERT-based pre-trained models built on the transformer architecture to classify mobile educational applications by knowledge field. The five models are bert-base-cased, bert-base-uncased, roberta-base, albert-base-v2, and distilbert-base-uncased. The study uses a dataset of educational apps from Google Play, which was enriched with descriptions and categories because it lacked this information. For every model, a tokenizer was applied and the model was fine-tuned for the classification task. Each model was trained for four epochs and then evaluated in a testing phase, with the following results: roberta-base reached 81% accuracy, bert-base-cased 80%, bert-base-uncased 79%, albert-base-v2 78%, and distilbert-base-uncased 76%.
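The comparison above relies on test-set accuracy. A minimal sketch of how that metric is computed and how the reported figures rank is given below. The accuracy values are copied from the abstract; the function names and the example labels are hypothetical and only illustrate the calculation, not the authors' evaluation code.

```python
# Accuracies as reported in the abstract (fraction of correctly classified test apps).
reported_accuracy = {
    "roberta-base": 0.81,
    "bert-base-cased": 0.80,
    "bert-base-uncased": 0.79,
    "albert-base-v2": 0.78,
    "distilbert-base-uncased": 0.76,
}


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def rank_models(scores):
    """Return model names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical labels, used only to illustrate the metric.
example_predicted = ["math", "language", "math", "science"]
example_actual = ["math", "language", "science", "science"]

if __name__ == "__main__":
    print(accuracy(example_predicted, example_actual))
    print(rank_models(reported_accuracy))
```

Accuracy alone can hide per-class differences, so a per-category breakdown would be a natural complement when the knowledge fields are imbalanced.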
Recently, there has been an increasing incentive to manipulate product and service reviews, mostly for profit, since positive reviews imply high business returns and vice versa. To combat this issue, industry experts and researchers have recently attempted to integrate multi-aspect (reviewer- and review-centric) data features. However, the emotions hidden in a review, its semantic meaning, and data heterogeneity still deserve more study, as they are essential indicators of fake content. This study proposes a Deep Hybrid Model for Fake Review Detection incorporating review Texts, Emotions, and Ratings (DHMFRD-TER). Initially, it computes contextualized review text vectors and extracts representations of emotion indicators. Then, the model learns from these representations to extract higher-level review features. Finally, contextualized word vectors, ratings, and emotions are concatenated; this multidimensional feature representation is used to classify reviews. Extensive experiments on three publicly available datasets demonstrate that DHMFRD-TER significantly outperforms state-of-the-art baseline approaches, achieving an accuracy of 0.988, 0.987, and 0.994 on the Amazon, Yelp CHI, and OSF datasets, respectively.
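The final concatenation step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name, the tiny vector sizes, the 5-star scale, and the rating normalization are not taken from the paper, which does not specify these details in the abstract.

```python
def build_review_features(text_vector, emotion_vector, rating, max_rating=5.0):
    """Concatenate text, emotion, and rating features into one flat vector.

    Assumption: the rating is scaled to [0, 1] so it is comparable in
    magnitude to the other features; the paper may handle it differently.
    """
    if not 0.0 <= rating <= max_rating:
        raise ValueError("rating out of range")
    scaled_rating = rating / max_rating
    return list(text_vector) + list(emotion_vector) + [scaled_rating]


# Hypothetical inputs: a 3-dimensional text embedding and 3 emotion scores.
text_vector = [0.12, -0.40, 0.33]
emotion_vector = [0.7, 0.1, 0.2]
features = build_review_features(text_vector, emotion_vector, rating=4.0)

if __name__ == "__main__":
    print(len(features), features)
```

In a real model the concatenated vector would feed a classification layer; here it only shows how heterogeneous inputs end up in a single representation.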
Background: Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to accurately understand and generate natural language. While they have received much attention and demonstrated potential in digital health, their application in mental health, particularly in clinical settings, has generated considerable debate. Objective: This systematic review aims to critically assess the use of LLMs in mental health, specifically focusing on their applicability and efficacy in early screening, digital interventions, and clinical settings. By systematically collating and assessing the evidence from current studies, our work analyzes models, methodologies, data sources, and outcomes, thereby highlighting the potential of LLMs in mental health, the challenges they present, and the prospects for their clinical use. Methods: Adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, this review searched 5 open-access databases: MEDLINE (accessed by PubMed), IEEE Xplore, Scopus, JMIR, and ACM Digital Library. Keywords used were (mental health OR mental illness OR mental disorder OR psychiatry) AND (large language models). This study included articles published between January 1, 2017, and April 30, 2024, and excluded articles published in languages other than English. Results: In total, 40 articles were evaluated, including 15 (38%) articles on mental health conditions and suicidal ideation detection through text analysis, 7 (18%) on the use of LLMs as mental health conversational agents, and 18 (45%) on other applications and evaluations of LLMs in mental health. LLMs show good effectiveness in detecting mental health issues and providing accessible, destigmatized eHealth services. However, assessments also indicate that the current risks associated with clinical use might surpass their benefits. These risks include inconsistencies in generated text; the production of hallucinations; and the absence of a compre
Background: For the provision of optimal care in a suicide prevention helpline, it is important to know what contributes to positive or negative effects on help seekers. Helplines can often be contacted through text-based chat services, which produce large amounts of text data for use in large-scale analysis. Objective: We trained a machine learning classification model to predict chat outcomes based on the content of the chat conversations in suicide helplines and identified the counsellor utterances that had the most impact on its outputs. Methods: From August 2021 until January 2023, help seekers (N=6903) scored themselves on factors known to be associated with suicidality (eg, hopelessness, feeling entrapped, will to live) before and after a chat conversation with the suicide prevention helpline in the Netherlands (113 Suicide Prevention). Machine learning text analysis was used to predict help seeker scores on these factors. Using 2 approaches for interpreting machine learning models, we identified text messages from helpers in a chat that contributed the most to the prediction of the model. Results: According to the machine learning model, helpers' positive affirmations and expressing involvement contributed to improved scores of the help seekers. Use of macros and ending the chat prematurely due to the help seeker being in an unsafe situation had negative effects on help seekers. Conclusions: This study reveals insights for improving helpline chats, emphasizing the value of an evocative style with questions, positive affirmations, and practical advice. It also underscores the potential of machine learning in helpline chat analysi
The rapid increase in Internet users has increased online concerns such as hate speech, abusive texts, and harassment. In Bangladesh, hate text in Bengali is frequently used on various social media platforms to condemn and abuse individuals. However, research on recognizing hate speech in Bengali texts is lacking. The pervasive negative impact of hate speech on individuals' well-being and the urgent need for effective measures to address it have left a significant research gap in the field of Bengali hate speech detection. This study proposes a technique for identifying hate speech in Bengali social media posts that may harm individuals' sentiments. Our approach uses the bidirectional encoder representations from transformers (BERT) architecture to extract features from Bengali text, while hate speech is categorized using a Gated Recurrent Unit (GRU) model with a Softmax activation function. We propose a new model, G-BERT, that combines both models. We compared our model's performance with several other algorithms and achieved an accuracy, precision, recall, and F1-score of 95.56%, 95.07%, 93.63%, and 92.15%, respectively. Our proposed model outperformed all other classification algorithms tested. Our findings show that the proposed strategy is successful in detecting hate speech in Bengali texts posted on social media platforms, which can aid in mitigating online hate speech and promoting a more respectful online environment.
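The Softmax step at the end of such a classifier can be sketched as follows. This is a minimal illustration, not the G-BERT implementation: the label names, the logit values, and the function names are hypothetical, and the BERT and GRU stages that would produce the logits are omitted.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical label set and logits standing in for the GRU output.
LABELS = ["hate", "not_hate"]
example_logits = [2.0, 0.5]

if __name__ == "__main__":
    print(classify(example_logits, LABELS))
```

The predicted class is simply the one with the largest probability; the probabilities themselves can also be thresholded if a more conservative decision rule is wanted.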
Image captioning is a fundamental computer vision task that aims to work out and describe what is happening in an image or image region. An image captioning process describes and defines the actions and relations of the objects within the images. In this manner, the contents of the images can be understood and interpreted automatically by visual computing systems. In this paper, we propose TRCaptionNet, a novel deep learning-based Turkish image captioning (TIC) model for the automatic generation of Turkish captions. The proposed model essentially consists of a basic image encoder, a feature projection module based on vision transformers, and a text decoder. In the first stage, the system encodes the input images via the CLIP (contrastive language-image pretraining) image encoder. The CLIP image features are then passed through a vision transformer to obtain the final image features to be linked with the textual features. In the last stage, a deep text decoder exploiting a BERT (bidirectional encoder representations from transformers) based model is used to generate the image captions. Furthermore, unlike the related works, a natural language-based linguistic model called NLLB (No Language Left Behind) was employed to produce Turkish captions from the original English captions. Extensive performance evaluation studies were carried out, and widely known image captioning metrics such as BLEU, METEOR, ROUGE-L, and CIDEr were measured for the proposed model. In the experiments, quite successful results were observed on the MS COCO and Flickr30K datasets, two well-known and prominent datasets in this field. A comparative performance analysis against the existing reports in the current literature on TIC showed that the proposed model has superior performance and outperforms the related works on TIC so far. Project details and demo links of TRCaptionNet wil
Background: The International Classification of Diseases (ICD), developed by the World Health Organization, standardizes health condition coding to support health care policy, research, and billing, but artificial intelligence automation, while promising, still underperforms compared with human accuracy and lacks the explainability needed for adoption in medical settings. Objective: The potential of large language models for assisting medical coders in the ICD-10 coding was explored through the development of a computer-assisted coding system. This study aimed to augment human coding by initially identifying lead terms and using retrieval-augmented generation (RAG)-based methods for computer-assisted coding enhancement. Methods: The explainability dataset from the CodiEsp challenge (CodiEsp-X) was used, featuring 1000 Spanish clinical cases annotated with ICD-10 codes. A new dataset, CodiEsp-X-lead, was generated using GPT-4 to replace full-textual evidence annotations with lead term annotations. A Robustly Optimized BERT (bidirectional encoder representations from transformers) Pretraining Approach transformer model was fine-tuned for named entity recognition to extract lead terms. GPT-4 was subsequently employed to generate code descriptions from the extracted textual evidence. Using a RAG approach, ICD codes were assigned to the lead terms by querying a vector database of ICD code descriptions with OpenAI's text-embedding-ada-002 model. Results: The fine-tuned Robustly Optimized BERT Pretraining Approach achieved an overall F1-score of 0.80 for ICD lead term extraction on the new CodiEsp-X-lead dataset. GPT-4-generated code descriptions reduced retrieval failures in the RAG approach by approximately 5% for both diagnoses and procedures. However, the overall explainability F1-score for the CodiEsp-X task was limited to 0.305, significantly lower than the state-of-the-art F1-score of 0.633. The diminished performance was partly due to the reliance on code descripti
Plagiarism is a major problem in education, especially in higher education environments. To address this problem, a comprehensive detection method is proposed, utilizing cutting-edge models like bidirectional encoder ...
The authors introduce BERTNN (bidirectional encoder representations from transformers Neural Network), a novel methodology designed to expand affective lexicons, a critical component in sociological research. BERTNN estimates the affective meanings and their distribution for new concepts, bypassing the need for extensive surveys by leveraging their contextual usage in language. The cornerstone of BERTNN is the use of nuanced word embeddings from bidirectional encoder representations from transformers. BERTNN uniquely encodes words within the framework of synthesized social event sentences, preserving their meaning across actor-behavior-object positions. The model is fine-tuned on the basis of the implied sentiment changes, providing a more refined estimation of affective meanings. BERTNN outperforms previous approaches, setting a new standard in deriving multidimensional affective meanings for novel concepts. It efficiently replicates sentiment ratings that traditionally require extensive survey hours, demonstrating the power of automated modeling in sociological research. The expanded affective lexicons that can be produced with BERTNN cater to shifting cultural meanings and diverse subgroups, demonstrating the potential of computational linguistics to enrich the measurement tools in sociological research. This article underscores the novelty and significance of BERTNN in the broader context of sociological methodology.
Topic modeling is a popular machine learning technique in natural language processing for identifying themes within unstructured text. One of the most prominent methods for this purpose is Latent Dirichlet Allocation ...