ISBN (Print): 9781728186535
Fake news is a recent phenomenon in which false or fraudulent information spreads through online social media or traditional news media. Today, fake news can be created and distributed easily across many social media platforms and has a widespread impact on the real world. It is critical to develop efficient algorithms and tools that detect early how false information is disseminated on social media platforms and why it succeeds in deceiving users. Most current research methods rely on machine learning, deep learning, feature engineering, graph mining, image and video analysis, and newly developed datasets and web services for detecting deceptive content; hence there is a strong need for a method that can detect false information reliably. This study proposes a hybrid approach that combines a CNN model and an RNN-LSTM model to detect false information. First, the NLTK toolkit is used to remove stop words, punctuation, and special characters from the text; the same toolkit then tokenizes and preprocesses the text. GloVe word embeddings are then applied to the preprocessed text. Higher-level features of the input text are extracted by the CNN model using convolutional and max-pooling layers, while long-term dependencies between word sequences are captured by the RNN-LSTM model. The model also applies dropout together with Dense layers to improve the efficiency of the hybrid architecture. Trained with the Adam optimizer and the binary cross-entropy loss function, the proposed CNN/RNN-LSTM hybrid achieves an accuracy of 92%, surpassing most of the classical models in use today.
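A minimal sketch of the CNN + RNN-LSTM hybrid described above, written with tf.keras. The framework, layer sizes, sequence length, and vocabulary size are assumptions; the abstract does not specify them. A real run would load pretrained GloVe vectors into `embedding_matrix`; a random placeholder is used here.

```python
# Sketch only: hyperparameters are illustrative, not the paper's actual settings.
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 20_000, 300, 100   # assumed preprocessing choices
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

embedding = layers.Embedding(VOCAB_SIZE, EMBED_DIM, trainable=False)  # frozen GloVe embeddings
model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    embedding,
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # local n-gram features
    layers.MaxPooling1D(pool_size=2),                      # down-sample feature maps
    layers.LSTM(64),                                       # long-range dependencies
    layers.Dropout(0.5),                                   # regularization
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                 # real vs. fake
])
embedding.set_weights([embedding_matrix])                  # load the (placeholder) GloVe matrix

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```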
ISBN (Print): 9781618392473
This paper explores two different methods of learning dialectal morphology from a small parallel corpus of standard and dialect-form text, given that a computational description of the standard morphology is available. The goal is to produce a model that translates individual lexical dialectal items to their standard dialect counterparts in order to facilitate dialectal use of available NLP tools that only assume standard-form input. The results show that a learning method based on inductive logic programming quickly converges to the correct model with respect to many phonological and morphological differences that are regular in nature.
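The abstract does not detail the inductive logic programming learner itself, so the following is only a drastically simplified stand-in: a naive suffix-rewrite inducer that learns dialect-to-standard rules from a small parallel word list and applies the most frequent matching rule to unseen forms. The word pairs are invented for illustration.

```python
# Naive suffix-rewrite rule induction from (dialect, standard) word pairs.
# A toy simplification of the paper's ILP-based learning; data is hypothetical.
from collections import Counter

def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

pairs = [  # invented dialect/standard pairs
    ("talat", "talade"), ("kastat", "kastade"), ("hoppat", "hoppade"),
    ("huse", "huset"), ("barne", "barnet"),
]

rules = Counter()
for dial, std in pairs:
    k = common_prefix_len(dial, std)
    rules[(dial[k:], std[k:])] += 1          # (dialect suffix -> standard suffix)

def to_standard(word):
    """Apply the most frequent (then longest-matching) induced rule."""
    candidates = [(count, d_suf, s_suf) for (d_suf, s_suf), count in rules.items()
                  if word.endswith(d_suf)]
    if not candidates:
        return word
    count, d_suf, s_suf = max(candidates, key=lambda c: (c[0], len(c[1])))
    return word[:len(word) - len(d_suf)] + s_suf

print(to_standard("ropat"))   # -> "ropade" via the induced "t" -> "de" rule
```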
Graph-based text representation focuses on how text documents are represented as graphs for exploiting dependency information between tokens and documents within a corpus. Despite the increasing interest in graph repr...
Many natural language processing applications require semantic knowledge about topics in order to be possible or efficient. We therefore developed a system, SEGAPSITH, that acquires such knowledge automatically from text segments using an unsupervised, incremental clustering method. In such an approach, an important problem is the validation of the learned classes. To address it, we applied a second clustering method, which only needs to know the number of classes to build, to the same subset of text segments, and we recast our evaluation problem as a comparison of the two classifications. We established several criteria to compare them, based either on the words used as class descriptors or on the thematic units. Our first results show a strong correlation between the two classifications.
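The abstract does not give the exact comparison criteria, so here is a minimal sketch of how two clusterings of the same segments could be compared: partition-level agreement via scikit-learn's adjusted Rand index, and descriptor-level agreement via Jaccard overlap of class descriptor words. The labels, descriptor sets, and class alignment below are hypothetical.

```python
# Comparing two clusterings of the same text segments (toy data).
from sklearn.metrics import adjusted_rand_score

labels_a = [0, 0, 1, 1, 2, 2, 2, 1]   # unsupervised incremental clustering
labels_b = [1, 1, 0, 0, 2, 2, 2, 0]   # fixed-k reference clustering
print("adjusted Rand index:", adjusted_rand_score(labels_a, labels_b))  # 1.0 here

# Descriptor-word comparison between aligned classes (hypothetical descriptors).
descriptors_a = {0: {"price", "market", "share"}, 1: {"match", "goal", "team"}}
descriptors_b = {1: {"price", "market", "stock"}, 0: {"match", "team", "league"}}

def jaccard(s, t):
    return len(s & t) / len(s | t)

alignment = {0: 1, 1: 0}   # assumed class correspondence a_class -> b_class
for a_cls, b_cls in alignment.items():
    print(f"class {a_cls} vs {b_cls}:",
          round(jaccard(descriptors_a[a_cls], descriptors_b[b_cls]), 2))
```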
Neural networks are a family of powerful machine learning models. This book focuses on the application of neural network models to natural language data. The first half of the book (Parts I and II) covers the basics o...
A tool to automatically generate natural language documentation summaries for methods is presented. The approach uses prior work by the authors on stereotyping methods along with the source code analysis framework srcML. First, each method is automatically assigned one or more stereotypes based on static analysis and a set of heuristics. Then, the approach uses the stereotype information, static analysis, and predefined templates to generate a natural-language summary for each method. This summary is automatically added to the code base as a comment for the method. The predefined templates are designed to produce a generic summary for specific method stereotypes.
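The tool's actual stereotype taxonomy and templates are not given in the abstract; the sketch below only illustrates the template idea, mapping a few commonly cited method stereotypes to fill-in phrases. Stereotype names, templates, and the facts passed in are invented.

```python
# Template-based summary generation keyed on method stereotype (illustrative only).
TEMPLATES = {
    "get":       "Returns the value of the {field} field.",
    "set":       "Sets the {field} field to the given value.",
    "predicate": "Returns true if {condition}, false otherwise.",
    "command":   "Performs the {action} operation on this object.",
}

def summarize(stereotype, **facts):
    """Render a one-line documentation comment from static-analysis facts."""
    template = TEMPLATES.get(stereotype, "Implements the {name} behavior.")
    return "/** " + template.format(**facts) + " */"

# Example: a getter identified by static analysis.
print(summarize("get", field="balance"))
# -> /** Returns the value of the balance field. */
```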
Recently, graph convolutional networks (GCNs) for text classification have received considerable attention in natural language processing. However, most current methods use only the original documents and words in the corpus to construct the graph topology, which may lose some useful information. In this paper, we propose a Multi-Stream Graph Convolutional Network (MS-GCN) for text classification via Representative-Word Document (RWD) mining, implemented in PyTorch. In the proposed method, we first introduce temporary labels and mine the RWDs, which are treated as additional documents in the corpus. Then, we build a heterogeneous graph based on the relations among a Group of RWDs (GRWDs), words, and original documents. Furthermore, we construct the MS-GCN from multiple heterogeneous graphs corresponding to different GRWDs. Finally, we optimize our MS-GCN model through an update mechanism for the GRWDs. We evaluate the proposed approach on six text classification datasets: 20NG, R8, R52, Ohsumed, MR, and Pheme. Extensive experiments on these datasets show that our proposed approach outperforms state-of-the-art methods for text classification.
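The abstract names the building blocks (one GCN per heterogeneous graph, with the streams combined) but not the layer equations or fusion rule. Below is a minimal dense-matrix PyTorch sketch under those assumptions: a two-layer GCN per stream and simple mean fusion of the per-stream logits; dimensions, the number of streams, and the fusion rule are illustrative.

```python
# Multi-stream GCN sketch: one small GCN per adjacency matrix, logits averaged.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, a_norm, h):
        # a_norm: normalized adjacency (N x N), h: node features (N x in_dim)
        return self.linear(a_norm @ h)

class StreamGCN(nn.Module):
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.gc1 = GCNLayer(in_dim, hid_dim)
        self.gc2 = GCNLayer(hid_dim, n_classes)

    def forward(self, a_norm, x):
        h = F.relu(self.gc1(a_norm, x))
        return self.gc2(a_norm, h)           # per-node class logits

class MultiStreamGCN(nn.Module):
    def __init__(self, n_streams, in_dim, hid_dim, n_classes):
        super().__init__()
        self.streams = nn.ModuleList(
            [StreamGCN(in_dim, hid_dim, n_classes) for _ in range(n_streams)])

    def forward(self, adjs, x):
        # adjs: one normalized adjacency per stream (one heterogeneous graph each)
        logits = [gcn(a, x) for gcn, a in zip(self.streams, adjs)]
        return torch.stack(logits).mean(dim=0)   # mean fusion (assumed)

# Toy usage: 10 nodes (documents + words), 3 streams, 4 classes.
N, n_streams, n_classes = 10, 3, 4
x = torch.eye(N)                                  # one-hot node features
adjs = [torch.eye(N) for _ in range(n_streams)]   # placeholder adjacencies
model = MultiStreamGCN(n_streams, in_dim=N, hid_dim=16, n_classes=n_classes)
print(model(adjs, x).shape)                       # torch.Size([10, 4])
```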
Opinion summarization can facilitate users' decision-making by mining the salient information in reviews. However, due to the lack of sufficient annotated data, most early works are based on extractive methods, which limits the performance of opinion summarization. In this work, we aim to improve the informativeness of opinion summarization to provide better guidance to users. We consider the setting with only reviews and no corresponding summaries, and propose an aspect-augmented model for unsupervised abstractive opinion summarization, denoted AsU-OSum. We first employ an aspect-based sentiment analysis system to extract opinion phrases from reviews. Then, we construct a heterogeneous graph with reviews and opinion clusters as nodes, which is used to enhance the Transformer-based encoder-decoder framework. Furthermore, we design a novel cascaded attention mechanism that prompts the decoder to pay more attention to the aspects that are most likely to appear in the summary. During training, we introduce a sentiment accuracy reward that further enhances the learning ability of our model. We conduct comprehensive experiments on the Yelp, Amazon, and Rotten Tomatoes datasets. Automatic evaluation results show that our model is competitive and performs better than state-of-the-art (SOTA) models on some ROUGE metrics. Human evaluation results further verify that our model generates more informative summaries and reduces redundancy.
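One plausible reading of the cascaded attention is sketched below: the decoder state first attends over opinion-cluster vectors, and the resulting cluster weights then rescale the token-level attention over encoder states. This is not the paper's exact formulation; the dimensions, tensors, and token-to-cluster mapping are all invented for illustration.

```python
# Toy "cascaded" attention step (illustrative reading, not the paper's mechanism).
import torch
import torch.nn.functional as F

d, n_tokens, n_clusters = 64, 12, 3
dec_state = torch.randn(d)                    # current decoder hidden state
enc_states = torch.randn(n_tokens, d)         # encoder token representations
cluster_vecs = torch.randn(n_clusters, d)     # opinion-cluster representations
token2cluster = torch.randint(0, n_clusters, (n_tokens,))  # cluster of each token

# Stage 1: attention over opinion clusters.
cluster_weights = F.softmax(cluster_vecs @ dec_state, dim=0)          # (n_clusters,)

# Stage 2: token-level attention, biased by each token's cluster weight.
token_scores = enc_states @ dec_state                                  # (n_tokens,)
token_scores = token_scores + torch.log(cluster_weights[token2cluster] + 1e-9)
token_weights = F.softmax(token_scores, dim=0)

context = token_weights @ enc_states          # context vector fed to the decoder
print(context.shape)                          # torch.Size([64])
```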
Topic modeling is a key research area in natural language processing and has inspired innovative studies in a wide array of social-science disciplines. Yet the use of topic modeling in computational social science has been hampered by two critical issues. First, social scientists tend to focus on a few standard ways of topic modeling, so our understanding of semantic patterns has not been informed by rapid methodological advances in topic modeling; moreover, a systematic comparison of the performance of different methods in this field is warranted. Second, the choice of the optimal number of topics remains a challenging task: comparisons of topic-modeling techniques have rarely been situated in a social-science context, and the choice appears arbitrary to most social scientists. Based on about 120,000 Canadian newspaper articles published since 1977, we review and compare eight traditional, generative, and neural methods for topic modeling (Latent Semantic Analysis, Principal Component Analysis, Factor Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Neural Autoregressive Topic Model, Neural Variational Document Model, and Hierarchical Dirichlet Process). Three measures (coherence statistics, held-out likelihood, and graph-based dimensionality selection) are then used to assess the performance of these methods. Findings are presented and discussed to guide the choice of topic-modeling methods, especially in social-science research. (C) 2020 Elsevier Inc. All rights reserved.
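A small sketch of the comparison idea, using only one of the three measures (a coherence statistic). It fits scikit-learn's NMF and LDA on a toy corpus and scores each model's top words with a simple UMass-style coherence computed from document co-occurrence. The corpus, topic count, and top-word count are illustrative choices, not the study's setup.

```python
# Comparing two topic models on a toy corpus with a UMass-style coherence score.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF, LatentDirichletAllocation

docs = [
    "the government passed a new climate policy",
    "parliament debated the climate policy bill",
    "the team won the hockey game last night",
    "the hockey season opens with a home game",
    "new policy on carbon emissions announced by government",
    "fans celebrated the game winning goal",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
vocab = vec.get_feature_names_out()

def umass_coherence(components, X, top_n=5):
    """Average UMass coherence over topics: sum of log((D(wi, wj) + 1) / D(wj))."""
    D = (X.toarray() > 0).astype(int)          # document-word incidence
    scores = []
    for topic in components:
        top = np.argsort(topic)[::-1][:top_n]
        s = 0.0
        for i in range(1, len(top)):
            for j in range(i):
                co = np.sum(D[:, top[i]] * D[:, top[j]])   # co-document frequency
                s += np.log((co + 1) / np.sum(D[:, top[j]]))
        scores.append(s)
    return float(np.mean(scores))

for name, model in [("NMF", NMF(n_components=2, random_state=0)),
                    ("LDA", LatentDirichletAllocation(n_components=2, random_state=0))]:
    model.fit(X)
    top_words = [[vocab[i] for i in np.argsort(t)[::-1][:5]] for t in model.components_]
    print(name, "topics:", top_words,
          "coherence:", round(umass_coherence(model.components_, X), 3))
```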