Author affiliation: Multimedia Information Systems and Advanced Computing Laboratory, Sfax University, Sfax Technopole, Sfa 3021, Tunisia
Publication: Journal of Computer Science & Technology (计算机科学技术学报, English edition)
Year, volume, issue: 2018, Vol. 33, No. 6
Pages: 1307-1319
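Subject classification: 12 [Management]; 1201 [Management - Management Science and Engineering (management or engineering degrees may be conferred)]; 08 [Engineering]
Keywords: sentiment analysis; recursive auto-encoder; stacked auto-encoder; pointwise mutual information; deep embedding representation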
Abstract: Sentiment analysis, a hot research topic, presents new challenges for understanding the opinions and judgments that users express online. It aims to classify subjective texts by assigning each one a polarity label. In this paper, we introduce a novel machine learning framework that uses an auto-encoder network to predict the sentiment polarity label at the word level and at the sentence level. Inspired by the dimensionality reduction and feature extraction capabilities of auto-encoders, we propose a new model for distributed word vector representation, "PMI-SA", which takes pointwise mutual information (PMI) word vectors as input. The resulting continuous word vectors are combined to represent a sentence. An unsupervised sentence embedding method, called Contextual Recursive Auto-Encoders (CoRAE), is also developed for learning sentence representations. CoRAE follows the basic idea of recursive auto-encoders, deeply composing the vectors of the words that make up a sentence, but without relying on any syntactic parse tree. Instead, the CoRAE model recursively combines each word with its context words (its previous and next neighbors) while taking word order into account. A support vector machine classifier with a fine-tuning technique is also used to show that our deep compositional representation model, CoRAE, significantly improves the accuracy of the sentiment analysis task. Experimental results demonstrate that CoRAE remarkably outperforms several competitive baseline methods on two databases, namely the Sanders Twitter corpus and the Facebook comments corpus. The CoRAE model achieves an efficiency of 83.28% on the Facebook dataset and 97.57% on the Sanders dataset.
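The two building blocks named in the abstract can be illustrated with a small, hedged sketch. It assumes PMI is computed from symmetric-window co-occurrence counts as PMI(w, c) = log(n(w, c) * N / (n(w) * n(c))), with unobserved pairs set to 0 (an assumption, not stated in the record). The class ContextAutoEncoder and its compose method are hypothetical names for one possible reading of the CoRAE idea: each word is encoded together with its previous and next neighbors, the sequence shrinks at every pass, and an auto-encoder reconstruction error is accumulated. Training, the PMI-SA stacked auto-encoder compression step and the SVM classifier are all omitted, and this is not the authors' implementation.

```python
import math
import random
from collections import Counter


def pmi_table(sentences, window=1):
    """PMI(w, c) = log(n(w, c) * N / (n(w) * n(c))) from symmetric-window co-occurrences."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pair_counts[(w, tokens[j])] += 1
    total = sum(pair_counts.values())
    # The window is symmetric, so row marginals equal column marginals.
    marginal = Counter()
    for (w, _), n in pair_counts.items():
        marginal[w] += n
    return {
        (w, c): math.log(n * total / (marginal[w] * marginal[c]))
        for (w, c), n in pair_counts.items()
    }


def pmi_vectors(sentences, window=1):
    """One PMI row vector per word over the sorted vocabulary (unseen pairs -> 0.0, an assumption)."""
    table = pmi_table(sentences, window)
    vocab = sorted({w for w, _ in table})
    return vocab, {w: [table.get((w, c), 0.0) for c in vocab] for w in vocab}


class ContextAutoEncoder:
    """Hypothetical auto-encoder mapping a (previous, word, next) triple of dim-d vectors to a dim-d code."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.dim = dim
        self.w_enc, self.b_enc = init(dim, 3 * dim), [0.0] * dim
        self.w_dec, self.b_dec = init(3 * dim, dim), [0.0] * (3 * dim)

    def encode(self, x):
        # h = tanh(W_enc x + b_enc)
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def reconstruction_error(self, x, h):
        # Squared error between the input triple and its linear reconstruction W_dec h + b_dec.
        x_hat = [sum(wi * hi for wi, hi in zip(row, h)) + b
                 for row, b in zip(self.w_dec, self.b_dec)]
        return sum((a - b) ** 2 for a, b in zip(x, x_hat))

    def compose(self, seq):
        """Repeatedly merge each word with its neighbors until one vector remains; returns (vector, total error)."""
        if not seq:
            raise ValueError("empty sentence")
        seq = [list(v) for v in seq]
        total_err = 0.0
        while len(seq) > 1:
            if len(seq) == 2:
                seq.append([0.0] * self.dim)  # zero-pad so one full triple exists
            merged = []
            for i in range(1, len(seq) - 1):
                x = seq[i - 1] + seq[i] + seq[i + 1]  # word order is kept in the concatenation
                h = self.encode(x)
                total_err += self.reconstruction_error(x, h)
                merged.append(h)
            seq = merged  # each pass shortens the sequence by two
        return seq[0], total_err
```

For example, pmi_vectors([["good", "movie"], ["bad", "movie"]]) returns the vocabulary ["bad", "good", "movie"], and PMI(good, movie) = log(1 * 4 / (1 * 2)) = log 2. Passing those 3-dimensional vectors to ContextAutoEncoder(dim=3).compose yields a single 3-dimensional sentence vector; in a real system the weights would first be trained to minimize the accumulated reconstruction error.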