In the big data era, data is made in real-time or closer to real-time. Thus, businesses can utilize this ever-growing volume of data for the data-driven or information-driven decision-making process to improve their b...
详细信息
ISBN:
(纸本)9781728130033
In the big data era, data is made in real-time or closer to real-time. Thus, businesses can utilize this ever-growing volume of data for the data-driven or information-driven decision-making process to improve their businesses. Social media, like Twitter, generates an enormous amount of such data. However, social media data are often unstructured and difficult to manage. Hence, this study proposes an effective text data preprocessing technique and develop an algorithm to train the Support Vector Machine (SVM), Deep Learning (DL) and Naive Bayes (NB) classifiers to process Twitter data. We develop an algorithm that weights the sentiment score in terms of weight of hashtag and cleaned text. In this study, we (i) compare different preprocessing techniques on the data collected from Twitter using various techniques such as (stemming, lemmatization and spelling correction) to obtain the efficient method (ii) develop an algorithm to weight the scores of the hashtag and cleaned text to obtain the sentiment. We retrieved N=1,314,000 Twitter data, and we compared the popularity of two products, Google Now and Amazon Alexa. Using our datapreprocessing algorithm and sentiment weight score algorithm, we train SVM, DL, NB models. The results show that stemming technique performed best in terms of computational speed. Additionally, the accuracy of the algorithm was tested against manually sorted sentiments and sentiments produced before text data preprocessing. The result demonstrated that the impact produced by the algorithm was close to the manually annotated sentiments. In terms of model performance, the SVM performed better with the accuracy of 90.3%, perhaps, due to the unstructured nature of Twitter data. Previous studies used conventional techniques;hence, no precise methods were utilized on cleaning the text. Therefore, our approach confirms that proper text data preprocessing technique plays a significant role in the prediction accuracy and computational time of the cla
The paper focuses on preprocessing techniques application to short informal textual documents created in different natural languages. The goal is to evaluate the impact on the quality of the results and computational ...
详细信息
ISBN:
(纸本)9783319184760;9783319184753
The paper focuses on preprocessing techniques application to short informal textual documents created in different natural languages. The goal is to evaluate the impact on the quality of the results and computational complexity of the text mining process designed to reveal knowledge hidden in the data. Extensive number of experiments were carried out with real world textdata with correction of spelling errors, stemming, stop words removal, and their combinations applied. Support vector machine, decision trees, and k-means algorithms as the commonly used methods were considered to analyze the textdata. The text mining quality was generally not influenced significantly, however, the positive impact represented by the decreased computational complexity was observed.
暂无评论