The proposed work delves into how recommender systems, like those on YouTube and Amazon, shape our online experiences, particularly in book recommendations. It addresses the challenge of the 'cold start problem...
ISBN: (print) 9798400709227
Amid the continuous stream of diverse data on the Bloomberg terminal, distinguishing editorial news articles from regular articles is critical for helping users tailor their news experience and for analyzing the impact of news on global financial markets. In this paper, we propose several Artificial Intelligence and Neural Network models for building an editorial classifier that generalizes well across news sources. The training set comprises articles published by news sources from the US. We compare the models using the Aggregate F1-measure and the Binary Classification Performance Metric, chosen to account for the class imbalance in our data. We further evaluate the models on a zero-shot dataset of 1805 news articles published by Metro Winnipeg, a Canadian news source.
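Why an F1-style metric suits imbalanced data can be shown with a small sketch. The abstract does not define its "Aggregate F1-measure", so the code below assumes a macro-averaged F1 (the unweighted mean of per-class F1 scores) purely for illustration; the function names and the toy labels are hypothetical and not taken from the paper.

```python
# Illustrative only: macro-averaged F1 is an ASSUMED stand-in for the
# paper's "Aggregate F1-measure", which the abstract does not define.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)

# Toy imbalanced data: 9 regular articles (0), 1 editorial (1).
y_true = [0] * 9 + [1]
# A degenerate classifier that always predicts "regular".
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

On this toy data the always-"regular" classifier reaches high accuracy while its macro F1 stays low, because the minority class contributes an F1 of zero.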
Text classification is the task of assigning text to predefined groups according to its content. Over the past few years, the volume of information available on the Internet across many fields has grown, making text classification both one of the most important tasks and one of the most challenging. Text classification is used in numerous applications and for different objectives. The widespread use of the Internet, particularly in the Arab world, together with the massive number of documents and pages available in Arabic, has raised the need for suitable tools that classify these pages and documents by their main categories. The aim of this paper is to study the effect of the improved CHI (ImpCHI) square on the performance of six well-known classifiers: Random Forest, Decision Tree, Naive Bayes, Naive Bayes Multinomial, Bayes Net, and Artificial Neural Networks. The proposed techniques are important for improving the classification of Arabic documents and can be regarded as a promising basis for the text classification stage, because they contribute to classifying texts into predefined categories. The combined method takes advantage of more than one technique, which can produce better final results. The dataset used in this paper includes 9055 Arabic documents collected from various Arabic resources. Based on their content, these documents were divided into twelve categories. Four performance evaluation criteria were used: F-measure, recall, precision, and model build time. The experimental results show that ImpCHI square gives better classification results than the standard CHI square method with all studied classifiers, in terms of all performance criteria used.
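For orientation, the baseline that ImpCHI is compared against can be sketched as follows. This is the standard chi-square term-category statistic commonly used for feature selection, not the paper's ImpCHI variant, whose definition the abstract does not give. The function names and the tiny English toy corpus are hypothetical and stand in for the Arabic documents.

```python
# Illustrative only: STANDARD chi-square score of a term for a category,
# computed from a 2x2 document-count table. This is not ImpCHI.

def chi_square(a, b, c, d):
    """
    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category that lack the term
    d: docs outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or category is degenerate in this corpus
    return n * (a * d - c * b) ** 2 / denom

def chi_square_term(docs, term, category):
    """Build the 2x2 counts from (token_set, label) pairs and score the term."""
    a = sum(1 for toks, lab in docs if lab == category and term in toks)
    b = sum(1 for toks, lab in docs if lab != category and term in toks)
    c = sum(1 for toks, lab in docs if lab == category and term not in toks)
    d = sum(1 for toks, lab in docs if lab != category and term not in toks)
    return chi_square(a, b, c, d)

# Hypothetical toy corpus: each document is (set of tokens, category label).
docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"vote", "party"}, "politics"),
    ({"vote", "goal"}, "politics"),
]

score_goal = chi_square_term(docs, "goal", "sport")
score_vote = chi_square_term(docs, "vote", "sport")
print(score_goal, score_vote)
```

A feature-selection step would typically rank all terms by such a score per category and keep the highest-scoring ones before training a classifier. Note that the statistic is symmetric in direction: "vote" scores highly for "sport" here because its absence is strongly associated with that category.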
ISBN: (print) 9783319240336; 9783319240329
In this article we present a methodology for classifying text from web authors using sociolinguistically inspired text features. The proposed methodology combines a baseline feature set based on text mining with text features that quantify results from theoretical and sociolinguistic studies. Two combination approaches were evaluated, and the results indicated a significant improvement in both cases. For the best-performing combination approach, the accuracy was 84.36%, measured as the percentage of correctly classified web posts.