This paper presents NB+, a novel algorithm that extends the traditional naive Bayesian algorithm. An exception occurs in the naive Bayesian algorithm when the class label values have equal probability. The approach resolves this case with a partial matching method. Consequently, classification accuracy improves drastically. Experimental evaluation on various databases shows that the NB+ algorithm outperforms the traditional naive Bayesian algorithm. (C) 2010 Elsevier B.V. All rights reserved.
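The abstract does not spell out the partial matching rule, so the following is only a minimal sketch of the general idea under stated assumptions: an unsmoothed categorical naive Bayes whose ties are broken by the average number of attribute values a query shares with the training rows of each tied class. The function names, the tolerance `tol`, and the averaging rule are all hypothetical and are not taken from the NB+ paper.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-attribute value frequencies per class."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (attribute index, label) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            cond[(i, label)][value] += 1
    return priors, cond


def posterior_scores(model, row):
    """Unnormalised naive Bayes scores, deliberately unsmoothed so ties can occur."""
    priors, cond = model
    n = sum(priors.values())
    scores = {}
    for label, count in priors.items():
        p = count / n
        for i, value in enumerate(row):
            p *= cond[(i, label)][value] / count
        scores[label] = p
    return scores


def partial_match_score(rows, labels, row, label):
    """Assumed tie-break: mean number of attributes shared with rows of `label`."""
    matches = [
        sum(a == b for a, b in zip(train_row, row))
        for train_row, train_label in zip(rows, labels)
        if train_label == label
    ]
    return sum(matches) / len(matches)


def classify(rows, labels, row, tol=1e-12):
    scores = posterior_scores(train(rows, labels), row)
    top = max(scores.values())
    tied = [label for label, s in scores.items() if abs(s - top) <= tol]
    if len(tied) == 1:
        return tied[0]
    # Equal probabilities: fall back to the partial matching heuristic.
    return max(tied, key=lambda lab: partial_match_score(rows, labels, row, lab))


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "cool"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]
tie_result = classify(rows, labels, ("sunny", "cool"))
```

In this toy data both classes score zero for `("sunny", "cool")`, so the decision comes entirely from the tie-break; a different matching rule could reasonably give a different answer.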
ISBN: (print) 9783319281216; 9783319281209
Text classification is one of the main problems in big data analysis and research. At present, however, there is no universal algorithmic model that meets the requirements of both accuracy and efficiency in text classification. This paper proposes a text classification method that combines naive Bayes with a similarity computing algorithm. First, the text is segmented into word vectors by the Paoding Analyzer; then the Bayesian algorithm performs the first-level directory classification of the text; after that, an improved similarity computing algorithm performs the second-level directory classification. Finally, the model is tested on real data, and the results are compared with those of the Bayesian algorithm and the similarity computing algorithm used separately. The results show that the proposed method achieves a higher precision rate.
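The second stage of such a pipeline might look like the sketch below. It assumes the first-level label has already been predicted by a Bayesian classifier and that each second-level directory is represented by a term-count profile; plain cosine similarity stands in for the paper's improved similarity measure, which the abstract does not define. The names `second_level` and `profiles` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def second_level(tokens, first_level_label, profiles):
    """Pick the most similar subdirectory inside the predicted first-level one.

    profiles: {first_level: {second_level: Counter of term counts}}
    """
    doc = Counter(tokens)
    candidates = profiles[first_level_label]
    return max(candidates, key=lambda sub: cosine(doc, candidates[sub]))


profiles = {
    "sports": {
        "football": Counter({"goal": 3, "pitch": 1}),
        "tennis": Counter({"serve": 3, "court": 2}),
    }
}
choice = second_level(["serve", "court", "match"], "sports", profiles)
```

Restricting the similarity search to the subdirectories of one first-level class keeps the second stage cheap, which is presumably part of the efficiency argument.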
Automated information retrieval is critical for enterprise information systems to acquire knowledge from vast amounts of data. One challenge in information retrieval is text classification. Current practice relies heavily on the classical naive Bayes algorithm because of its simplicity and robustness. However, results from this algorithm are not always satisfactory. This article discusses the limitations of the naive Bayes algorithm and finds that the assumption that terms are independent is the main reason for unsatisfactory classification in many real-world applications. To overcome these limitations, the dependent factors are taken into account by integrating a term frequency-inverse document frequency (TF-IDF) weighting algorithm into naive Bayes classification. Moreover, the TF-IDF algorithm itself is improved so that both frequency and distribution information are considered. To illustrate the effectiveness of the proposed method, two simulation experiments were conducted, and comparisons with other classification methods showed that the proposed method outperformed existing algorithms in terms of precision and index recall rate.
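One common way to combine the two ideas is to feed TF-IDF weights, rather than raw counts, into a multinomial naive Bayes model. The sketch below does that with a standard smoothed IDF and add-one smoothing; it does not reproduce the article's improved TF-IDF variant, and all function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(tokens, idf):
    """TF-IDF weights for one document; terms unseen in training are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def train(docs, labels):
    idf = idf_table(docs)
    class_docs = Counter(labels)
    weights = defaultdict(lambda: defaultdict(float))  # label -> term -> weight
    for tokens, label in zip(docs, labels):
        for term, w in tfidf(tokens, idf).items():
            weights[label][term] += w
    return idf, class_docs, weights


def classify(model, tokens):
    idf, class_docs, weights = model
    n = sum(class_docs.values())
    vocab_size = len(idf)
    doc = tfidf(tokens, idf)

    def score(label):
        total = sum(weights[label].values())
        s = math.log(class_docs[label] / n)
        for term, w in doc.items():
            # Add-one smoothing over the TF-IDF mass of each class.
            p = (weights[label].get(term, 0.0) + 1.0) / (total + vocab_size)
            s += w * math.log(p)
        return s

    return max(class_docs, key=score)


docs = [
    ["cheap", "pills", "buy"],
    ["buy", "cheap", "now"],
    ["meeting", "agenda", "now"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
model = train(docs, labels)
```

Because rare terms receive a larger IDF, they contribute more to the class score than terms spread evenly across the corpus, which is the effect the weighting is meant to achieve.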
ISBN: (print) 9781424452729
Classifying Web pages automatically and effectively according to a given machine learning model is a very important task. The traditional modes of operation, manual and semiautomatic, form category abstracts after inspection by domain experts and then place the results into a particular class library according to the scheduled requirements. This paper proposes an improved naive Bayesian Web text classification algorithm. The common Bayesian classifier assumes that all terms are equally important, whereas in this paper the terms in each title are considered more important than the others. Experiments showed that the improved naive Bayesian algorithm is more precise in text classification.
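The title-weighting idea can be sketched as a multinomial naive Bayes in which every title term counts several times as much as a body term. The weight `TITLE_WEIGHT = 3.0` is a hypothetical value chosen for illustration; the abstract does not state how much extra importance titles receive or how it is applied.

```python
import math
from collections import Counter, defaultdict

TITLE_WEIGHT = 3.0  # assumed value, not taken from the paper


def weighted_counts(title_tokens, body_tokens, title_weight=TITLE_WEIGHT):
    """Term counts where each title occurrence counts `title_weight` times."""
    counts = Counter()
    for token in body_tokens:
        counts[token] += 1.0
    for token in title_tokens:
        counts[token] += title_weight
    return counts


def train(pages):
    """pages: iterable of (title_tokens, body_tokens, label)."""
    class_docs = Counter()
    term_totals = defaultdict(Counter)
    vocab = set()
    for title, body, label in pages:
        class_docs[label] += 1
        counts = weighted_counts(title, body)
        term_totals[label].update(counts)
        vocab.update(counts)
    return class_docs, term_totals, vocab


def classify(model, title_tokens, body_tokens):
    class_docs, term_totals, vocab = model
    n_docs = sum(class_docs.values())
    counts = weighted_counts(title_tokens, body_tokens)

    def score(label):
        total = sum(term_totals[label].values())
        s = math.log(class_docs[label] / n_docs)
        for token, weight in counts.items():
            if token not in vocab:
                continue  # ignore terms never seen in training
            p = (term_totals[label][token] + 1.0) / (total + len(vocab))
            s += weight * math.log(p)
        return s

    return max(class_docs, key=score)


pages = [
    (["football"], ["match", "goal", "team"], "sports"),
    (["stock"], ["market", "price", "team"], "finance"),
]
model = train(pages)
```

With these toy pages, a page titled "stock" whose body mentions "goal" is still assigned to finance, because the title term outweighs the single body term.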