ISBN (print): 9781538670835; 9781538670828
Phishing websites deceive victims by making them believe they are accessing legitimate sites. Data mining is a technique for extracting hidden information in order to derive more benefit from existing data; it is the process of discovering regularities, patterns, and relationships in large datasets. In this study, data mining is used to determine the effect of feature selection on the C4.5 and CART algorithms on a phishing website dataset. The tests show that feature selection on the phishing website dataset mitigates the long computation time. In the performance measurements, CART achieved an accuracy of 94.4%, while C4.5 achieved 94.3%, so we conclude that CART performs better than C4.5 on this dataset.
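The two decision tree algorithms compared above differ chiefly in how they score a candidate split: C4.5 builds on entropy (through information gain and gain ratio), while CART uses Gini impurity for classification. The following is a minimal sketch of both measures on a made-up split; the labels and the split are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits; C4.5 derives its gain ratio from this."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity, the classification split criterion used by CART."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(groups, impurity):
    """Size-weighted average impurity of the child groups of a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * impurity(g) for g in groups)


# Hypothetical toy data: one binary feature splits 8 sites into two groups.
parent = ["phish"] * 4 + ["legit"] * 4
split = [
    ["phish", "phish", "phish", "legit"],
    ["phish", "legit", "legit", "legit"],
]

info_gain = entropy(parent) - weighted_impurity(split, entropy)
gini_decrease = gini(parent) - weighted_impurity(split, gini)
print(f"information gain: {info_gain:.3f}, Gini decrease: {gini_decrease:.3f}")
```

Both criteria reward the same split here, which is consistent with the two algorithms often reaching similar accuracy; the sketch omits C4.5's gain-ratio normalization and pruning.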
ISBN (print): 9781538669228
The rapid development of the digital era is now a major challenge for the education system in Indonesia, especially for high school students. Today's teenagers live in a digital age in which the development and enhancement of knowledge in digital technology has had a dramatic impact. Common problems faced by senior high school students include social relationships, difficulty concentrating while learning, lack of discipline, disregard for rules, and disrespect toward teachers and friends. As a result of these problems, students' grades fall below average, so that they fail to advance to the next grade. For that reason, we need a solution that reduces the number of students who fail a grade. Data mining can help the education sector improve the quality of education by minimizing the number of students who fail a grade. The data mining classification techniques used to process the data are Decision Tree and Naïve Bayes. Because schools hold many kinds of data that data mining could process, the problem must be scoped so that the assessment stays focused. The study is limited to the following attributes: cognitive scores, psychomotor scores, affective scores, attendance count, and number of remedial exams. The results of the analysis will be used to predict grade promotion, which will affect the improvement of student quality in the future. Measured with a confusion matrix and ROC curve, the Decision Tree achieved an accuracy of 97.5% and Naïve Bayes an accuracy of 97.5%.
ISBN (print): 9781538669488
The healthcare industry produces a large amount of data. Machine learning algorithms can be used to find hidden information for diagnosis and effective decision making. In recent years, liver disorders have increased rapidly, and liver disease is considered highly fatal in many countries, such as Egypt and Moldova. The main aim of this paper is to predict liver disease using different classification algorithms. The algorithms used are Logistic Regression, K-Nearest Neighbour, and Support Vector Machines. Accuracy score and the confusion matrix are used to compare these classification algorithms.
ISBN (print): 9781538663745; 9781538663738
Multi-label classification has generated enthusiasm in many fields over the last few years. It allows the classification of datasets in which each instance can be associated with one or more labels. It has proven to be a superior strategy compared to single-label classification. In this paper, we provide an overview of multi-label classification (MLC) approaches. We also discuss the various tools that utilize MLC approaches. Lastly, we present an experimental study comparing different multi-label classification algorithms. After applying these techniques and studying their accuracies, we found that Random Forest performs better than the other compared multi-label classification algorithms, with 96% accuracy.
ISBN (print): 9781510608191; 9781510608207
This paper presents a comparative study of different classification algorithms for classifying various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through a surface-bonded piezoelectric sensor and actuator are analyzed by a system identification algorithm to obtain the system parameters. The identified parameters for the healthy and delaminated structures are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via Regression, Naive Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass Classifier, and Decision Tree (J48). The open-source Waikato Environment for Knowledge Analysis (WEKA) software is used to evaluate the performance of these classifiers via 75-25 holdout and leave-one-sample-out cross-validation in terms of classification accuracy, precision, recall, kappa statistic, and ROC area.
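Of the measures listed above, the kappa statistic is the least self-explanatory: it rescales observed agreement by the agreement expected from chance alone, so a majority-class baseline such as ZeroR scores zero. The sketch below computes Cohen's kappa in plain Python; the label lists are hypothetical and do not come from the paper's experiments.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (plain accuracy) and p_e the agreement
    expected by chance from the marginal label frequencies.
    Undefined (division by zero) when p_e equals 1.
    """
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for eight samples (not data from the study).
y_true = ["healthy"] * 4 + ["delaminated"] * 4
classifier_pred = ["healthy", "healthy", "healthy", "delaminated",
                   "delaminated", "delaminated", "delaminated", "healthy"]
baseline_pred = ["healthy"] * 8  # majority-class prediction, as ZeroR would make

print("classifier kappa:", cohen_kappa(y_true, classifier_pred))
print("baseline kappa:", cohen_kappa(y_true, baseline_pred))
```

The baseline reaches 50% accuracy on this balanced toy set yet has a kappa of zero, which is why kappa is a useful companion to raw accuracy when comparing against ZeroR.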
ISBN (print): 9781450347747
Diabetes mellitus is fast becoming endemic in the world, especially in developing countries. An efficient prediction methodology is needed to diagnose diabetes, which can be helpful for health care professionals. Data mining techniques have been widely used in healthcare to mine useful information from medical data. Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining techniques have been shown to enable early prediction of several diseases with higher accuracy and lower error rate and cost. Classification is one of the most commonly used techniques in medical data mining. In this paper, we explore various data mining techniques and compare different classification algorithms using the Waikato Environment for Knowledge Analysis (WEKA), analyzing the results to find the most suitable classification algorithm for predicting diabetes. Performance metrics such as sensitivity, specificity, accuracy, and error rate are used to assess each classifier.
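The four metrics named above all follow from the cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not results from the paper.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from binary confusion-matrix counts.

    tp/fn: positive (e.g. diabetic) cases predicted correctly/incorrectly
    tn/fp: negative cases predicted correctly/incorrectly
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }


# Hypothetical counts for 100 patients (illustrative only).
metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a screening setting, sensitivity and specificity are usually read together, since a classifier can reach high accuracy while missing many positive cases when the classes are imbalanced.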
ISBN (print): 9783319495682; 9783319495675
By analyzing the limitations of modeling and evaluating cost-sensitive multiclass classification algorithms, a series of models based on three classification algorithms is presented. On this basis, the expected cost of misclassification is introduced as a cost-sensitive metric for evaluating the cost behavior of the models in more detail.
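The abstract does not define the metric, so the following is only a sketch of one common formulation: the expected cost of misclassification as the average, over all evaluated samples, of the cost assigned to each (true class, predicted class) pair. The confusion matrix and cost matrix are hypothetical.

```python
def expected_cost(confusion, cost):
    """Average misclassification cost per sample.

    confusion[i][j]: number of samples of true class i predicted as class j
    cost[i][j]: cost of predicting class j when the true class is i
    (Assumed formulation; the paper may define the metric differently.)
    """
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return weighted / total


# Hypothetical 3-class example with 30 samples; diagonal costs are zero.
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
cost = [
    [0, 1, 5],
    [1, 0, 1],
    [10, 2, 0],
]

print("expected cost per sample:", expected_cost(confusion, cost))
```

Unlike accuracy, this measure distinguishes between models that make the same number of errors but concentrate them in cheap or expensive cells of the cost matrix.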
ISBN (print): 9781538608043
Automated screening of diabetic retinopathy plays an important role in diagnosing the disease in its early stages and preventing blindness in patients with diabetes. Various machine learning approaches have been studied in the literature with the aim of improving the accuracy of screening methods. Although the performance of a machine learning algorithm depends on the application and the type of data, there is no comprehensive analysis of the different approaches in diabetic retinopathy screening to guide the choice of approach. To this end, this study performs a comparative analysis of nine common classification algorithms to select the most applicable approach for the specific problem of screening diabetic retinopathy patients. Individual algorithms are optimized with respect to their tunable parameters and compared in terms of accuracy, precision, recall, and F1-score. Simulation results demonstrate the differences in performance between the individual classification algorithms and can be used as a deciding factor in method selection for further research.
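Precision, recall, and F1-score can be computed directly from paired label lists. The sketch below uses the standard definitions, with F1 as the harmonic mean of precision and recall; the labels are hypothetical and the zero-division fallback to 0.0 is a convention chosen for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class marked as `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Fall back to 0.0 when a denominator is zero (a convention, not a rule).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical screening labels: 1 = retinopathy present, 0 = absent.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```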
ISBN (digital): 9781728146447
ISBN (print): 9781728146454
The high dimensionality and small sample size of gene expression data make disease classification very difficult, and in-depth research on models and algorithms is carried out to address this problem. First, a linear combination model of weak classifiers is constructed by the boosting method, and the feature subset is selected by removing the feature genes that receive zero weight in the boosting method. Then, three classification methods (boosting, SVM, and K-nearest neighbor) are integrated to improve the accuracy of the classification model. Finally, the ensemble learning classification model is applied to a colon cancer dataset. The experimental results show that, compared with a single classification model, the ensemble method can reduce the dimension of the data and obtain higher accuracy.
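The two steps described above can be sketched as follows. The abstract does not say how the three classifiers are integrated, so majority voting is assumed here purely for illustration; the weights and predictions are hypothetical placeholders standing in for the outputs of trained boosting, SVM, and K-nearest neighbor models.

```python
from collections import Counter


def select_nonzero_features(weights):
    """Indices of features (genes) whose boosting weight is non-zero."""
    return [i for i, w in enumerate(weights) if w != 0]


def majority_vote(predictions):
    """Combine per-classifier label lists by majority vote per sample.

    Assumed integration rule; with three classifiers and two classes
    a tie cannot occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical feature weights from a boosted linear model over 5 genes.
weights = [0.0, 0.7, 0.0, 0.3, 0.0]
selected = select_nonzero_features(weights)

# Hypothetical predictions of the three base classifiers on 4 samples.
boosting_pred = [1, 0, 1, 1]
svm_pred = [1, 1, 0, 1]
knn_pred = [0, 0, 1, 1]
ensemble_pred = majority_vote([boosting_pred, svm_pred, knn_pred])

print("selected gene indices:", selected)
print("ensemble prediction:", ensemble_pred)
```

Dropping zero-weight genes first means the SVM and K-nearest neighbor models would be trained on a much smaller feature set, which matters when the number of genes far exceeds the number of samples.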
ISBN (print): 9781509062850
Student's Single Tuition Fee, or Uang Kuliah Tunggal (UKT), is a subsidy policy in higher education set by the Indonesian government. This policy regulates the tuition fees paid by each student each semester at every higher education institution. Because the UKT amount depends on each student's financial ability, students' education costs must be grouped into several classes. Until recently, there has been no standard for making this classification, even though determining it is an important task for every higher education institution in Indonesia. This study aims to compare five data mining classification algorithms (Gaussian Naïve Bayes, Multinomial Naïve Bayes, Bernoulli Naïve Bayes, Decision Tree, and SVM) to find the best algorithm for determining the UKT classes. The experiment is conducted using 230 training records and 10-fold cross-validation. Based on the results, Decision Tree obtained an average accuracy of 0.814, or 81.4%. Finally, Decision Tree is used to classify the UKT classes of 3258 student records.
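With 230 records and 10 folds, each fold holds out 23 records for testing and trains on the remaining 207; the reported average accuracy is then the mean over the 10 folds. The following is a minimal sketch of the index splitting only, assuming a simple shuffled, unstratified split (the abstract does not say whether stratification was used); the function name and seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Shuffles once with a fixed seed, then deals indices into k folds.
    Unstratified: class proportions per fold are not controlled.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=230, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```

Each record appears in exactly one test fold, so every one of the 230 records contributes once to the averaged accuracy.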