Fault prediction is the process of using data analysis and machine learning models to anticipate potential defects or faults in a software system. Using only base machine learning models for software fault prediction leads to limited performance, difficulty in handling non-linear relationships and imbalanced data, inadequate feature representation, and limited ability to handle complexity. To overcome these challenges, this paper proposes a new technique for selecting the classifiers that form a heterogeneous ensemble. The main goal is to trim out the classifiers that perform poorly compared to the other base classifiers, which can result in a more effective ensemble and better results. The proposed algorithm finds a set of classifiers that can perform better than using all the classifiers. The main challenge is how to identify the poorly performing classifiers; it is addressed by experimenting with different threshold values to choose the trimmed set of classifiers. To evaluate the proposed model, 8 benchmark software fault datasets taken from PROMISE and the Apache repository were used, with AUC as the performance measure. The experimental results demonstrate the effectiveness of our algorithm compared to the traditional approaches, which use all the base classifiers. There is a significant increase in the AUC values for 6 datasets out of 8, while using the average of probabilities and majority voting, it was seen that there is improvement in 7 out of 8 datasets used. With the average of probabilities, the best-performing dataset is ARC, where the AUC increases from 0.6505 to 0.694; with majority voting, the best-performing dataset is XALAN, where the AUC increases from 0.5455 to 0.679. From this, it can be seen that the proposed ensemble approach achieved higher AUC values for the
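The trimming idea described in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so the rule used here (keep every classifier whose validation AUC is within `threshold` of the best one) is an assumption, as are all names, scores, and probabilities in the example. The two combination rules, average of probabilities and majority voting, are the ones named in the abstract.

```python
from statistics import mean


def trim_classifiers(val_auc, threshold):
    """Keep classifiers whose validation AUC is within `threshold` of the best.

    Assumed rule for illustration only; the paper's actual criterion
    is not given in the abstract.
    """
    best = max(val_auc.values())
    return sorted(name for name, auc in val_auc.items() if auc >= best - threshold)


def average_of_probabilities(probs, names):
    """Average the positive-class probabilities of the selected classifiers."""
    n_samples = len(probs[names[0]])
    return [mean(probs[name][i] for name in names) for i in range(n_samples)]


def majority_vote(probs, names, cutoff=0.5):
    """Predict 1 when more than half of the selected classifiers vote 1."""
    n_samples = len(probs[names[0]])
    votes = [sum(probs[name][i] >= cutoff for name in names) for i in range(n_samples)]
    return [1 if 2 * v > len(names) else 0 for v in votes]


# Hypothetical validation AUCs and predicted fault probabilities for 3 modules.
val_auc = {"nb": 0.70, "tree": 0.66, "knn": 0.52, "svm": 0.68}
probs = {
    "nb": [0.9, 0.2, 0.6],
    "svm": [0.8, 0.4, 0.4],
    "tree": [0.7, 0.3, 0.2],
    "knn": [0.1, 0.9, 0.9],
}

kept = trim_classifiers(val_auc, threshold=0.05)
avg = average_of_probabilities(probs, kept)
vote = majority_vote(probs, kept)
print(kept, avg, vote)
```

In this toy example the weak `knn` model is dropped before combination. Sweeping `threshold` over several values and comparing the resulting ensemble AUC would mirror the threshold experiment the abstract mentions.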
Data collection and analysis are critical aspects of various business processes. However, when done manually, these tasks can be time-consuming and error-prone, and can reduce employee productivity, especially when we hav...
In the age of technology, every day, we ingest a variety of news data, either willingly or accidentally. People will always use various social media apps and keep looking for information. The information passed throug...
Important applications such as fraud or spam detection or churn prediction involve binary classification problems where the datasets are imbalanced and the cost of false positives greatly differs from the cost of false negatives. We focus on classification trees, in particular oblique trees, which subsume both the traditional axis-aligned trees and logistic regression, but are more accurate than both while providing interpretable models. Rather than using ROC curves, we advocate a loss based on minimizing the false negatives subject to a maximum false positive rate, which we prove to be equivalent to minimizing a weighted 0/1 loss. This yields a curve of classifiers that provably dominates the ROC curve, but is hard to optimize due to the 0/1 loss. We give the first algorithm that can iteratively update the tree parameters globally so that the weighted 0/1 loss decreases monotonically. Experiments on various datasets with class imbalance or class costs show this indeed dominates ROC-based classifiers and significantly improves over previous approaches to learn trees based on weighted purity criteria or over- or undersampling. Copyright 2024 by the author(s)
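As a rough illustration of the quantities this abstract refers to, the sketch below computes a weighted 0/1 loss and the false positive rate for a set of binary predictions. The cost values and function names are hypothetical, and the sketch does not reproduce the paper's tree optimization algorithm or its equivalence proof.

```python
def confusion_counts(y_true, y_pred):
    """Return (false positives, false negatives) for binary labels in {0, 1}."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn


def weighted_zero_one_loss(y_true, y_pred, fn_cost, fp_cost):
    """Weighted 0/1 loss: each error counts with the cost of its type."""
    fp, fn = confusion_counts(y_true, y_pred)
    return fn_cost * fn + fp_cost * fp


def false_positive_rate(y_true, y_pred):
    """Fraction of true negatives that were predicted positive."""
    fp, _ = confusion_counts(y_true, y_pred)
    negatives = sum(1 for t in y_true if t == 0)
    return fp / negatives if negatives else 0.0


# Hypothetical labels and predictions; costs chosen only for illustration.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]

loss = weighted_zero_one_loss(y_true, y_pred, fn_cost=5, fp_cost=1)
fpr = false_positive_rate(y_true, y_pred)
print(loss, fpr)
```

Raising `fn_cost` relative to `fp_cost` penalizes missed positives more heavily, which is the kind of trade-off a cap on the false positive rate is meant to control.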
Chronic diseases, characterized by long-term effects and persistent symptoms, pose significant challenges to human health and well-being. The hepatitis C virus (HCV) is the cause of chronic hepatitis C, which affects ...
The CRISPR-Cas9 genome editing system is a new technology that allows the modification of genetic material through disruptions, insertions, or amendments of the DNA sequence. It is an easy, widely accepted, and correct r...
In this paper, we propose a time-dependent multi-objective trip planning method using ant colony optimization. In particular, the proposed method deals with time-dependent POI factors by utilizing past-trip records with time st...
The attention mechanism is a core module in today's well-established models like Transformers and Graph Attention Networks (GAT). The attention coefficients needed typically require heavy computational resources ...
A single neuron modulates the external stimuli, and the neural population coordinates to encode information. An alternative method for examining the coordinated population activity in neural encoding is conditional neural c...
This project shows how artificial intelligence is used in developing a "User-Engaging Event Management System". The developed system will spontaneously record all the events and online contests registered ...