Software Fault Prediction (SFP) research has devoted enormous effort to accurately predicting the fault proneness of software modules, thereby making the most of scarce software testing resources, reducing maintenance cost, and contributing to the production of quality software products. In this regard, machine learning (ML) has been successfully applied to solve classification problems for SFP. However, SFP faces many challenges caused by redundant and irrelevant features, class imbalance, and noise in software defect datasets. No ML technique alone handles all of those challenges, and they may degrade performance depending on the predictor's sensitivity to data corruption. In the literature, it is widely claimed that building ensemble classifiers from preprocessed datasets and combining their predictions is a promising way to overcome the individual problems of each classifier. This claim is usually not supported by thorough empirical studies that consider the problems arising when techniques for different types of challenges in defect datasets are implemented in combination, and it must therefore be carefully studied. Thus, the objective of this paper is to conduct large-scale, comprehensive experiments to study the effect of resolving those challenges in SFP in three stages, in order to improve the practice and performance of SFP. In addition, the paper presents a thorough and statistically sound comparison of these techniques in each stage. Accordingly, a new three-stage ensemble learning framework that efficiently handles those challenges in a combined form is proposed. The experimental results confirm the robustness of the combined techniques in each stage of the proposed framework. Particularly high performance is achieved using combined ensemble learning algorithms (ELA) on selected features of balanced data after removing noisy instances. Therefore, as shown in this study, ensemble techniques used for SFP must be carefully examined and c
ISBN: 9781538676592 (print)
As the use of Wi-Fi networks grows, so do security threats. Attackers continue to improve their attack methods, which creates the need for effective mechanisms to detect sophisticated attacks. In this work, we propose an implementation of an intrusion detection system (IDS) for Wi-Fi networks using an ensemble learning method. The AWID Wi-Fi intrusion dataset is used to discover the features needed for an efficient IDS implementation. We apply several ensemble learning methods to this dataset and select the best one for the proposed IDS implementation. The performance of the IDS is reported using well-known metrics including accuracy, precision, recall, and F-measure.
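The four metrics named in this abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = attack, 0 = normal) and combines the outputs of several hypothetical detectors by simple majority vote. The abstract does not say which ensemble method was finally selected, so the voting scheme and the function names `majority_vote` and `classification_metrics` are illustrative assumptions only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote.

    With an odd number of models and binary labels there are no ties.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical outputs of three detectors on eight frames (not AWID data).
model_outputs = [
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
]
ground_truth = [1, 1, 1, 0, 0, 0, 0, 0]
ensemble_prediction = majority_vote(model_outputs)
metrics = classification_metrics(ground_truth, ensemble_prediction)
```

Reporting precision and recall alongside accuracy matters for intrusion data, where attack traffic is typically a small share of all frames and accuracy alone can look high for a detector that misses most attacks.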
Knowledge of DNA sequences is indispensable for basic biological research. Many researchers use DNA sequencing for various purposes, including molecular biology research and sequence comparison for individual identification. Automated DNA sequencing devices use four-color chromatograms or base-calling signals to indicate the strength of hybridization for each base channel. Typically, the relative strengths of peaks at each base location are used to quantify the quality and/or reliability of individual readings. However, assessing the overall quality of whole DNA trace files remains an open problem. Therefore, classifying raw DNA trace files as high or low quality is an important issue for the efficient utilization of resources. In this study, we used several supervised machine learning approaches, including logistic regression and ensemble decision trees, to identify high- or acceptable-quality chromatogram files, and compared their prediction performance. To develop and test our ideas, we used a public DNA trace repository consisting of 1626 high-quality and 631 low-quality files labeled by our expert molecular biologist. Our results indicate that, although all of the methods tried offer comparable and acceptable performance, the random forest decision tree algorithm with adaptive boosting ensemble learning shows slightly higher prediction accuracy with as few as four features.
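As a rough illustration of what adaptive boosting does, the sketch below shows one reweighting step of the classic AdaBoost scheme for labels in {-1, +1}. The abstract gives no implementation details, so this is the textbook update and not necessarily the authors' configuration. The function name `adaboost_round` is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost step: return the learner's vote weight and the new sample weights.

    Labels and predictions must be -1 or +1. The weighted error must lie
    strictly between 0 and 1 for the logarithm to be defined.
    """
    total = sum(weights)
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if not 0.0 < error < 1.0:
        raise ValueError("weighted error must be strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples (t * p == -1) are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


# Four trace files with uniform weights; the weak learner gets the last one wrong.
alpha, new_weights = adaboost_round(
    [0.25, 0.25, 0.25, 0.25], [1, 1, -1, -1], [1, 1, -1, 1]
)
```

After the update, the misclassified samples together carry half of the total weight, which forces the next weak learner to concentrate on the files the previous one got wrong.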
ISBN: 9781538618295 (print)
Software Fault Prediction (SFP) research has devoted enormous effort to accurately predicting the fault proneness of software modules in order to make the most of scarce software testing resources, reduce maintenance cost, help deliver software products on time, and satisfy customers, all of which ultimately contribute to producing quality software products. In this regard, machine learning (ML) has been successfully applied to solve classification problems for SFP. Moreover, within ML, ensemble learning algorithms (ELA) are known to improve on the performance of single learning algorithms. However, no ELA alone handles the challenges created by redundant and irrelevant features and by class imbalance in software defect datasets. Therefore, the objective of this paper is to independently examine and compare prominent ELA, and to improve their performance by combining them with Feature Selection (FS) and Data Balancing (DB) techniques, in order to identify more efficient ELA that better predict the fault proneness of software modules. Accordingly, a new framework that efficiently handles those challenges in a combined form is proposed. The experimental results confirm the robustness of the combined techniques in the proposed framework. In particular, the framework performs well when bagging ELA is combined with DB on selected features. Therefore, as shown in this study, ensemble techniques used for SFP must be carefully examined and combined with both FS and DB in order to obtain robust performance.
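The combination of data balancing and bagging described in this abstract can be sketched in a few lines. The sketch below assumes random oversampling as the DB step and one-level decision stumps as base learners. Both are illustrative choices, since the abstract does not name the specific techniques, and the feature selection stage is omitted for brevity. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def oversample(X, y, rng):
    """Random oversampling: duplicate minority rows until all classes are equal in size."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            Xb.append(row)
            yb.append(label)
    return Xb, yb


def predict_stump(stump, row):
    """Predict `above` when the chosen feature exceeds the threshold, else the other label."""
    feature, threshold, above = stump
    return above if row[feature] > threshold else 1 - above


def fit_stump(X, y):
    """Exhaustively pick the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for above in (0, 1):
                stump = (feature, threshold, above)
                errors = sum(predict_stump(stump, r) != t for r, t in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, stump)
    return best[1]


def fit_bagging(X, y, n_estimators=11, seed=0):
    """Balance the data once, then train each stump on a bootstrap sample of it."""
    rng = random.Random(seed)
    Xb, yb = oversample(X, y, rng)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(Xb)) for _ in range(len(Xb))]
        models.append(fit_stump([Xb[i] for i in idx], [yb[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Majority vote over all base learners."""
    return Counter(predict_stump(m, row) for m in models).most_common(1)[0][0]


# Toy module metrics: [lines of code, cyclomatic complexity]; label 1 = faulty.
X = [[120, 3], [90, 2], [150, 4], [200, 5], [80, 1],
     [110, 3], [130, 2], [170, 4], [900, 25], [1200, 31]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
models = fit_bagging(X, y)
```

In a real evaluation the oversampling would be applied to training folds only, so that duplicated minority rows do not leak into the test data.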
Multi-label classification exhibits several challenges not present in the binary case. The labels may be interdependent, so that the presence of a certain label affects the probability of other labels' presence. Thus, exploiting dependencies among the labels could be beneficial for the classifier's predictive performance. Surprisingly, only a few of the existing algorithms address this issue directly by identifying dependent labels explicitly from the dataset. In this paper we propose new approaches for identifying and modeling existing dependencies between labels. One principal contribution of this work is a theoretical confirmation of the reduction in sample complexity that is gained from unconditional dependence. Additionally, we develop methods for identifying conditionally and unconditionally dependent label pairs; clustering them into several mutually exclusive subsets; and finally, performing multi-label classification incorporating the discovered dependencies. We compare these two notions of label dependence (conditional and unconditional) and evaluate their performance on various benchmark and artificial datasets. We also compare and analyze the labels identified as dependent by each of the methods. Moreover, we define an ensemble framework for the new methods and compare it to existing ensemble methods. An empirical comparison of the new approaches to existing baseline and state-of-the-art methods on 12 benchmark datasets demonstrates that in many cases the proposed single-classifier and ensemble methods outperform many multi-label classification algorithms. Perhaps surprisingly, we discover that the weaker notion of unconditional dependence plays the decisive role.
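The idea of finding unconditionally dependent label pairs and clustering them into mutually exclusive subsets can be sketched as follows. The sketch assumes a Pearson chi-square test on each pair's 2x2 co-occurrence table, and merges dependent pairs with a union-find structure. The abstract does not specify the test or the clustering procedure, so both are illustrative assumptions, as are the function names.

```python
from itertools import combinations

# Chi-square critical value for 1 degree of freedom at significance level 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841


def chi_square_2x2(a, b):
    """Pearson chi-square statistic for two binary (0/1) label columns."""
    n = len(a)
    observed = [[0, 0], [0, 0]]
    for x, y in zip(a, b):
        observed[x][y] += 1
    row_totals = [sum(observed[0]), sum(observed[1])]
    col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    statistic = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


def dependent_label_groups(Y, threshold=CHI2_CRITICAL_DF1_P05):
    """Group label indices into disjoint subsets linked by pairwise dependence.

    Y is a list of rows, each row a list of 0/1 label indicators.
    """
    n_labels = len(Y[0])
    columns = list(zip(*Y))
    parent = list(range(n_labels))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n_labels), 2):
        if chi_square_2x2(columns[i], columns[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n_labels):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: labels 0 and 1 always co-occur, label 2 is independent of both.
Y = [[1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 0],
     [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
groups = dependent_label_groups(Y)
```

Each resulting group could then be modeled jointly, for example as one multi-class problem over its label combinations, while labels in different groups are predicted separately. On small samples the chi-square approximation is unreliable when expected cell counts are low, so an exact test may be preferable there.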