Cross-project defect prediction (CPDP) uses labeled data from a source project to help predict defects in unlabeled target projects, which effectively improves prediction performance and has made CPDP a research hotspot in software engineering. CPDP is currently divided into homogeneous cross-project defect prediction and heterogeneous cross-project defect prediction (HDP). HDP does not require the source and target projects to share the same feature space, so it is more widely used in practice. Most current HDP methods map the original features into a latent feature space and reduce inter-project variation by transferring domain-independent features. This transfer process, however, ignores domain-related features, which hurts the model's prediction performance. Moreover, the mapped latent features are not conducive to the model's interpretability. To address these issues, this paper proposes a heterogeneous defect prediction method based on feature disentanglement (FD-HDP). We disentangle the features using separate domain-related and domain-independent feature extractors, improving the model's interpretability by maximizing the domain adversarial loss during training and guiding the feature extractors to produce accurate domain-related and domain-independent features. During prediction, the weighted sum of the outputs of the domain-related and domain-independent predictors is used as the final prediction for the project, which combines the two kinds of features and effectively improves prediction performance. We conducted experiments on four publicly available defect datasets, from which we constructed heterogeneous scenarios. The results demonstrate that the FD-HDP model shows significant advantages over state-of-the-art methods on six metrics.
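The weighted-sum step described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the single mixing weight `weight` (with the two weights assumed to sum to 1), and the 0.5 decision threshold are all assumptions made for illustration, since the abstract does not specify how the weights are chosen.

```python
def combine_predictions(p_related, p_independent, weight=0.5):
    """Blend per-module defect probabilities from two predictors.

    Assumption (not stated in the abstract): the two weights sum to 1,
    so a single `weight` in [0, 1] is applied to the domain-related
    predictor and (1 - weight) to the domain-independent predictor.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(p_related) != len(p_independent):
        raise ValueError("both predictors must score the same modules")
    return [
        weight * r + (1.0 - weight) * i
        for r, i in zip(p_related, p_independent)
    ]


def label_defects(probabilities, threshold=0.5):
    """Turn blended probabilities into 0/1 defect labels.

    The 0.5 threshold is a hypothetical default, not taken from the paper.
    """
    return [int(p >= threshold) for p in probabilities]


# Hypothetical scores for two target-project modules.
blended = combine_predictions([1.0, 0.0], [0.5, 0.5], weight=0.5)
labels = label_defects(blended)
```

Setting `weight` to 1.0 or 0.0 reduces the sketch to using only the domain-related or only the domain-independent predictor, which makes it easy to compare the combined output against either predictor alone.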