Agile has been used in software development for over 20 years and is the preferred development method for more than 85% of software companies. However, cost estimation in agile development remains a significant challenge. Estimation accuracy still needs improvement, and most cost estimation techniques still rely on the team's experience and knowledge. Machine learning algorithms have performed better in this area, but the lack of sufficient agile cost data hinders large-scale training and in-depth research. To address this issue, this study selected five data generation techniques suited to the characteristics of agile cost data: Variational Autoencoder (VAE), Wasserstein Generative Adversarial Network (WGAN), Synthetic Minority Over-sampling Technique for Nominal and Continuous Features (SMOTE-NC), data augmentation for tabular data (augmentation), and Tabular Data Diffusion Probabilistic Models (TabDDPM). Starting from cost data for 75 agile projects, each technique was used to generate three datasets containing 200, 500, and 1,000 records. A performance evaluation model based on four criteria (consistency, authenticity, diversity, and effectiveness) was built to assess the generated data. The experimental results show that WGAN scored 16 out of 20 points on all three dataset sizes and excelled in data consistency and authenticity. SMOTE-NC and augmentation followed. SMOTE-NC scored 15 out of 20 points at every dataset size and performed best in effectiveness, with an MMRE of 88.16% and a PRED(0.2) of 84.5%. The augmentation technique performed best when generating 1,000 records. These findings highlight the potential of data generation techniques, particularly WGAN, to enhance agile cost estimation, and they provide guidance on choosing an appropriate amount of generated data. This lays a foundation for further development of machine learning algorithms in this field and offers valuable insights for other res
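Effectiveness is reported with MMRE and PRED(0.2). The sketch below is a minimal Python illustration of these two metrics. It assumes the conventional definitions: MRE = |actual - predicted| / actual per project, MMRE = mean of the MREs, and PRED(x) = share of projects with MRE <= x. The abstract does not spell out the study's exact formulas. The function names and sample effort values are hypothetical.

```python
from typing import Sequence


def mre(actual: float, predicted: float) -> float:
    """Magnitude of relative error for one project: |actual - predicted| / actual."""
    if actual == 0:
        raise ValueError("actual effort must be non-zero")
    return abs(actual - predicted) / abs(actual)


def _check(actuals: Sequence[float], predictions: Sequence[float]) -> None:
    # Both metrics compare estimates project by project, so lengths must match.
    if len(actuals) == 0 or len(actuals) != len(predictions):
        raise ValueError("need two non-empty sequences of equal length")


def mmre(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean magnitude of relative error over all projects (lower is better)."""
    _check(actuals, predictions)
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals: Sequence[float], predictions: Sequence[float], level: float = 0.2) -> float:
    """PRED(level): share of projects whose MRE is at most `level` (higher is better).

    Assumes the common "MRE <= level" convention; some studies use a strict "<".
    """
    _check(actuals, predictions)
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)


if __name__ == "__main__":
    # Hypothetical effort values (e.g. person-hours) for four projects.
    actual_effort = [100.0, 200.0, 40.0, 80.0]
    estimated_effort = [150.0, 150.0, 40.0, 90.0]
    print(mmre(actual_effort, estimated_effort))  # 0.21875
    print(pred(actual_effort, estimated_effort))  # 0.5
```

The per-project MREs here are 0.5, 0.25, 0 and 0.125. Two of the four are at most 0.2, so PRED(0.2) is 0.5. Multiplying by 100 gives the percentage form used above: MMRE 21.875% and PRED(0.2) 50% for these toy values.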