Synthetic Open-source Agile Software Estimation Performance

Nevena Ranković, Dragica Rankovic, Mirjana Ivanovic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


Abstract

In this paper, we investigate whether Software Development Effort Estimation (SDEE) predictions can be improved using commonly used machine learning algorithms such as Linear Regression, Decision Tree Regression, Random Forest Regression, XGBoost Regression, CatBoost Regression, and LightGBM Regression.
We enhance the TAWOS agile open-source software project dataset using a Tabular Variational Autoencoder (TVAE) and a truncated normal data distribution, and we apply additional scaling to prevent data leakage.
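The augmentation and leakage-prevention step can be illustrated with a minimal, standard-library-only Python sketch. It assumes that the truncated normal distribution is fitted to a training column and bounded by that column's observed range, and that the additional scaling is standardization whose parameters are learned from training data only. TVAE itself is not reproduced, and the function names and effort values are hypothetical.

```python
import random
from statistics import mean, pstdev


def truncated_normal_samples(mu, sigma, low, high, n, seed=0):
    """Draw n values from N(mu, sigma^2) restricted to [low, high] by rejection sampling."""
    if sigma <= 0 or low >= high:
        raise ValueError("need sigma > 0 and low < high")
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.gauss(mu, sigma)
        if low <= x <= high:  # keep only draws inside the truncation bounds
            samples.append(x)
    return samples


def fit_standard_scaler(train_values):
    """Learn mean and (population) standard deviation from TRAINING values only."""
    m = mean(train_values)
    s = pstdev(train_values) or 1.0  # avoid dividing by zero for a constant column
    return m, s


def transform(values, params):
    """Apply previously learned scaling parameters without refitting them."""
    m, s = params
    return [(v - m) / s for v in values]


# Hypothetical effort values (e.g. story points) for one training split and its test split.
train_effort = [3.0, 5.0, 8.0, 5.0, 13.0, 2.0]
test_effort = [8.0, 1.0]

# Synthetic values follow the training distribution and stay inside its observed range.
mu, sigma = fit_standard_scaler(train_effort)
synthetic = truncated_normal_samples(mu, sigma, min(train_effort), max(train_effort), n=4, seed=42)

# Scaling is fitted on real + synthetic TRAINING data and only applied to the test split.
params = fit_standard_scaler(train_effort + synthetic)
test_scaled = transform(test_effort, params)
```

Because `params` never sees `test_effort`, the test split cannot leak into preprocessing; the value 1.0 in `test_effort`, which lies below the training range, is simply mapped to a negative standardized score.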
Hyperparameter optimization with Optuna was conducted on 21 model-data combinations, based on the 5-fold cross-validated adjusted R², mean squared prediction error (MSPE), and Pearson's correlation coefficient.
The Random Forest Regressor trained on TVAE-augmented data achieved the best results, with an adjusted R² of 0.59, a Pearson's correlation of 0.81, and an MSPE of 140011, indicating strong predictive accuracy. The CatBoost Regressor on regular data ranked second, with an adjusted R² of 0.39, a Pearson's correlation of 0.74, and an MSPE of 200011. The Decision Tree Regressor, despite a high training correlation, performed the worst, with an adjusted R² of 0.35, a Pearson's correlation of 0.76, and an MSPE of 234500. Ultimately, we aimed to reduce the gap between estimated and actual software development effort, thereby minimizing the associated risks. The results of this study can help improve software development project planning and management.
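The evaluation protocol behind these figures can be sketched with a minimal, standard-library-only Python example: five shuffled folds, a stand-in one-predictor least-squares model fitted on each training split, and the three reported metrics computed on each held-out fold and averaged. Averaging per-fold scores, treating MSPE as the mean squared error on held-out rows, and all function names and data are assumptions for illustration; the authors' models, Optuna search spaces, and TAWOS features are not reproduced. In an Optuna study, a function like the hypothetical `cross_validate` would serve as the objective evaluated for each trial's hyperparameters.

```python
import math
import random
from statistics import mean


def k_fold_indices(n, k=5, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mspe(y, y_hat):
    """Mean squared prediction error over held-out rows."""
    return mean((a - b) ** 2 for a, b in zip(y, y_hat))


def pearson(y, y_hat):
    """Pearson's correlation between actual and predicted effort."""
    my, mp = mean(y), mean(y_hat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in y_hat)
    return cov / math.sqrt(var_y * var_p)


def adjusted_r2(y, y_hat, p):
    """1 - (1 - R^2)(n - 1)/(n - p - 1) for p predictors; needs n > p + 1."""
    n, my = len(y), mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)


def fit_line(x, y):
    """One-predictor ordinary least squares, a stand-in for any of the regressors."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def cross_validate(x, y, k=5, seed=0):
    """Return (adjusted R^2, MSPE, Pearson r), each averaged over the k held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [slope * x[i] + intercept for i in test_idx]
        scores.append((adjusted_r2(y_true, y_pred, p=1), mspe(y_true, y_pred), pearson(y_true, y_pred)))
    return tuple(mean(column) for column in zip(*scores))


# Hypothetical data: effort grows linearly with one feature, plus Gaussian noise.
rng = random.Random(1)
x = [float(i) for i in range(50)]
y = [2.0 * xi + rng.gauss(0, 5) for xi in x]
adj_r2, err, r = cross_validate(x, y)
```

Because adjusted R² penalizes the number of predictors p, each held-out fold must contain more than p + 1 rows; with the 50 hypothetical rows above, each fold holds 10.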

Original language: English
Title of host publication: SQAMIA2024: 11th Workshop on Software Quality Analysis, Monitoring, Improvement, and Applications
Publisher: ceur-ws.org
Number of pages: 12
Publication status: Accepted/In press - 2024

Keywords

  • software estimation
  • regression models
  • synthetic data generation
  • hyperparameter optimization
