Predicting Software Defects at Package Level in Java Project Using Stacking of Ensemble Learning Approach
DOI: https://doi.org/10.59395/ijadis.v6i1.1368

Keywords: Software Defects, Stacking Ensemble, Java, Java Project, Package Level, Predicting, Software Engineering

Abstract
Compared to manual and automated testing, AI-driven testing provides a more intelligent approach by enabling earlier prediction of software defects and improving testing efficiency. This research predicts software defects by analyzing CK software metrics with classification algorithms. A total of 8924 data points were collected from five open-source Java projects on GitHub. Because of class imbalance, undersampling was applied during preprocessing, along with data cleaning and normalization; the final dataset consists of 1314 instances (746 clean and 568 buggy). The predictive model is built in two stages: level-0 base learners using the AdaBoost, Random Forest (RF), Extra Trees (ET), Gradient Boosting (GB), Histogram-based Gradient Boosting (HGB), XGBoost (XGB), and CatBoost (CAT) algorithms, and a level-1 meta-learner that combines their outputs through stacking. The stacking model achieved an ROC-AUC score of 0.8575, outperforming every individual classifier and effectively distinguishing defective from non-defective software components. The performance gains of stacking over each base model (tree-based ensemble) were statistically validated using paired t-tests. All p-values were below 0.05, confirming the significance of stacking's superior performance, with the largest gain observed against Gradient Boosting (+0.0411, p = 0.0030). The confusion matrix shows that the stacking model is the most effective, with high True Positive and True Negative counts and relatively low False Positive and False Negative counts. These findings affirm that ensemble stacking yields a more robust and balanced classification system, improving defect prediction accuracy and enabling earlier issue detection in the Software Development Life Cycle (SDLC).
License
Copyright (c) 2025 Nabila Athifah Zahra, Amalia Anjani Arifiyanti, Dhian Satria Yudha Kartika

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.