The Impact of Resampling Techniques on the Performance of Software Defect Prediction Using the NASA Metrics Data Program Dataset

Original Turkish title: NASA Metrics Data Program Veri Seti Üzerinde Yeniden Örnekleme Yöntemlerinin Yazılımda Hata Tahmini Başarımına Etkisi

Emre Can Yılmaz; Recai Oktaş

doi:10.34248/bsengineering.1792907

Abstract

The NASA Metrics Data Program (MDP) is a widely used repository of software metrics and defect data from various NASA projects. To narrow the study's scope, the JM1 subset was selected because of its large size and class imbalance, and the effects of resampling techniques on the performance of machine learning models were examined on it. We evaluate several oversampling, undersampling, and hybrid resampling methods, including SMOTE, RUS, ROSE, ADASYN, Tomek Links, ENN, Near Miss, and Borderline-SMOTE, in conjunction with the Naive Bayes (NB), Support Vector Machines (SVM), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF) classifiers. Our results indicate that models trained without any resampling often perform poorly in identifying faulty modules (the minority class), despite achieving deceptively high overall accuracy. Conversely, hybrid methods such as SMOTE+ENN and oversampling techniques such as ROSE significantly improve performance on imbalance-sensitive metrics such as AUC and F1-score, particularly when used with the Random Forest and Naive Bayes classifiers. The best performance is obtained by combining Random Forest with SMOTE+ENN, which achieves an accuracy of 0.9350, an AUC of 0.9837, and F1-scores of 0.9126 and 0.9483 for non-defective and defective modules, respectively. Consequently, for imbalanced software defect datasets, selecting an appropriate resampling method and evaluating with imbalance-sensitive metrics that reflect real-world performance are critically important.
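The warning about deceptively high accuracy can be illustrated with a small worked example. The sketch below is not the authors' code and does not use JM1 data: the 80/20 class split, the always-predict-majority baseline, and the helper names `accuracy` and `per_class_f1` are assumptions made purely for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, label):
    """F1-score with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: report F1 as 0 instead of dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 0 = non-defective (majority), 1 = defective (minority).
# The 80/20 split is illustrative only, not the actual JM1 distribution.
y_true = [0] * 800 + [1] * 200

# Degenerate baseline that always predicts the majority class.
majority_label = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_label] * len(y_true)

acc = accuracy(y_true, y_pred)
f1_non_defective = per_class_f1(y_true, y_pred, label=0)
f1_defective = per_class_f1(y_true, y_pred, label=1)

print(f"accuracy         = {acc:.4f}")
print(f"F1 non-defective = {f1_non_defective:.4f}")
print(f"F1 defective     = {f1_defective:.4f}")
```

Under these assumed counts the baseline reaches 0.80 accuracy while its F1-score for the defective class is 0, an extreme form of the behavior the abstract reports for models trained without resampling. This is why per-class F1 and AUC are reported alongside accuracy.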

Ethics Statement

Ethics committee approval was not obtained because this research did not involve any studies on animals or humans.

Details

Primary Language

Turkish

Subjects

Information Systems (Other)

Section

Research Article

Authors

Emre Can Yılmaz *
ORCID: 0000-0003-4365-9131
Türkiye

Recai Oktaş
ORCID: 0000-0003-3282-3549
Türkiye

Early View Date

December 4, 2025

Publication Date

January 15, 2026

Submission Date

September 29, 2025

Acceptance Date

November 7, 2025

Published in Issue

Year 2026, Volume 9, Issue 1

DOI

https://doi.org/10.34248/bsengineering.1792907

IZ

https://izlik.org/JA72YM67WA

Cite

APA

Yılmaz, E. C., & Oktaş, R. (2026). NASA Metrics Data Program Veri Seti Üzerinde Yeniden Örnekleme Yöntemlerinin Yazılımda Hata Tahmini Başarımına Etkisi. Black Sea Journal of Engineering and Science, 9(1), 41-52. https://doi.org/10.34248/bsengineering.1792907
