opus jsr

OPUS Journal of Society Research

2791-9862

İdeal Kent Yayınları

10.26466/opusjsr.1837461

Labor Sociology; Labor and Organisation Sociology; Strategy, Management and Organisational Behaviour (Other)

Improving minority-class detection in employee attrition with ensemble learning and SHAP-based explanations

https://orcid.org/0000-0002-8445-4629

Özden

Cevher

ÇUKUROVA ÜNİVERSİTESİ

03 31 2026

23 2026 1 14 12 06 2025 03 30 2026

2011

This study examines how traditional ensemble techniques and recent deep learning approaches perform in predicting employee attrition on small, imbalanced tabular datasets. It also proposes an interpretable decision-support framework intended for practical HR use as well as technical evaluation. The analysis was conducted on the IBM HR Analytics dataset with a validation design structured to prevent data leakage. To address class imbalance, SMOTE was applied only within the training folds, leaving the validation data untouched. The models compared are Random Forest, XGBoost, CatBoost, Artificial Neural Networks (ANN), and TabNet. Instead of the default probability threshold, dynamic threshold adjustment was used to improve sensitivity to the minority class. Model performance was evaluated using Recall, Precision, F1-score, and confusion matrices, and SHAP analysis was used to enhance interpretability and support transparent decision-making in HR contexts. To strengthen the reliability of the evaluation, Stratified 5-Fold Cross-Validation was adopted. CatBoost produced the most balanced and consistent results, with a mean F1-score of 0.468 ± 0.053 and a Brier score of 0.097. After dynamic threshold adjustment, TabNet achieved the highest sensitivity (Recall: 0.573 ± 0.064), making it particularly effective for early risk detection. According to the SHAP-based interpretation, OverTime and Stock Option Level were the most influential predictors.
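The abstract names dynamic threshold adjustment and the Brier score without spelling out how either is computed. The sketch below illustrates one common reading, and it is an assumption rather than the study's confirmed procedure: the threshold is picked from a fixed grid to maximize F1 on held-out probabilities, and the Brier score is the mean squared gap between predicted probability and the 0/1 outcome. The function names, the grid, and the toy data are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def precision_recall_f1(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall and F1 of the positive (attrition) class at a threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_f1_threshold(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    grid: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Scan candidate thresholds in ascending order and return (threshold, F1).

    Assumption: F1 is the selection criterion. Ties keep the lowest threshold,
    which favours recall on the minority class.
    """
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]  # 0.05, 0.06, ..., 0.95
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        _, _, f1 = precision_recall_f1(y_true, y_prob, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy held-out labels (1 = employee left) and predicted probabilities.
y_val = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
p_val = [0.05, 0.10, 0.15, 0.20, 0.22, 0.28, 0.45, 0.30, 0.40, 0.70]

tuned_t, tuned_f1 = best_f1_threshold(y_val, p_val)
_, recall_default, f1_default = precision_recall_f1(y_val, p_val, 0.5)
_, recall_tuned, _ = precision_recall_f1(y_val, p_val, tuned_t)

print("Brier score:", round(brier_score(y_val, p_val), 5))
print("Tuned threshold:", tuned_t)
print("Recall at 0.5 vs tuned:", round(recall_default, 3), round(recall_tuned, 3))
```

On imbalanced data, a model's probabilities for the minority class often sit below 0.5, so lowering the cutoff can trade some precision for recall. To stay consistent with the leakage-free design the abstract describes, the threshold would presumably be chosen on data separate from the data used to report final metrics.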

Employee churn prediction; TabNet; XGBoost; explainable AI (XAI); SHAP; imbalanced classification

No funding was received from any public, commercial, or non-profit organization for this study.

Alao, D., & Adeyemo, A. B. (2013). Analyzing employee attrition using decision tree algorithms. Computing, Information Systems & Development Informatics, 4(1), 17-28.

Arik, S. O., & Pfister, T. (2021). TabNet: Attentive interpretable tabular learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(8), 6679-6687. https://doi.org/10.1609/aaai.v35i8.16826

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357. https://doi.org/10.1613/jair.953

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. https://doi.org/10.1145/2939672.2939785

Fallucchi, F., Coladangelo, M., Giuliano, R., & De Luca, E. W. (2020). Predicting employee attrition using machine learning techniques. Computers, 9(4), 86. https://doi.org/10.3390/computers9040086

Fang, Y., & Zhang, Z. (2025). Employee turnover prediction model based on feature selection and imbalanced data handling. IEEE Access. https://doi.org/10.1109/ACCESS.2025.3589491

Grinsztajn, L., Oyallon, E., & Varoquaux, G. (2022). Why do tree-based models still outperform deep learning on typical tabular data? Advances in Neural Information Processing Systems, 35, 507-520. https://doi.org/10.48550/arXiv.2207.08815

He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263-1284. https://doi.org/10.1109/TKDE.2008.239

Jain, R., & Nayyar, A. (2019). Predicting employee attrition using XGBoost machine learning approach. Proceedings of the 2019 International Conference on System Modeling & Advancement in Research Trends (SMART), 113-120. https://doi.org/10.1109/SYSMART.2018.8746940

Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30. https://doi.org/10.48550/arXiv.1705.07874

Molnar, C. (2020). Interpretable machine learning: A guide for making black box models explainable. Retrieved from: https://christophm.github.io/interpretable-ml-book/

Santos, M. S., Soares, J. P., Abreu, P. H., Araujo, H., & Fujii, J. (2018). Cross-validation for imbalanced datasets: Avoiding overoptimistic performance estimation. IEEE Computational Intelligence Magazine, 13(4), 59-79. https://doi.org/10.1109/MCI.2018.2866730

Saradhi, V. V., & Palshikar, G. K. (2011). Employee churn prediction. Expert Systems with Applications, 38(3), 1999-2006. https://doi.org/10.1016/j.eswa.2010.07.134

Shwartz-Ziv, R., & Armon, A. (2022). Tabular data: Deep learning is not all you need. Information Fusion, 81, 84-90. https://doi.org/10.1016/j.inffus.2021.11.011

Subhash, P. (2017). IBM HR analytics employee attrition & performance [Data set]. Kaggle. https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset

Thapliyal, N., Solanki, S., Pandey, N. K., & Papola, S. (2024). Employee attrition analysis using XGBoost. In 2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE) (pp. 1-6). IEEE. https://doi.org/10.1109/IC3SE62002.2024.10593326

Wang, X., & Zhi, J. (2021). A machine learning-based analytical framework for employee turnover prediction. Journal of Management Analytics, 8(3), 351-370. https://doi.org/10.1080/23270012.2021.1961318

Yurtsever, M. (2024). Çalışan yıpranmasını tahmin etmede analitik bir yaklaşım: Topluluk öğrenme yöntemi. Afyon Kocatepe Üniversitesi İktisadi ve İdari Bilimler Fakültesi Dergisi, 26(Özel Sayı), 150-160. https://doi.org/10.33707/akuiibfd.1462567