Improving minority-class detection in employee attrition with ensemble learning and SHAP-based explanations
Abstract
This study examines how traditional ensemble techniques and recent deep learning approaches perform in predicting employee attrition, particularly on small, imbalanced tabular datasets. It also proposes an interpretable decision-support framework intended not only for technical evaluation but also for practical HR use. The analysis was conducted on the IBM HR Analytics dataset under a Stratified 5-Fold Cross-Validation scheme structured to prevent data leakage: SMOTE was applied strictly within the training folds, never to the validation data. The models compared were Random Forest, XGBoost, CatBoost, an Artificial Neural Network (ANN), and TabNet. Rather than relying on the default probability threshold, dynamic threshold adjustment was introduced to improve sensitivity to the minority class. Model performance was evaluated with Recall, Precision, F1-score, and confusion matrices, while SHAP analysis was employed to enhance interpretability and support transparent decision-making in HR contexts. The findings show that CatBoost produced the most balanced and consistent results, achieving a mean F1-score of 0.468 ± 0.053 together with a Brier score of 0.097. After dynamic threshold adjustment, TabNet demonstrated the highest sensitivity (Recall: 0.573 ± 0.064), making it particularly effective for early risk detection. According to the SHAP-based interpretation, OverTime and Stock Option Level emerged as the most influential predictors.
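The leakage-safe resampling and dynamic thresholding described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: it uses synthetic data in place of the IBM HR Analytics dataset, a single Random Forest in place of the full model suite, and plain random oversampling of the minority class as a lightweight stand-in for SMOTE; all parameter values are illustrative. The key points it demonstrates are that resampling happens only inside each training fold, and that the decision threshold is chosen afterwards on out-of-fold probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

# Synthetic imbalanced data standing in for the IBM HR dataset
# (~16% positive class, comparable to typical attrition rates).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.84, 0.16], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof_proba = np.zeros(len(y))
rng = np.random.default_rng(0)

for train_idx, val_idx in skf.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # Oversample the minority class *inside the training fold only*,
    # so no resampled points ever leak into the validation fold.
    minority = np.where(y_tr == 1)[0]
    n_extra = len(y_tr) - 2 * len(minority)  # balance the two classes
    extra = rng.choice(minority, size=n_extra, replace=True)
    X_tr = np.vstack([X_tr, X_tr[extra]])
    y_tr = np.concatenate([y_tr, y_tr[extra]])

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr, y_tr)
    oof_proba[val_idx] = clf.predict_proba(X[val_idx])[:, 1]

# Dynamic threshold adjustment: sweep candidate thresholds on the
# out-of-fold probabilities and keep the one maximizing F1.
thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(y, oof_proba >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]
print(f"best threshold={best_t:.2f}, F1={max(f1s):.3f}")
```

In practice the resampling step would be replaced with `imblearn.over_sampling.SMOTE` fitted on the training fold, but the leakage-prevention logic is identical: resample after the split, evaluate on untouched validation data.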
Keywords
employee churn prediction, TabNet, XGBoost, explainable AI (XAI), SHAP, imbalanced classification