Research Article

Early-Stage Diabetes Detection Using Age-Based Feature Selection and Machine Learning Methods

Year 2025, Volume: 12 Issue: 4, 953 - 965, 31.12.2025
https://doi.org/10.54287/gujsa.1753574

Abstract

Diabetes is spreading rapidly worldwide, and governments urgently need comprehensive strategies for detecting it. Early detection matters because it allows treatment to begin early. In this paper, Data Mining (DM) and Machine Learning (ML) techniques are used to detect early-stage diabetes from survey data, taking age into account. The dataset was divided into three age groups (young, middle-aged, and elderly). For each group, a unique feature selection process was performed by averaging the feature importance scores obtained from the Random Forest (RF), Gradient Boosting (GB), and eXtreme Gradient Boosting (XGBoost) algorithms. This identified the features that should be considered in diabetes detection for each age group. The features selected for each age group were then classified using different ML methods. Accuracies of 96.77%, 98.10%, and 99% were obtained for the young, middle-aged, and elderly groups, respectively.
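
The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the feature names, the importance values, and the choice to keep the top two features are all hypothetical, and the scores are assumed to have already been produced by fitted RF, GB, and XGBoost models for a single age group.

```python
from statistics import mean


def average_importances(importances_by_model):
    """Average each feature's importance score across all models."""
    models = list(importances_by_model.values())
    features = models[0].keys()
    return {name: mean(scores[name] for scores in models) for name in features}


def select_top_features(averaged, k):
    """Return the k feature names with the highest averaged importance."""
    return sorted(averaged, key=averaged.get, reverse=True)[:k]


# Hypothetical importance scores for one age group (illustrative values only).
importances_by_model = {
    "RF": {"polyuria": 0.40, "polydipsia": 0.30, "itching": 0.10, "obesity": 0.20},
    "GB": {"polyuria": 0.50, "polydipsia": 0.20, "itching": 0.10, "obesity": 0.20},
    "XGBoost": {"polyuria": 0.30, "polydipsia": 0.40, "itching": 0.05, "obesity": 0.25},
}

averaged = average_importances(importances_by_model)
selected = select_top_features(averaged, k=2)  # k is an assumed cutoff
print(selected)
```

In a full pipeline, the same two functions would be applied once per age group, so that each group ends up with its own selected feature set before classification.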

Details

Primary Language: English
Subjects: Computing Applications in Health
Journal Section: Research Article
Authors: Betül Uzbaş (ORCID: 0000-0002-0255-5988)
Submission Date: July 29, 2025
Acceptance Date: October 21, 2025
Publication Date: December 31, 2025
Published in Issue: Year 2025, Volume: 12, Issue: 4

Cite

APA: Uzbaş, B. (2025). Early-Stage Diabetes Detection Using Age-Based Feature Selection and Machine Learning Methods. Gazi University Journal of Science Part A: Engineering and Innovation, 12(4), 953–965. https://doi.org/10.54287/gujsa.1753574