Improvement of machine learning-based diabetes diagnosis via resampling techniques
Abstract
This study aims to improve the accuracy of diabetes diagnosis by combining machine learning classifiers with resampling methods. Diabetes datasets are typically imbalanced, which presents a significant challenge for traditional classification algorithms: they often struggle to produce accurate predictions on such data. To improve model performance, a comparative analysis was conducted of a range of over-sampling and under-sampling techniques, including SMOTE, ADASYN, Borderline SMOTE, SVM SMOTE, Random Under Sampler, Near Miss, One Sided Selection, Neighbourhood Cleaning Rule, Edited Nearest Neighbours, Instance Hardness Threshold, AllKNN and Tomek Links. These techniques were applied in combination with the Decision Tree, Random Forest, K-Nearest Neighbours, AdaBoost, and Extra Tree Classifier machine learning classifiers, and performance was evaluated using the accuracy, recall, precision, F-Score, and AUC-ROC metrics. SVM SMOTE was identified as the most successful resampling technique, achieving 99.06% accuracy in combination with the Decision Tree classifier. The findings indicate that incorporating resampling techniques markedly improves diagnostic performance and yields more dependable predictions. This research contributes to the field of medical informatics by providing a robust framework for diabetes diagnosis and offering insights into the application of machine learning in healthcare.
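The idea behind the over-sampling family named in the abstract can be illustrated with a small, standard-library-only sketch. It is not the study's implementation, which presumably relied on library versions of SMOTE and its variants. The function names `smote_like_oversample` and `classification_metrics`, the parameters `k` and `seed`, and the toy data are all assumptions made for illustration. The sketch interpolates between minority-class neighbours, as plain SMOTE does in simplified form, and computes accuracy, precision, recall and F-score so the effect of class imbalance on accuracy can be seen.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote_like_oversample(X, y, minority_label, k=3, seed=0):
    """Simplified SMOTE-style over-sampling (hypothetical helper).

    Adds synthetic minority samples, each placed on the line segment between
    a minority point and one of its k nearest minority neighbours, until the
    minority class is as large as the rest of the data.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    if len(minority) < 2 or n_new <= 0:
        return list(X), list(y)  # nothing to interpolate or already balanced

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return list(X) + synthetic, list(y) + [minority_label] * n_new


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy, made-up data: 8 majority samples (label 0) and 3 minority samples (label 1).
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0],
     [0.7, 0.9], [1.0, 1.2], [3.0, 3.0], [3.2, 2.9], [2.9, 3.3]]
y = [0] * 8 + [1] * 3

X_res, y_res = smote_like_oversample(X, y, minority_label=1)
print("class counts before:", Counter(y), "after:", Counter(y_res))

# A classifier that always predicts the majority class still scores a high
# accuracy on the imbalanced data, while its recall for the minority class is zero.
print(classification_metrics(y, [0] * len(y)))
```

In practice, resampling should be applied to the training split only, so that synthetic samples never leak into the data used for evaluation.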
Details
Primary Language: English
Subjects: Machine Learning (Other)
Journal Section: Research Article
Early Pub Date: November 2, 2025
Publication Date: March 16, 2026
Submission Date: November 26, 2024
Acceptance Date: August 20, 2025
Published in Issue: Year 2026, Volume 32, Number 2