The Role of Machine Learning Models in Early Diabetes Diagnosis: A Dataset Based Analysis

Ekemen Çengel; Emine Cengil; Muhammed Yıldırım

doi:10.53070/bbd.1632130

Theoretical Article

Year 2025, Volume 10, Issue 1, pages 33-42, 01.06.2025

Abstract

Diabetes is a chronic metabolic disease in which blood glucose rises above normal levels, either because the pancreas cannot produce enough insulin or because the body cannot use the insulin it produces effectively. Early diagnosis is essential for managing diabetes and avoiding complications. Advanced technologies such as machine learning support both individual health management and public health systems by providing high accuracy in early diagnosis. This study examines the role of machine learning methods in the early diagnosis of diabetes by analysing them on two different datasets. The classifiers employed included Support Vector Machines (SVM), Decision Trees, and Artificial Neural Networks. On both datasets, model performance was evaluated and compared using metrics such as accuracy, sensitivity, and specificity. On the first dataset, the BIT Mesra Dataset, the Bagged Trees algorithm performed best, with 96.2%. On the Pima Indian dataset, the SVM algorithm achieved an accuracy of 77.2%. The study provides a method for the early diagnosis of diabetes and emphasises the importance of data diversity in this field.
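The three evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of that calculation, not the authors' implementation: the function name `evaluate_binary`, the `BinaryMetrics` container, and the toy labels are hypothetical and chosen only for illustration, with class 1 assumed to mean "diabetic".

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float      # share of all cases classified correctly
    sensitivity: float   # true positive rate: diabetic cases correctly flagged
    specificity: float   # true negative rate: non-diabetic cases correctly cleared


def evaluate_binary(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity from paired labels.

    `positive` is the label treated as the diabetic class (an assumption
    of this sketch; the datasets in the study may encode it differently).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Tally the four cells of the confusion matrix.
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1

    total = tp + tn + fp + fn
    # A metric is undefined (NaN) when its denominator class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return BinaryMetrics(
        accuracy=(tp + tn) / total,
        sensitivity=sensitivity,
        specificity=specificity,
    )


if __name__ == "__main__":
    # Toy labels for illustration only; these are not values from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(evaluate_binary(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters for screening tasks: on an imbalanced dataset, a classifier can reach high accuracy while still missing many diabetic cases, which only the sensitivity figure would reveal.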

Keywords

Diabetes, Machine Learning, BIT Mesra Dataset, Pima Indian Dataset, Artificial Intelligence

References

Arıkoğlu H, Kaya DE. (2015). Tip 2 diyabetin moleküler genetik temeli; Son gelişmeler. Genel Tıp Dergisi, 25(4), 147-159.
Başer BÖ, Yangın M, Sarıdaş ES. (2021). Makine öğrenmesi teknikleriyle diyabet hastalığının sınıflandırılması. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 25(1), 112-120.
Buscema M. (2002). A brief overview and introduction to artificial neural networks. Substance Use & Misuse, 37(8-10), 1093-1148.
Cervantes J, Garcia-Lamont F, Rodríguez-Mazahua L, Lopez A. (2020). A comprehensive survey on support vector machine classification: Applications, challenges, and trends. Neurocomputing, 408, 189-215.
Modak SKS, Jha VK. (2024). Diabetes prediction model using machine learning techniques. Multimedia Tools and Applications, 83(13), 38523-38549.
Önsüz MF, Topuzoğlu A. (2018). İstanbul İlinde Üç Hastanede Ayaktan İzlenen Tip II Diyabetik Hastalarda Glisemik Kontrolün Maliyet Etkinliğinin Değerlendirilmesi. ESTÜDAM Halk Sağlığı Dergisi, 3(2), 1-14.
Patro VM, Patra MR. (2014). Augmenting weighted average with confusion matrix to enhance classification accuracy. Transactions on Machine Learning and Artificial Intelligence, 2(4), 77-91.
Pima Indians Diabetes dataset (May 2008). Pima Indians Diabetes dataset. Available from: http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes data. Accessed 15 Dec 2024.
Rahman A, Tasnim S. (2014). Ensemble classifiers and their applications: a review. arXiv preprint arXiv:1404.4088.
Sisodia D, Sisodia DS. (2018). Prediction of diabetes using classification algorithms. Procedia Computer Science, 132, 1578-1585.
Tigga NP, Garg S. (2020). Prediction of type 2 diabetes using machine learning classification methods. Procedia Computer Science, 167, 706-716.
Tigganeha N. (2019). Diabetes dataset 2019. https://www.kaggle.com/datasets/tigganeha4/diabetes-dataset-2019. Accessed 15 Dec 2024.
Todkar SS. (2016). Diabetes mellitus the 'silent Killer' of mankind: An overview on the eve of upcoming World Health Day!. Journal of Medical & Allied Sciences, 6(1), 39.
Yıldırım M, Çınar A, Cengil E. (2021). Investigation of Cloud Computing Based Big Data on Machine Learning Algorithms. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, 10(2), 670-682.

There are 14 citations in total.

Details

Primary Language	English
Subjects	Machine Learning (Other)
Journal Section	PAPERS
Authors	Ekemen Çengel (ORCID 0009-0001-3102-115X); Emine Cengil (ORCID 0000-0003-4313-8694); Muhammed Yıldırım (ORCID 0000-0003-1866-4721)
Publication Date	June 1, 2025
Submission Date	February 3, 2025
Acceptance Date	February 20, 2025
Published in Issue	Year 2025 Volume: 10 Issue: 1

Cite

APA	Çengel, E., Cengil, E., & Yıldırım, M. (2025). The Role of Machine Learning Models in Early Diabetes Diagnosis: A Dataset Based Analysis. Computer Science, 10(1), 33-42. https://doi.org/10.53070/bbd.1632130