Research Article
Year 2022, Volume: 2 Issue: 2, 1 - 9, 01.10.2022

Diabetes Risk Prediction with Machine Learning Models

Abstract

Diabetes mellitus (DM) is one of the most common chronic diseases worldwide and a major public health problem. The aim of this study is to predict DM risk with machine learning (ML) models using available data. In this analytical study, the “Diabetes Health Indicators Dataset”, collected annually by the CDC and consisting of 253,680 records and 21 variables, was used. The open-access dataset was retrieved from Kaggle on March 5, 2022. Data analysis was carried out in the Python 3.0 programming language using the numpy, pandas, matplotlib, seaborn, scikit-learn, and imblearn libraries. During data pre-processing, outliers and missing data were removed. Five ML algorithms were used for predictive modeling: KNN, logistic regression, decision tree, random forest, and Naive Bayes. The predictive performance of the algorithms was evaluated with accuracy, precision, recall, and F1 score. The study did not require permission because the data were open access. KNN's accuracy was 0.74, precision 0.31, recall 0.55, and F1 score 0.39; logistic regression's accuracy was 0.72, precision 0.33, recall 0.74, and F1 score 0.46; decision tree's accuracy was 0.84, precision 0.54, recall 0.15, and F1 score 0.24; random forest's accuracy was 0.84, precision 0.56, recall 0.16, and F1 score 0.25; Naive Bayes's accuracy was 0.84, precision 0.52, recall 0.19, and F1 score 0.28. According to the experimental results, when the dataset is randomly divided into training (80%) and testing (20%) sets, the accuracy values of the random forest and decision tree algorithms are very close to each other (RF: 0.848, DT: 0.847). Therefore, it can be said that the two best algorithms for diabetes risk estimation are random forest and decision tree.
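The F1 scores reported above can be cross-checked against the reported precision and recall values, since F1 is the harmonic mean of the two. The following is a minimal sketch of that check, not the authors' code: the dictionary `REPORTED` and the helper names `f1_score` and `recomputed_f1` are introduced here for illustration only, and the inputs are the two-decimal figures from the abstract, so small rounding differences are expected.

```python
# Reported (precision, recall, F1) per model, copied from the abstract.
# The names REPORTED, f1_score and recomputed_f1 are illustrative assumptions.
REPORTED = {
    "KNN": (0.31, 0.55, 0.39),
    "Logistic regression": (0.33, 0.74, 0.46),
    "Decision tree": (0.54, 0.15, 0.24),
    "Random forest": (0.56, 0.16, 0.25),
    "Naive Bayes": (0.52, 0.19, 0.28),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recomputed_f1():
    """Recompute F1 for every model from its reported precision and recall."""
    return {name: f1_score(p, r) for name, (p, r, _) in REPORTED.items()}


if __name__ == "__main__":
    for name, (p, r, reported) in REPORTED.items():
        print(f"{name}: reported F1 = {reported:.2f}, "
              f"recomputed = {f1_score(p, r):.3f}")
```

Each recomputed value should agree with the reported F1 to within the rounding of its inputs. The check also makes visible that a ranking by F1 need not match a ranking by accuracy, so the choice of metric depends on how missed cases are weighed against false alarms.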

Details

Primary Language English
Subjects Clinical Sciences
Journal Section Research Articles
Authors

Gözde Özsezer 0000-0003-4352-1124

Gülengül Mermer 0000-0002-0566-5656

Publication Date October 1, 2022
Published in Issue Year 2022 Volume: 2 Issue: 2

Cite

APA Özsezer, G., & Mermer, G. (2022). Diabetes Risk Prediction with Machine Learning Models. Artificial Intelligence Theory and Applications, 2(2), 1-9.