Research Article

Comparative Analysis of Diabetes Diagnosis with Machine Learning Methods

Year 2024, Volume: 8 Issue: 1, 22 - 32, 30.06.2024
https://doi.org/10.47897/bilmes.1447878

Abstract

Diabetes is a disease that occurs when the body cannot regulate the level of sugar (glucose) in the blood. Early diagnosis is important for preventing more serious diseases that may arise later. In this study, a diabetes data set was used to train and compare different machine learning models. At the outset of the study, the following models were used: Logistic Regression, KNN, SVM (Support Vector Machine), CART (Classification and Regression Trees), RF (Random Forest), AdaBoost, GBM (Gradient Boosting Machines), XGBoost (Extreme Gradient Boosting), LGBM (Light Gradient Boosting Machine), and CatBoost. Judged by accuracy and F1 score, the best-performing models were RF, LGBM, and XGBoost, in that order. The Random Forest model, which produced the most successful results, achieved an accuracy of 0.88, an F1 score of 0.84, and a ROC AUC of 0.95.
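The three metrics reported above can be illustrated with a minimal sketch. The labels and scores below are hypothetical toy values, not the study's data, and the helper functions are written from the standard metric definitions rather than taken from the authors' code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]  # assumed model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 decision threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"Accuracy: {acc:.2f}, F1 Score: {f1:.2f}, ROC AUC: {auc:.4f}")
```

Accuracy and F1 depend on the chosen decision threshold, while ROC AUC is computed from the raw scores, which is why a model can report a ROC AUC noticeably higher than its accuracy.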



Details

Primary Language English
Subjects Image Processing, Machine Learning (Other)
Journal Section Articles
Authors

Tuğba Aktaş 0009-0005-0580-7502

İsmail Mert Temel 0009-0008-7989-9747

Ahmet Saygılı 0000-0001-8625-4842

Publication Date June 30, 2024
Submission Date March 6, 2024
Acceptance Date May 5, 2024
Published in Issue Year 2024 Volume: 8 Issue: 1

Cite

APA Aktaş, T., Temel, İ. M., & Saygılı, A. (2024). Comparative Analysis of Diabetes Diagnosis with Machine Learning Methods. International Scientific and Vocational Studies Journal, 8(1), 22-32. https://doi.org/10.47897/bilmes.1447878


This work is licensed under a Creative Commons Attribution 4.0 International License.