Research Article
Year 2024, Issue: 006, 1 - 11, 30.04.2024


A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction


Abstract

This research investigates the use of machine learning algorithms for the early detection of diabetes. Given its global prevalence and significant impact on health, timely identification of diabetes is crucial for effective treatment. In this study, machine learning models including Gradient Boosting Machines, Extreme Gradient Boosting, Light Gradient Boosting Machine, Categorical Boosting, k-Nearest Neighbors, Random Forest, Ridge Classifier, Logistic Regression, Gaussian Naive Bayes, and Decision Tree are evaluated for their capabilities in diabetes diagnosis. The primary aim is to train these models to distinguish between individuals with and without diabetes, using relevant features from the dataset. Because the classes in the dataset are imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to improve model performance. Categorical Boosting achieved the highest accuracy, 90.05%, making it the most successful model. By systematically evaluating the performance of these prominent machine learning models, valuable insights can be gathered about their ability to recognize the complex patterns indicative of diabetes. Healthcare professionals and researchers can leverage these findings to develop more accurate and effective diagnostic tools, enabling early intervention and thereby improving the quality of life of individuals affected by diabetes.
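The class balancing described above relies on SMOTE, which creates synthetic minority-class samples by interpolating between a real minority example and one of its k nearest neighbours. The following pure-Python sketch illustrates that interpolation step only; the function name and toy data are illustrative and not taken from the paper, which used an established SMOTE implementation on the full dataset.

```python
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    """Minimal SMOTE sketch: generate n_new synthetic samples by
    interpolating between a minority sample and one of its k nearest
    neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the remaining minority samples
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # new point lies on the segment between x and its neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2)]
new_points = smote_oversample(minority, n_new=4)
print(len(new_points))  # 4 synthetic samples
```

Because each synthetic point is a convex combination of two real minority samples, it always falls inside the region spanned by the minority class, which is what lets the oversampled training set improve recall without inventing implausible feature values.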

References

  • [1] A. D. Deshpande, M. Harris-Hayes, and M. Schootman, “Epidemiology of diabetes and diabetes-related complications,” Phys. Ther., vol. 88, no. 11, pp. 1254-1264, Nov. 2008, doi: 10.2522/ptj.20080020.
  • [2] H. D. McIntyre, P. Catalano, C. Zhang, G. Desoye, E. R. Mathiesen, and P. Damm, “Gestational diabetes mellitus,” Nat. Rev. Dis. Primers, vol. 5, no. 1, Art. no. 47, Jul. 2019, doi: 10.1038/s41572-019-0098-8.
  • [3] İ. Akgül, Ö. Çağrı Yavuz, and U. Yavuz, “Deep learning based models for detection of diabetic retinopathy,” Tehnički Glasnik, vol. 17, no. 4, pp. 581-587, Dec. 2023, doi: 10.31803/tg-20220905123827.
  • [4] N. Yuvaraj and K. R. SriPreethaa, “Diabetes prediction in healthcare systems using machine learning algorithms on Hadoop cluster,” Cluster Comput., vol. 22, no. 1, pp. 1-9, Jan. 2019, doi: 10.1007/s10586-017-1532-x.
  • [5] Z. Xie, O. Nikolayeva, J. Luo, and D. Li, “Building risk prediction models for type 2 diabetes using machine learning techniques,” Prev. Chronic Dis., vol. 16, Sep. 2019, doi: 10.5888/pcd16.190109.
  • [6] S. Wei, X. Zhao, and C. Miao, “A comprehensive exploration to the machine learning techniques for diabetes identification,” in Proc. 2018 IEEE 4th World Forum on Internet of Things (WF-IoT), 2018, pp. 291-295, doi: 10.1109/WF-IoT.2018.8355130.
  • [7] A. Yahyaoui, A. Jamil, J. Rasheed, and M. Yesiltepe, “A decision support system for diabetes prediction using machine learning and deep learning techniques,” in Proc. 2019 1st Int. Informatics and Software Engineering Conf. (UBMYK), 2019, pp. 1-4, doi: 10.1109/UBMYK48245.2019.8965556.
  • [8] Diabetes Health Indicators Dataset, https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset (accessed Dec. 12, 2023).
  • [9] M. Crowther, W. Lim, and M. A. Crowther, “Systematic review and meta-analysis methodology,” Blood, vol. 116, no. 17, pp. 3140-3146, Oct. 2010, doi: 10.1182/blood-2010-05-280883.
  • [10] S. Pandey, “Principles of correlation and regression analysis,” J. Pract. Cardiovasc. Sci., vol. 6, no. 1, pp. 7-11, Apr. 2020, doi: 10.4103/jpcs.jpcs_2_20.
  • [11] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321-357, Jun. 2002, doi: 10.1613/jair.953.
  • [12] X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, “A survey on ensemble learning,” Front. Comput. Sci., vol. 14, pp. 241-258, Apr. 2020, doi: 10.1007/s11704-019-8208-z.
  • [13] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, pp. 123-140, Aug. 1996, doi: 10.1007/BF00058655.
  • [14] R. E. Schapire, “A brief introduction to boosting,” in Proc. 16th Int. Joint Conf. on Artificial Intelligence (IJCAI), 1999, pp. 1401-1406.
  • [15] S. González, S. García, J. Del Ser, L. Rokach, and F. Herrera, “A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities,” Inf. Fusion, vol. 64, pp. 205-237, Dec. 2020, doi: 10.1016/j.inffus.2020.07.007.
  • [16] D. R. Cox, “The regression analysis of binary sequences,” J. R. Stat. Soc. Ser. B, vol. 20, no. 2, pp. 215-232, Jul. 1958, doi: 10.1111/j.2517-6161.1958.tb00292.x.
  • [17] S. Jayachitra and A. Prasanth, “Multi-feature analysis for automated brain stroke classification using weighted Gaussian naïve Bayes classifier,” J. Circuits Syst. Comput., vol. 30, no. 10, Art. no. 2150178, 2021, doi: 10.1142/S0218126621501784.
  • [18] C. Kingsford and S. L. Salzberg, “What are decision trees?,” Nat. Biotechnol., vol. 26, no. 9, pp. 1011-1013, Sep. 2008, doi: 10.1038/nbt0908-1011.
  • [19] S. Uddin, I. Haque, H. Lu, M. A. Moni, and E. Gide, “Comparative performance analysis of K-nearest neighbor (KNN) algorithm and its different variants for disease prediction,” Sci. Rep., vol. 12, Art. no. 6256, Apr. 2022, doi: 10.1038/s41598-022-10358-x.
  • [20] S. Priyadarshinee and M. Panda, “Cardiac disease prediction using SMOTE and machine learning classifiers,” J. Pharm. Negat. Results, vol. 13, no. 8, pp. 856-862, Nov. 2022, doi: 10.47750/pnr.2022.13.S08.108.
  • [21] L. Breiman, “Random forests,” Mach. Learn., vol. 45, pp. 5-32, Oct. 2001, doi: 10.1023/A:1010933404324.
  • [22] A. Natekin and A. Knoll, “Gradient boosting machines, a tutorial,” Front. Neurorobot., vol. 7, Art. no. 21, Dec. 2013, doi: 10.3389/fnbot.2013.00021.
  • [23] T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, H. Cho, and T. Zhou, “XGBoost: Extreme gradient boosting,” R package version 0.4-2, vol. 1, no. 4, pp. 1-4, 2015.
  • [24] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, and T.-Y. Liu, “LightGBM: A highly efficient gradient boosting decision tree,” in Adv. Neural Inf. Process. Syst., vol. 30, 2017.
  • [25] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin, “CatBoost: Unbiased boosting with categorical features,” in Adv. Neural Inf. Process. Syst., vol. 31, 2018.

Details

Primary Language English
Subjects Computer Software, Software Engineering (Other)
Journal Section Research Articles
Authors

Naciye Nur Arslan 0000-0002-3208-7986

Durmuş Özdemir 0000-0002-9543-4076

Publication Date April 30, 2024
Submission Date September 25, 2023
Published in Issue Year 2024 Issue: 006

Cite

APA Arslan, N. N., & Özdemir, D. (2024). A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. Journal of Scientific Reports-C(006), 1-11.
AMA Arslan NN, Özdemir D. A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. JSR-C. April 2024;(006):1-11.
Chicago Arslan, Naciye Nur, and Durmuş Özdemir. “A Comparison of Traditional and State-of-the-Art Machine Learning Algorithms for Type 2 Diabetes Prediction”. Journal of Scientific Reports-C, no. 006 (April 2024): 1-11.
EndNote Arslan NN, Özdemir D (April 1, 2024) A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. Journal of Scientific Reports-C 006 1–11.
IEEE N. N. Arslan and D. Özdemir, “A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction”, JSR-C, no. 006, pp. 1–11, April 2024.
ISNAD Arslan, Naciye Nur - Özdemir, Durmuş. “A Comparison of Traditional and State-of-the-Art Machine Learning Algorithms for Type 2 Diabetes Prediction”. Journal of Scientific Reports-C 006 (April 2024), 1-11.
JAMA Arslan NN, Özdemir D. A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. JSR-C. 2024;(006):1–11.
MLA Arslan, Naciye Nur and Durmuş Özdemir. “A Comparison of Traditional and State-of-the-Art Machine Learning Algorithms for Type 2 Diabetes Prediction”. Journal of Scientific Reports-C, no. 006, 2024, pp. 1-11.
Vancouver Arslan NN, Özdemir D. A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. JSR-C. 2024(006):1-11.