Research Article
Year 2024, Issue: 006, 1-11, 30.04.2024


A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction


Abstract

This research investigates the use of machine learning algorithms for the early detection of diabetes. Given its global prevalence and significant impact on health, timely identification of diabetes is crucial for effective treatment. In this study, machine learning models including Gradient Boosting Machines, Extreme Gradient Boosting, Light Gradient Boosting Machine, Categorical Boosting, k-Nearest Neighbors, Random Forest, Ridge Classifier, Logistic Regression, Gaussian Naive Bayes, and Decision Tree are evaluated for diabetes diagnosis. The primary aim is to train these models to distinguish between individuals with and without diabetes using relevant features from the dataset. Because the classes in the dataset are imbalanced, the SMOTE technique is applied to improve model performance. Categorical Boosting achieved the highest accuracy, 90.05%, making it the most successful model. Systematically evaluating these prominent machine learning models yields valuable insights into their ability to recognize the complex patterns indicative of diabetes. Healthcare professionals and researchers can leverage this understanding to develop more accurate and effective diagnostic tools, enabling early intervention and ultimately improving the quality of life of individuals affected by diabetes.
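The class-balancing step described above can be illustrated with a minimal, self-contained sketch of SMOTE (synthetic minority over-sampling, Chawla et al. [11]): each synthetic sample is an interpolation between a minority-class point and one of its k nearest minority neighbors. The toy points and parameter values below are illustrative only, not the study's actual data or implementation.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples for an imbalanced minority class
    by interpolating between each chosen sample and one of its k nearest
    minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (by squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

# Toy 2-D minority class; real use would pass the minority rows of the dataset.
minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
new_points = smote(minority, n_new=6, k=3)
print(len(new_points))  # 6 synthetic samples
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class occupies the same feature region as the original data rather than duplicating existing rows.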

References

  • [1] A. D. Deshpande, M. Harris-Hayes, and M. Schootman, “Epidemiology of diabetes and diabetes-related complications,” Phys. Ther., vol. 88, no. 11, pp. 1254-1264, Nov. 2008, doi: https://doi.org/10.2522/ptj.20080020.
  • [2] H. D. McIntyre, P. Catalano, C. Zhang, G. Desoye, E. R. Mathiesen, and P. Damm, “Gestational diabetes mellitus,” Natur. Rev. Dis. Prim., vol. 5, no. 1, Art. no. 47, Jul. 2019, doi: https://doi.org/10.1038/s41572-019-0098-8.
  • [3] İ. Akgül, Ö. Çağrı Yavuz, and U. Yavuz, “Deep Learning Based Models for Detection of Diabetic Retinopathy,” Tehnički glasnik, vol. 17, no. 4, pp. 581-587, Dec. 2023, doi: https://doi.org/10.31803/tg-20220905123827.
  • [4] N. Yuvaraj and K.R. SriPreethaa, “Diabetes prediction in healthcare systems using machine learning algorithms on Hadoop cluster,” Clust. Comp., vol. 22, no. 1, pp. 1-9, Jan. 2019, doi: https://doi.org/10.1007/s10586-017-1532-x.
  • [5] Z. Xie, O. Nikolayeva, J. Luo, and D. Li, “Peer reviewed: building risk prediction models for type 2 diabetes using machine learning techniques,” Prev. Chro. Dis., vol. 16, Sep. 2019, doi: 10.5888/pcd16.190109.
  • [6] S. Wei, X. Zhao, and C. Miao, “A comprehensive exploration to the machine learning techniques for diabetes identification,” in 2018 IEEE 4th World Forum on Int. of Things, 2018, pp. 291-295, doi: 10.1109/WF-IoT.2018.8355130.
  • [7] A. Yahyaoui, A. Jamil, J. Rasheed, and M. Yesiltepe, “A decision support system for diabetes prediction using machine learning and deep learning techniques,” in 2019 1st Inter. Inform. and Soft. Eng. Conf., 2019, pp. 1-4, doi: 10.1109/UBMYK48245.2019.8965556.
  • [8] Diabetes Health Indicators Dataset, https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset (accessed December 12, 2023).
  • [9] M. Crowther, W. Lim, and M. A. Crowther, “Systematic review and meta-analysis methodology,” The Jour. of the Amer. Soc. of Hemat., vol. 116, no. 17, pp. 3140-3146, Oct. 2010, doi: https://doi.org/10.1182/blood-2010-05-280883.
  • [10] S. Pandey, “Principles of correlation and regression analysis,” Jour. of the prac. of cardio. Sci., vol. 6, no. 1, pp. 7-11, Apr. 2020, doi: 10.4103/jpcs.jpcs_2_20.
  • [11] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Jour. of artif. Intel. Res., vol. 16, pp. 321-357, Jun. 2002, doi: https://doi.org/10.1613/jair.953.
  • [12] X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, “A survey on ensemble learning,” Front. of Comp. Sci., vol. 14, pp. 241-258, Apr. 2020, doi: https://doi.org/10.1007/s11704-019-8208-z.
  • [13] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, pp. 123-140, Aug. 1996, doi: https://doi.org/10.1007/BF00058655.
  • [14] R. E. Schapire, “A brief introduction to boosting,” in Proc. Int. Joint Conf. on Artif. Intell. (IJCAI), vol. 99, 1999, pp. 1401-1406.
  • [15] S. González, S. García, J. Del Ser, L. Rokach, and F. Herrera, “A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities,” Infor. Fus., vol. 64, pp. 205-237, Dec. 2020, doi: https://doi.org/10.1016/j.inffus.2020.07.007.
  • [16] D. R. Cox, “The regression analysis of binary sequences,” Jour. of the Roy. Stat. Soc. Ser. B: Stat. Meth., vol. 20, no. 2, pp. 215-232, Jul. 1958, doi: https://doi.org/10.1111/j.2517-6161.1958.tb00292.x.
  • [17] S. Jayachitra and A. Prasanth, “Multi-feature analysis for automated brain stroke classification using weighted Gaussian naïve Bayes classifier,” Jour. of Circ. Sys. and Comp., vol. 30, no. 10, Art. no. 2150178, 2021, doi: https://doi.org/10.1142/S0218126621501784.
  • [18] C. Kingsford and S. L. Salzberg, “What are decision trees?,” Nat. Biotech., vol. 26, no. 9, pp. 1011-1013, Sep. 2008, doi: https://doi.org/10.1038/nbt0908-1011.
  • [19] S. Uddin, I. Haque, H. Lu, M. A. Moni, and E. Gide, “Comparative performance analysis of K-nearest neighbor (KNN) algorithm and its different variants for disease prediction,” Sci. Rep., vol. 12, Art. no. 6256, Apr. 2022, doi: https://doi.org/10.1038/s41598-022-10358-x.
  • [20] S. Priyadarshinee and M. Panda, “Cardiac disease prediction using SMOTE and machine learning classifiers,” Jour. of Pharma. Neg. Res., vol. 13, no. 8, pp. 856-862, Nov. 2022, doi: https://doi.org/10.47750/pnr.2022.13.S08.108.
  • [21] L. Breiman, “Random forests,” Mach. Learn., vol. 45, pp. 5-32, Oct. 2001, doi: https://doi.org/10.1023/A:1010933404324.
  • [22] A. Natekin and A. Knoll, “Gradient boosting machines, a tutorial,” Front. in Neurorob., vol. 7, Art. no. 21, Dec. 2013, doi: https://doi.org/10.3389/fnbot.2013.00021.
  • [23] T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, H. Cho, and T. Zhou, “Xgboost: extreme gradient boosting,” R package version 0.4-2, vol. 1, no. 4, pp. 1-4, 2015.
  • [24] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, and T. Y. Liu, “LightGBM: A highly efficient gradient boosting decision tree,” in Adv. in Neur. Info. Process. Sys., vol. 30, 2017.
  • [25] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin, “CatBoost: unbiased boosting with categorical features,” in Adv. in Neur. Info. Process. Sys., vol. 31, 2018.

Details

Primary Language English
Subjects Computer Software, Software Engineering (Other)
Section Research Articles
Authors

Naciye Nur Arslan 0000-0002-3208-7986

Durmuş Özdemir 0000-0002-9543-4076

Publication Date April 30, 2024
Submission Date September 25, 2023
Published in Issue Year 2024 Issue: 006

Cite

APA Arslan, N. N., & Özdemir, D. (2024). A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. Journal of Scientific Reports-C(006), 1-11.
AMA Arslan NN, Özdemir D. A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. JSR-C. April 2024;(006):1-11.
Chicago Arslan, Naciye Nur, and Durmuş Özdemir. “A Comparison of Traditional and State-of-the-Art Machine Learning Algorithms for Type 2 Diabetes Prediction”. Journal of Scientific Reports-C, no. 006 (April 2024): 1-11.
EndNote Arslan NN, Özdemir D (April 1, 2024) A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. Journal of Scientific Reports-C 006 1–11.
IEEE N. N. Arslan and D. Özdemir, “A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction”, JSR-C, no. 006, pp. 1–11, April 2024.
ISNAD Arslan, Naciye Nur - Özdemir, Durmuş. “A Comparison of Traditional and State-of-the-Art Machine Learning Algorithms for Type 2 Diabetes Prediction”. Journal of Scientific Reports-C 006 (April 2024), 1-11.
JAMA Arslan NN, Özdemir D. A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. JSR-C. 2024;(006):1–11.
MLA Arslan, Naciye Nur and Durmuş Özdemir. “A Comparison of Traditional and State-of-the-Art Machine Learning Algorithms for Type 2 Diabetes Prediction”. Journal of Scientific Reports-C, no. 006, 2024, pp. 1-11.
Vancouver Arslan NN, Özdemir D. A comparison of traditional and state-of-the-art machine learning algorithms for type 2 diabetes prediction. JSR-C. 2024(006):1-11.