Optimized machine learning based predictive diagnosis approach for diabetes mellitus

Erkan Akkur; Fuat Türk

doi:10.47582/jompac.1307319

Research Article


Year 2023, Volume 4, Issue 4, pp. 270-276, 30.08.2023


Abstract

Aims: Diabetes mellitus is a metabolic disease characterized by elevated blood sugar. If it is not diagnosed in time, it can pose a risk to other organs and tissues. As with many other diseases, machine learning (ML) algorithms are increasingly used to detect diabetes. This study proposes a diabetes prediction approach built on hyperparameter-optimized ML algorithms.
Methods: The proposed framework begins with several data preprocessing steps. Random forest (RF), support vector machine (SVM), K-nearest neighbor (K-NN) and decision tree (DT) algorithms are used for classification, and grid search is used to optimize the hyperparameters of each algorithm. Several performance measures are used to identify the algorithm that best predicts diabetes. The experiments are run on the Pima Indians Diabetes dataset (PID). In addition, Shapley additive explanations (SHAP) analysis is used to examine how much each attribute in the dataset contributes to the result.
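The abstract does not list the hyperparameter search spaces, so the following is only a minimal sketch of the grid search idea, shown for K-NN with a hypothetical two-parameter grid (`k`, the number of neighbors, and `p`, the Minkowski distance exponent) on made-up toy data. It assumes stratified k-fold cross-validation, in which every fold keeps the class ratio of the full dataset, and scores each combination by mean accuracy. The function names are illustrative and not taken from the paper; in practice scikit-learn's `GridSearchCV` together with `StratifiedKFold` performs the same search.

```python
import random
from itertools import product


def minkowski(a, b, p):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1 / p)


def knn_predict(train_X, train_y, x, k, p):
    """Predict a 0/1 label by majority vote among the k nearest training rows."""
    neighbors = sorted(
        (minkowski(row, x, p), label) for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in neighbors[:k])
    return 1 if 2 * votes > k else 0


def stratified_folds(y, n_folds, seed=0):
    """Split row indices into folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def grid_search_knn(X, y, param_grid, n_folds=5):
    """Try every parameter combination; return (best_params, best_mean_accuracy)."""
    folds = stratified_folds(y, n_folds)
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in range(len(y)) if i not in held_out]
            train_X = [X[i] for i in train]
            train_y = [y[i] for i in train]
            correct = sum(
                knn_predict(train_X, train_y, X[i], params["k"], params["p"]) == y[i]
                for i in test_idx
            )
            fold_scores.append(correct / len(test_idx))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical toy data: two well-separated clusters, five rows per class.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3), (0.8, 0.9),
     (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3), (4.8, 4.9)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
best_params, best_score = grid_search_knn(X, y, {"k": [1, 3], "p": [1, 2]})
```

On this separable toy data every combination reaches a mean accuracy of 1.0, so the first one (`k=1`, `p=1`) is kept; on real data the scores typically differ and the grid choice matters. The same loop applies to RF, SVM or DT by swapping the model and its grid.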
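SHAP attributes a prediction to the input attributes using Shapley values from cooperative game theory. A minimal sketch of the underlying quantity follows, assuming a hypothetical three-feature scoring function and a single all-zero baseline instead of a background dataset: each feature's value is its average marginal contribution over all coalitions S of the other features, weighted by |S|!(n-|S|-1)!/n!. This brute-force version needs O(2^n) model evaluations per feature, which is why SHAP relies on approximations or model-specific algorithms (such as TreeSHAP for tree ensembles) in practice; it is an illustration, not the paper's code.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline values, so the
    value of coalition S is model(x on S, baseline elsewhere).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical scoring function: two additive terms plus an interaction
# between features 0 and 2.
def toy_model(z):
    return z[0] + 2 * z[1] + z[0] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
# The interaction term is split evenly between features 0 and 2,
# so phi is approximately [1.5, 2.0, 0.5].
```

The contributions sum to `toy_model(x) - toy_model(baseline)`, which is 4 here. This efficiency property is what allows a SHAP analysis to decompose each prediction into per-attribute parts and rank the attributes by their influence.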
Results: The RF algorithm achieved the best performance, with an accuracy of 89.06%, a precision, sensitivity and F1-score of 84.33% each, and an AUC of 0.88. The SHAP analysis showed that the “Insulin”, “Age” and “Glucose” attributes contributed most to the prediction model in identifying patients with diabetes.
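The five reported measures can be computed from a confusion matrix and the model's predicted probabilities. The sketch below does this in plain Python for the binary case, assuming class 1 (diabetic) is the positive class and a hypothetical 0.5 decision threshold; the labels and scores are made up. It also shows why AUC is a number between 0 and 1 rather than a percentage: it is computed here as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `roc_auc_score` compute the same quantities.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for a binary task with 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def roc_auc(y_true, y_score):
    """AUC = probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, sensitivity (recall), F1 and AUC from predicted probabilities."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
        "auc": roc_auc(y_true, y_score),  # a value in [0, 1], not a percentage
    }


# Hypothetical labels and predicted probabilities for ten patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7, 0.35, 0.05]
metrics = classification_metrics(y_true, y_score)
```

In this toy example there is one false positive and one false negative, so precision, sensitivity and F1 all equal 0.75 while accuracy is 0.8. For a binary positive class with TP > 0, these three measures coincide exactly when the false-positive and false-negative counts are equal.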
Conclusion: Compared with similar studies in the literature, the hyperparameter-optimized RF approach proposed in this study performed well in predicting and diagnosing diabetes mellitus. The proposed method can therefore serve as the basis for an expert system that detects diabetes early and in real time.

Keywords

Machine learning, diabetes mellitus, data preprocessing, grid search, random forest



Details

Primary Language	English
Subjects	Biomedical Diagnosis, Health Institutions Management
Section	Research Articles
Authors	Erkan Akkur (ORCID 0000-0001-5573-5096), Fuat Türk (ORCID 0000-0001-8159-360X)
Publication Date	30 August 2023
Issue	Year 2023, Volume 4, Issue 4

Cite as

AMA	Akkur E, Türk F. Optimized machine learning based predictive diagnosis approach for diabetes mellitus. J Med Palliat Care. August 2023;4(4):270-276. doi:10.47582/jompac.1307319

