Early Stage Diabetes Prediction Using Machine Learning Methods

Özge Nur Ergün; Hamza O.ilhan

doi:10.31590/ejosat.1015816

Research Article

Makine Öğrenimi Yöntemleriyle Erken Evre Diyabet Tahmini

Year 2021, Issue: 29, 52 - 57, 01.12.2021

Cited By: 3

https://izlik.org/JA68EP83WZ

Abstract (translated from Turkish)

Diabetes is a common, incurable and deadly disease. Millions of people have diabetes, and the disease directly affects their lives. Although early treatment can reduce the effects of diabetes and raise patients' standard of living, reaching a diagnosis is often a process that can take years. Machine learning can be applied to data from existing patients to diagnose diabetes early. In this way, diabetes can be diagnosed, and people at risk of developing it identified, without a blood test, glucose measurement or any similar medical procedure. Developing a machine learning model that can be used in diabetes diagnosis with this approach is the subject of this study. In the presented study, eight machine learning approaches were applied to a diabetes dataset built from the data of 520 patients in 16 different categories; performance was compared under 10-fold cross-validation using the accuracy, precision, recall and F-score metrics. In addition, the relative importance of the dataset's features for diabetes diagnosis was investigated. All of the developed models achieved a certain level of success. The lowest accuracy, 88.82%, was obtained with Naive Bayes, a simple machine learning technique. The best result was obtained with a one-dimensional convolutional neural network, whose accuracy was measured as 99.04%, precision as 100%, recall as 98.63% and F-score as 99.31%. The results show that the developed classifier can be used as a question set in diabetes diagnosis.
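The 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain shuffled split with a fixed seed; the abstract does not say whether the folds were shuffled or stratified, and the function name `k_fold_indices` is hypothetical. Only the sample count (520) and the number of folds (10) come from the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Assumption: samples are shuffled once with a fixed seed before
    splitting; the source does not describe its splitting procedure.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 520 samples and 10 folds, as stated in the abstract.
folds = list(k_fold_indices(520, k=10))
```

With 520 samples and 10 folds, each model would be trained on 468 samples and evaluated on the remaining 52 in every round, so each sample is used for testing exactly once.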

Keywords

Diabetes, Machine Learning, K-Nearest Neighbor, Support Vector Machine, Naive Bayes, Decision Tree, Random Forest, XGBoost, Artificial Neural Networks, Convolutional Neural Networks

References

Ampadu, H. (2021, May 01). Random Forests Understanding. AI Pool. https://ai-pool.com/a/s/random-forests-understanding
Berkley, C. (2021, May 18). How Is Rapid Weight Loss Related to Diabetes. Verywell Health. https://www.verywellhealth.com/rapid-weight-loss-5101064
Bilgin, G. (2021). Makine Öğrenmesi Algoritmaları Kullanarak Erken Dönemde Diyabet Hastalığı Riskinin Araştırılması. Zeki Sistemler Teori ve Uygulamaları Dergisi, 4(1), 55-64. https://doi.org/10.46387/bjesr.790225
Cirino, E. (2019, July 6). What Causes Muscle Rigidity. Healthline. https://www.healthline.com/health/muscle-rigidity
Coelho, S. (2021, April 28). What Is Blurred Vision. Verywell Health. https://www.verywellhealth.com/blurred-vision-5114184
Draelos, R. (2019). Measuring Performance: The Confusion Matrix. Glass Box Medicine. https://glassboxmedicine.com/2019/02/17/measuring-performance-the-confusion-matrix/
Harris, M. I., Klein, R., Welborn, T. A. & Knuiman, M. W. (1992). Onset of NIDDM occurs at least 4–7 yr before clinical diagnosis. Diabetes Care, 15(7), 815-819. DOI: 10.2337/diacare.15.7.815
Hawkins, D. M., Subhash, C. B. & Mills, D. (2003). Assessing Model Fit by Cross-Validation. Journal of Chemical Information and Computer Sciences, 43(2), 579–586. https://doi.org/10.1021/ci025626i
Hickman, R. J. (2020, July 28). What Is Polydipsia. Verywell Health. https://www.verywellhealth.com/polydipsia-4783881
IBM Cloud Education. (2020, July 15). What is machine learning. IBM. https://www.ibm.com/cloud/learn/machine-learning
Islam, M. M., Ferdousi, R., Rahman, S. & Bushra, H. Y. (2020). Likelihood Prediction of Diabetes at Early Stage Using Data Mining Techniques. Computer Vision and Machine Intelligence in Medical Image Analysis, 113-125. DOI:10.1007/978-981-13-8798-2_12
Jones, H. (2021, April 5). Causes of Polyphagia. Verywell Health. verywellhealth.com/polyphagia-5114624
Le, T. M., Vo, T. M., Pham, T. N. & Dao, S. V. T. (2020). A Novel Wrapper–Based Feature Selection for Early Diabetes Prediction Enhanced With a Metaheuristic. IEEE Access, 9, 7869-7884. DOI:10.1109/ACCESS.2020.3047942
Nahzat, S. & Yağanoğlu, M. (2021). Diabetes Prediction Using Machine Learning Classification Algorithms. Avrupa Bilim ve Teknoloji Dergisi, Ejosat Özel Sayı 2021 (ARACONF), 53-59. DOI: 10.31590/ejosat.899716
Oladimeji, O. O., Oladimeji, A. & Oladimeji, O. (2021). Classification Models for Likelihood Prediction of Diabetes at Early Stage Using Feature Selection. Applied Computing and Informatics. https://doi.org/10.1108/ACI-01-2021-0022
Oleiwi, A. K., Shi, L., Tao, Y. & Wei, L. (2020). A Comparative Analysis and Risk Prediction of Diabetes at Early Stage using Machine Learning Approach. International Journal of Future Generation Communication and Networking, 13(3), 4151-4163.
Özer, İ. (2020). Uzun Kısa Dönem Bellek Ağlarını Kullanarak Erken Aşama Diyabet Tahmini. Mühendislik Bilimleri ve Araştırmaları Dergisi, 2(2), 50-57. https://doi.org/10.38016/jista.877292
Petrie, T. (2021, June 07). What Is Paresis. Verywell Health. https://www.verywellhealth.com/paresis-5184820
Ramachandran, A. & Chamukuttan, S. (2008). Early Diagnosis and Prevention of Diabetes in Developing Countries. Reviews in Endocrine and Metabolic Disorders, 9(3), 193-201. DOI: 10.1007/s11154-008-9079-z
Rish, I. (2001). An Empirical Study of the Naïve Bayes Classifier. IJCAI Workshop on Empirical Methods in AI, 3(22). 41-46.
Sadhu, A. & Jadli, A. (2021). Early-Stage Diabetes Risk Prediction: A Comparative Analysis of Classification Algorithms. International Advanced Research Journal in Science, Engineering and Technology (IARJSET), 8(2), 193-201. DOI: 10.17148/IARJSET.2021.8228
Thrush. (2019, January 15). Diabetes UK. https://www.diabetes.co.uk/diabetes-complications/diabetes-and-yeast-infections.html
UCI Machine Learning Repository. (2020, July 12). Early stage diabetes risk prediction dataset. https://archive.ics.uci.edu/ml/datasets/Early+stage+diabetes+risk+prediction+dataset
U.S. Department of Health & Human Services. (2004, January 12). Diabetes: A National Plan For Action. The Importance Of Early Diabetes Detection. https://aspe.hhs.gov/report/diabetes-national-plan-action/importance-early-diabetes-detection
Watson, S. (2018, September 29). Does Diabetes Cause Hair Loss. Healthline. https://www.healthline.com/health/does-diabetes-cause-hair-loss
WHO. (n.d.). Diabetes. Retrieved July 15, 2021, from https://www.who.int/health-topics/diabetes
Wood, T. (n.d.). What is a Random Forest. DeepAI. Retrieved August 01, 2021, from https://deepai.org/machine-learning-glossary-and-terms/random-forest

Abstract

Diabetes is a common, incurable and fatal disease. Millions of people worldwide have diabetes, and it directly affects their lives. Early diagnosis helps reduce the effects of diabetes and improve patients' quality of life, but people commonly live with diabetes for years before being diagnosed. Early diagnosis can be achieved by applying machine learning methods to existing patient data. In this way, people can be diagnosed quickly without taking a glucose screening test or any blood test: answering a simple question set would be enough to determine whether a person is diabetic or at risk of becoming diabetic. In the proposed study, diabetes is detected using machine learning techniques. A publicly available diabetes dataset, comprising 16 features collected from 520 people, was used to create predictive models. Eight machine learning methods were applied individually to the dataset, and the results of each model were validated with a 10-fold cross-validation scheme. In addition to accuracy, other confusion-matrix-based performance metrics (precision, recall and F1 score) were reported. All of the models achieved high accuracy. The lowest accuracy, 88.85%, was obtained with Naive Bayes, one of the basic machine learning techniques. The highest accuracy, 99.04%, was obtained with a one-dimensional convolutional neural network (1D CNN) model. This model also achieved the highest scores on the other metrics: 100.00%, 98.63% and 99.31% for precision, recall and F1 score, respectively. These findings indicate that the 1D CNN model can be used to identify diabetic patients by asking them only a few questions.
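The four reported metrics all derive from the counts in a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are hypothetical and are not taken from the paper. The last lines check that the reported F1 score is consistent with the reported precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the formulas.
example = classification_metrics(tp=8, fp=0, fn=2, tn=10)

# Consistency check on the figures reported in the abstract:
# precision 100.00% and recall 98.63% imply an F1 score of about 99.31%.
reported_f1 = f1_from(1.0000, 0.9863)
```

Because precision is 100%, the model in question produced no false positives in the reported evaluation; the gap between recall and 100% comes entirely from false negatives.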

Keywords

Diabetes, Machine Learning, K Nearest Neighbor, Support Vector Machine, Naïve Bayes, Decision Tree, Random Forest, XGBoost, Artificial Neural Network, Convolutional Neural Network

There are 27 references in total.

Details

Primary Language	English
Subjects	Engineering
Section	Research Article
Authors	Özge Nur Ergün 0000-0002-9997-0853 Hamza O.ilhan 0000-0002-1753-2703
Publication Date	December 1, 2021
DOI	https://doi.org/10.31590/ejosat.1015816
IZ	https://izlik.org/JA68EP83WZ
Published in Issue	Year 2021 Issue: 29

Cite

APA	Ergün, Ö. N., & O.ilhan, H. (2021). Early Stage Diabetes Prediction Using Machine Learning Methods. Avrupa Bilim ve Teknoloji Dergisi, 29, 52-57. https://doi.org/10.31590/ejosat.1015816

Cited By

Desenleştirilmiş Karma Verilerin Transfer Öğrenme Yöntemi Kullanılarak Evrişimli Sinir Ağlarıyla Sınıflandırılması

Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi

https://doi.org/10.19113/sdufenbed.1293579

Feature Selection in the Diabetes Dataset with the Marine Predator Algorithm and Classification using Machine Learning Methods

Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji

https://doi.org/10.29109/gujsc.1396051

An explainable analysis of diabetes mellitus using statistical and artificial intelligence techniques

BMC Medical Informatics and Decision Making

https://doi.org/10.1186/s12911-024-02810-x
