EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH

Tuğba Palabaş

doi:10.18036/estubtdc.1320922

Research Article

TOPLULUK YAKLAŞIMINA DAYALI MAKİNE ÖĞRENME TEKNİKLERİ KULLANARAK ERKEN DÖNEM DİYABET RİSK TAHMİNİ

Year 2024, 74 - 85, 30.07.2024

Abstract

Diabetes mellitus, considered one of the deadliest diseases, is a common chronic condition. It also gives rise to many other disorders, especially neuropathy, nephropathy and retinopathy. Early diagnosis through accurate evaluation of symptoms, followed by prompt treatment, is therefore very important. The aim of this study is to present an effective model that determines diabetes risk at an early stage with the best possible accuracy. To this end, classification algorithms frequently used in diabetes risk estimation are supported with ensemble approaches. First, the performance of the Naive Bayes (NB), Trees-J48, k-Nearest Neighbor (kNN) and Sequential Minimal Optimization (SMO) classifiers is analyzed separately on a dataset of 520 samples collected through direct questionnaires from patients of Sylhet Diabetes Hospital in Sylhet, Bangladesh. Then, the effects of the AdaBoost, Bagging and Random Sub-Space (RSS) algorithms on classifier performance are investigated, and the J48 classifier combined with AdaBoost is shown to have the best accuracy on this dataset. Finally, the Wrapper Subset Eval (WSE) feature selection algorithm is applied to reduce the cost of diabetes prediction and to increase classification success. With the proposed classifier, the best accuracy of 99% is achieved on the reduced dataset.

Keywords

Diabetes, Classification, Ensemble Approach, Feature Extraction
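
The boosting idea named in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the J48 base learner is swapped for one-feature decision stumps, and the eight-row dataset (three yes/no symptom flags, labelled positive when at least two are present) is a hypothetical toy example rather than the Sylhet data. All function and variable names are assumptions made for illustration.

```python
import math

def stump_predict(x, j, polarity):
    """Predict +polarity when binary feature j is 1, otherwise -polarity."""
    return polarity if x[j] == 1 else -polarity

def train_stump(X, y, w):
    """Return (weighted_error, feature, polarity) of the best one-feature stump."""
    best = None
    for j in range(len(X[0])):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, j, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, j, polarity)
    return best

def adaboost(X, y, rounds):
    """Discrete AdaBoost for labels in {-1, +1}; returns [(alpha, feature, polarity)]."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    model = []
    for _ in range(rounds):
        err, j, pol = train_stump(X, y, w)
        if err >= 0.5:                     # no stump beats chance: stop early
            break
        err = max(err, 1e-10)              # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, j, pol))
        # Raise the weight of misclassified rows, lower the rest, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, j, pol) for alpha, j, pol in model)
    return 1 if score >= 0 else -1

def accuracy(model, X, y):
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(X)

# Hypothetical toy data: every combination of three binary symptom flags.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(row) >= 2 else -1 for row in X]

single = adaboost(X, y, rounds=1)          # one weak learner on its own
boosted = adaboost(X, y, rounds=3)         # three boosted weak learners

print("single stump accuracy:", accuracy(single, X, y))
print("boosted accuracy:", accuracy(boosted, X, y))
```

On this toy data a single stump classifies 6 of the 8 rows correctly, while the three-round ensemble classifies all 8, because each round reweights the rows the previous stump got wrong. The accuracy is measured on the training rows only, so it says nothing about the cross-validated figures reported in the study.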

Details

Primary Language	English
Subjects	Bioelectronics
Section	Articles
Authors	Tuğba Palabaş 0000-0002-6985-6494
Publication Date	July 30, 2024
Published in Issue	Year 2024

Cite

APA	Palabaş, T. (2024). EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH. Eskişehir Teknik Üniversitesi Bilim Ve Teknoloji Dergisi - C Yaşam Bilimleri Ve Biyoteknoloji, 13(2), 74-85. https://doi.org/10.18036/estubtdc.1320922
AMA	Palabaş T. EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH. Estuscience - Life. July 2024;13(2):74-85. doi:10.18036/estubtdc.1320922
Chicago	Palabaş, Tuğba. “EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH”. Eskişehir Teknik Üniversitesi Bilim Ve Teknoloji Dergisi - C Yaşam Bilimleri Ve Biyoteknoloji 13, no. 2 (July 2024): 74-85. https://doi.org/10.18036/estubtdc.1320922.
EndNote	Palabaş T (01 July 2024) EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH. Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi - C Yaşam Bilimleri Ve Biyoteknoloji 13 2 74–85.
IEEE	T. Palabaş, “EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH”, Estuscience - Life, vol. 13, no. 2, pp. 74–85, 2024, doi: 10.18036/estubtdc.1320922.
ISNAD	Palabaş, Tuğba. “EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH”. Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi - C Yaşam Bilimleri Ve Biyoteknoloji 13/2 (July 2024), 74-85. https://doi.org/10.18036/estubtdc.1320922.
JAMA	Palabaş T. EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH. Estuscience - Life. 2024;13:74–85.
MLA	Palabaş, Tuğba. “EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH”. Eskişehir Teknik Üniversitesi Bilim Ve Teknoloji Dergisi - C Yaşam Bilimleri Ve Biyoteknoloji, vol. 13, no. 2, 2024, pp. 74-85, doi:10.18036/estubtdc.1320922.
Vancouver	Palabaş T. EARLY-STAGE DIABETES RISK PREDICTION USING MACHINE LEARNING TECHNIQUES BASED ON ENSEMBLE APPROACH. Estuscience - Life. 2024;13(2):74-85.
