Predicting Stroke Risk with Machine Learning and Hyperparameter Optimization

Burak Özkanat; Evin Şahin Sadık

doi:10.31466/kfbd.1538305

Research Article

Year 2025, Volume: 15 Issue: 2, 633 - 647, 15.06.2025

https://izlik.org/JA96YY78CR

Abstract

Stroke is a serious medical condition in which brain cells die because blood flow is cut off by a blockage or rupture in the vessels supplying the brain. After heart attack and cancer, it is the most common cause of death and disability in adults, and many of those who survive live with permanent disabilities. In this study, 7 different machine learning methods were applied to 12 features of 5100 individuals from an open-source dataset to predict stroke risk. Hyperparameter optimization was used to improve the performance of the machine learning methods, and the best parameters were selected. The random forest algorithm detected stroke risk with an accuracy of 96.98%, which is higher than the accuracies reported in other studies in the literature. This study discusses the effective use of machine learning algorithms for predicting stroke risk and efforts to improve model performance. The results may help in determining stroke risk more accurately and in taking preventive measures.
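The workflow summarized in the abstract (tune a classifier's hyperparameters, then report accuracy) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: scikit-learn, the exhaustive grid search, the parameter grid, and the synthetic stand-in data are all assumptions, since the abstract names neither the search strategy nor the parameter ranges.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the stroke data (assumption: the real
# open-source dataset is not loaded here).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Hold out a stratified test split so the final score uses unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search grid; the paper's actual ranges are not in the abstract.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}

# Exhaustive grid search with 3-fold cross-validation on the training split.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid, cv=3, scoring="accuracy",
)
search.fit(X_train, y_train)

# After fitting, the search object predicts with the best model refit on
# the full training split.
best_params = search.best_params_
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(best_params, round(test_accuracy, 4))
```

Selecting parameters by cross-validation on the training split only, and scoring once on the held-out split, keeps the reported accuracy from being inflated by the search itself.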

Keywords

Classification, Hyperparameter optimization, Stroke, Machine learning

There are 31 references in total.

Details

Primary Language	English
Subjects	Biomedical Engineering (Other)
Section	Research Article
Authors	Burak Özkanat 0009-0007-7262-8891 Evin Şahin Sadık 0000-0002-2212-4210
Submission Date	August 25, 2024
Acceptance Date	April 25, 2025
Publication Date	June 15, 2025
DOI	https://doi.org/10.31466/kfbd.1538305
IZ	https://izlik.org/JA96YY78CR
Published in Issue	Year 2025 Volume: 15 Issue: 2

Cite

APA	Özkanat, B., & Şahin Sadık, E. (2025). Predicting Stroke Risk with Machine Learning and Hyperparameter Optimization. Karadeniz Fen Bilimleri Dergisi, 15(2), 633-647. https://doi.org/10.31466/kfbd.1538305

Articles published in Karadeniz Fen Bilimleri Dergisi are licensed under Creative Commons Attribution-NonCommercial 4.0 International.