Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı ve Uygulaması

Şerife Çelikbaş; Zeynep Orman; Türker Aksoy; Derya Yılmaz Baysoy

doi:10.29130/dubited.1268348

Research Article

Artificial Immune System with Special Selection for Stroke Prediction in Imbalanced Data

Year 2024, Volume: 12, Issue: 3, pp. 1723-1738, July 31, 2024

Abstract

Stroke is a neurological disease caused by bleeding or blockage in the brain, and it is becoming increasingly common worldwide. It can cause death directly and can also lead to disability. Because there is no generally accepted, predictive diagnostic method, early diagnosis is difficult. Detecting strokes that may recur is also vitally important. Early stroke prediction with artificial intelligence techniques has been studied many times in the literature, yet it remains an area open to improvement. In this study, a model is proposed to address class imbalance in a stroke dataset in which patient records are the minority. An artificial immune system algorithm whose parameters are updated by the firefly algorithm is used for data balancing. The algorithm's outputs are then adjusted according to the One-Sided Selection model to improve performance on the minority class. The model's efficiency is evaluated with six classification algorithms, namely Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), Gradient Boosting (GB), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Logistic Regression (LR), and is reported through performance metrics. The proposed approach produced effective results compared with previous studies, with accuracy, specificity, and sensitivity of 86%, 38%, and 87%, respectively.
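The three metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function names, the toy labels, and the choice of 1 (stroke) as the positive class are assumptions made for illustration, since the abstract does not state which class is treated as positive.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # positive case that the model missed
        elif pred == positive:
            fp += 1  # negative case flagged as positive
        else:
            tn += 1
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall) and specificity as fractions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels invented for illustration only (1 = stroke, 0 = no stroke).
# They are NOT taken from the dataset used in the article.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data, accuracy is dominated by the majority class, which is why the abstract reports specificity and sensitivity alongside it; in this toy case the model misses half of the positive cases while still scoring 0.8 accuracy.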

Keywords

Stroke Disease, Imbalanced Data Set, Artificial Immune System, Firefly Algorithm

Details

Primary Language	Turkish
Subjects	Engineering
Section	Articles
Authors	Şerife Çelikbaş 0000-0001-6118-9335 Zeynep Orman 0000-0002-0205-4198 Türker Aksoy 0000-0001-5258-9038 Derya Yılmaz Baysoy 0000-0002-8101-9779
Publication Date	July 31, 2024
Published in Issue	Year 2024 Volume: 12 Issue: 3

Cite

APA	Çelikbaş, Ş., Orman, Z., Aksoy, T., Yılmaz Baysoy, D. (2024). Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı ve Uygulaması. Duzce University Journal of Science and Technology, 12(3), 1723-1738. https://doi.org/10.29130/dubited.1268348
AMA	Çelikbaş Ş, Orman Z, Aksoy T, Yılmaz Baysoy D. Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı ve Uygulaması. DÜBİTED. July 2024;12(3):1723-1738. doi:10.29130/dubited.1268348
Chicago	Çelikbaş, Şerife, Zeynep Orman, Türker Aksoy, and Derya Yılmaz Baysoy. “Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı Ve Uygulaması”. Duzce University Journal of Science and Technology 12, no. 3 (July 2024): 1723-38. https://doi.org/10.29130/dubited.1268348.
EndNote	Çelikbaş Ş, Orman Z, Aksoy T, Yılmaz Baysoy D (01 July 2024) Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı ve Uygulaması. Duzce University Journal of Science and Technology 12 3 1723–1738.
IEEE	Ş. Çelikbaş, Z. Orman, T. Aksoy, and D. Yılmaz Baysoy, “Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı ve Uygulaması”, DÜBİTED, vol. 12, no. 3, pp. 1723–1738, 2024, doi: 10.29130/dubited.1268348.
ISNAD	Çelikbaş, Şerife et al. “Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı Ve Uygulaması”. Duzce University Journal of Science and Technology 12/3 (July 2024), 1723-1738. https://doi.org/10.29130/dubited.1268348.
JAMA	Çelikbaş Ş, Orman Z, Aksoy T, Yılmaz Baysoy D. Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı ve Uygulaması. DÜBİTED. 2024;12:1723–1738.
MLA	Çelikbaş, Şerife et al. “Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı Ve Uygulaması”. Duzce University Journal of Science and Technology, vol. 12, no. 3, 2024, pp. 1723-38, doi:10.29130/dubited.1268348.
Vancouver	Çelikbaş Ş, Orman Z, Aksoy T, Yılmaz Baysoy D. Dengesiz Veri Kümelerinde İnme Tahmini İçin Özel Seçilimli Hibrit Dengeleme Yöntemi Tasarımı ve Uygulaması. DÜBİTED. 2024;12(3):1723-38.
