Research Article

Evaluating The Effects of Hyperparameter Tuning and Data Balancing on Machine Learning Algorithms Used for Heart Disease Prediction

Year 2024, Volume 17, Issue 1, 45 - 58, 31.01.2024
https://doi.org/10.17671/gazibtd.1399813

Abstract

Neglecting the symptoms of heart disease can lead to serious conditions and even death. Machine learning techniques can use these symptoms to predict whether a person has heart disease. In this study, heart disease prediction was performed with eight machine learning algorithms: Logistic Regression, Decision Trees, Random Forest, K-Nearest Neighbors, Naive Bayes, Gradient Boosting, XGBoost, and Bagging. Four separate datasets were created with four data balancing methods: SMOTE, SMOTETomek, Oversample Minority Class, and Undersample Majority Class. Hyperparameter optimization was carried out for all selected algorithms using the Random Search and Bayesian Optimization techniques, and the results were compared. By comparing the impact of data balancing and hyperparameter optimization on the performance of machine learning techniques used to predict heart disease, this study contributes an original approach to the literature. The dataset comes from a survey of 319,795 individuals in the United States and contains 20 features. With SMOTETomek data balancing and Bayesian hyperparameter optimization, the Random Forest algorithm achieved a prediction accuracy of 94%. With Oversample Minority Class data balancing and Bayesian hyperparameter optimization, the Random Forest algorithm achieved a classification accuracy of 97%.
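The two simplest balancing methods named in the abstract, Oversample Minority Class and Undersample Majority Class, can be illustrated with a minimal sketch. It uses only the Python standard library on a toy dataset; the function names, the toy data, and the choice to balance every class to the same size are assumptions made for illustration and are not taken from the study's code. SMOTE and SMOTETomek generate synthetic samples and are not reproduced here.

```python
import random
from collections import Counter


def _indices_by_class(labels):
    """Group row indices by their class label."""
    groups = {}
    for i, y in enumerate(labels):
        groups.setdefault(y, []).append(i)
    return groups


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    are as large as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    target = max(len(idx) for idx in groups.values())
    chosen = []
    for idx in groups.values():
        extra = [rng.choice(idx) for _ in range(target - len(idx))]
        chosen.extend(idx + extra)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def undersample_majority(rows, labels, seed=0):
    """Randomly drop rows of larger classes until all classes are as small
    as the smallest one (hypothetical helper)."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    target = min(len(idx) for idx in groups.values())
    chosen = []
    for idx in groups.values():
        chosen.extend(rng.sample(idx, target))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# Toy imbalanced data: 18 rows of class 0 and 2 rows of class 1.
rows = [[float(i)] for i in range(20)]
labels = [1 if i >= 18 else 0 for i in range(20)]

X_over, y_over = oversample_minority(rows, labels)
X_under, y_under = undersample_majority(rows, labels)
over_counts = Counter(y_over)
under_counts = Counter(y_under)
print("oversampled:", dict(over_counts))
print("undersampled:", dict(under_counts))
```

Resampling of this kind is usually applied to the training split only, so that duplicated rows do not also appear in the data used for evaluation.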

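Investigating the effects of hyperparameter tuning and data balancing on machine learning algorithms used for heart disease prediction (original Turkish title: Hiperparametre ayarlama ve veri dengelemenin kalp hastalığı tahmini için kullanılan makine öğrenimi algoritmaları üzerindeki etkilerinin incelenmesi)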

Abstract

Neglecting the symptoms of heart disease can lead to serious illness and even death. With machine learning techniques, these symptoms can be used for preliminary diagnosis to predict whether a person has heart disease. In this study, heart disease was predicted with the Logistic Regression, Decision Trees, Random Forest, K Nearest Neighbors, Naive Bayes, Gradient Boosting, XGBoost, and Bagging algorithms. Four separate datasets were created with the SMOTE, SMOTETomek, Oversample Minority Class, and Undersample Majority Class data balancing methods. Hyperparameter optimization was applied to all selected machine learning algorithms using the Random Search and Bayesian Optimization techniques, and the results were compared. By comparing the effect of data balancing and hyperparameter optimization on the performance of the machine learning techniques used to predict heart disease, the study contributes an original work to the literature. The dataset is a survey with 20 attributes conducted with 319,795 people in the United States. The model built with the Random Forest algorithm, SMOTETomek data balancing, and Bayesian hyperparameter optimization achieved a prediction success of 94%. In addition, the Random Forest algorithm with Oversample Minority Class data balancing and Bayesian hyperparameter optimization achieved a classification accuracy of 97%.
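The Random Search technique mentioned in the abstract can be sketched as follows: hyperparameter settings are drawn at random from a search space, each is scored on held-out data, and the best one is kept. This is a minimal standard-library illustration on synthetic data, not the study's setup; the small K-Nearest Neighbors classifier, the search space for `k`, the synthetic two-cluster data, and all function names are assumptions chosen to keep the example self-contained. Bayesian Optimization, which proposes new settings from a model of earlier results instead of sampling uniformly, is not shown.

```python
import random


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)


def accuracy(train_X, train_y, val_X, val_y, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def random_search(train, val, space, n_iter=10, seed=0):
    """Sample n_iter settings uniformly from `space` and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, -1.0
    history = []
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = accuracy(*train, *val, **params)
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history


def make_blobs(n_per_class, rng):
    """Synthetic 2-D data: one Gaussian cluster per class (illustrative only)."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


data_rng = random.Random(42)
train = make_blobs(40, data_rng)
val = make_blobs(20, data_rng)
space = {"k": [1, 3, 5, 7, 9, 11]}  # odd values avoid tied votes

best_params, best_score, history = random_search(train, val, space, n_iter=8, seed=0)
print("best setting:", best_params, "validation accuracy:", best_score)
```

The same loop structure applies to any of the listed algorithms: only the search space and the scoring function change.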

Ethical Statement

No ethics committee approval was required for this article. The authors have no conflict of interest with any person or institution.

References

There are 35 citations in total.

Details

Primary Language: Turkish
Subjects: Machine Learning (Other)
Journal Section: Articles
Authors:

Fuat Sungur (ORCID: 0000-0001-8589-4207)

Halit Bakır (ORCID: 0000-0003-3327-2822)

Publication Date: January 31, 2024
Submission Date: December 4, 2023
Acceptance Date: January 26, 2024
Published in Issue: Year 2024

Cite

APA Sungur, F., & Bakır, H. (2024). Hiperparametre ayarlama ve veri dengelemenin kalp hastalığı tahmini için kullanılan makine öğrenimi algoritmaları üzerindeki etkilerinin incelenmesi. Bilişim Teknolojileri Dergisi, 17(1), 45-58. https://doi.org/10.17671/gazibtd.1399813