Balkan Journal of Electrical and Computer Engineering

2147-284X 2147-284X

MUSA YILMAZ

10.17694/bajece.1691905

Electrical Engineering (Other)


Heart Attack Classification with a Machine Learning Approach Based on the Random Forest Algorithm


https://orcid.org/0000-0002-4564-8076

Dal

Süleyman

BATMAN ÜNİVERSİTESİ

https://orcid.org/0000-0002-4893-6014

Sezgin

Necmettin

BATMAN ÜNİVERSİTESİ

06 30 2025

Vol. 13, No. 2, pp. 140-147; 05 05 2025; 05 15 2025

2013


Delays in diagnosing heart attacks are a critical health problem that increases the risk of mortality. Timely and accurate identification of cardiac events is therefore essential to improve patient outcomes and reduce preventable deaths. To support early diagnosis, this study develops a random forest-based classification model using the Heart Disease Classification dataset published on the Kaggle platform. The dataset consists of 1319 samples and 8 demographic, clinical, and biochemical features for the diagnosis of heart disease. To evaluate the model’s reliability and generalizability, 10-fold cross-validation was employed. With this method, each data instance is used in both the training and testing phases, which enables a more stable and robust performance assessment. This approach also reduced the risk of overfitting and yielded more representative evaluation metrics. Model performance was evaluated using the ROC curve, training-validation curves, and the confusion matrix. In Fold 6 in particular, the model achieved 100% accuracy, precision, recall, and F1 score, indicating that it performs very well on the classification task. In addition, feature importance analysis showed that the troponin, potassium (kcm), and age variables were the most influential in the decision process. This study aims to fill an important gap in the literature on machine learning models for heart attack diagnosis by combining strong classification performance with interpretability.

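The evaluation protocol summarized in the abstract (10-fold cross-validation scored with accuracy, precision, recall, and F1) can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the function names `k_fold_indices` and `binary_metrics`, the shuffling seed, and the use of plain (non-stratified) folds are assumptions made for illustration, and the random forest training step itself is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; each sample is tested exactly once.

    Hypothetical helper: the shuffle seed and non-stratified split are
    assumptions, not details taken from the study.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Split sizes for a dataset of 1319 samples, as reported in the abstract.
splits = k_fold_indices(1319, k=10)
fold_sizes = [len(test) for _, test in splits]
```

With 1319 samples, this split produces nine test folds of 132 samples and one of 131, so every instance is used for testing once and for training nine times. In a full pipeline, a classifier would be fitted on each `train` index set and `binary_metrics` would be applied to its predictions on the matching `test` set, giving one set of scores per fold.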

Heart Attack Classification; Machine Learning; Random Forest Algorithm; Clinical Decision Support Systems

No funding
