Research Article

Comparison of Different Machine Learning Models with Data Balancing for Prediction of Cardiovascular Disease Risks Based on Big Data

Volume: 16 Number: 2 June 1, 2026

Abstract

Cardiovascular diseases (CVD) are a leading cause of death and morbidity worldwide. This study evaluates data balancing techniques (SMOTE, ENN, SMOTE-ENN, and SMOTE-Tomek) in combination with machine learning (ML) algorithms for predicting CVD risk from big data. The 2021 CDC BRFSS dataset, comprising 308,854 records, was preprocessed by removing missing and irrelevant data and then split into training (80%) and testing (20%) subsets. ML models, including logistic regression, random forest, LightGBM, XGBoost, and CatBoost, were trained on the balanced data. Performance was evaluated with metrics such as accuracy, precision, recall, F1 score, the ROC curve, and AUC. SMOTE-ENN and SMOTE-Tomek improved model performance, with LightGBM and CatBoost achieving the highest AUC and F1 scores. The results indicate that data balancing, especially SMOTE-ENN, enhances model sensitivity and thereby aids the identification of CVD risk. These findings underscore the potential of ML in nursing for developing targeted interventions and improving outcomes.
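
The workflow summarized in the abstract (an 80/20 split, balancing of the training data, and evaluation with precision, recall, and F1) can be illustrated with a minimal, standard-library-only Python sketch. It is not the author's implementation: the function names `stratified_split`, `smote`, and `precision_recall_f1` are hypothetical, the toy points are randomly generated rather than taken from BRFSS, the stratified split is an assumption (the abstract does not say how the split was drawn), and only the SMOTE interpolation step is shown, so the ENN and Tomek-link cleaning steps are omitted.

```python
import random


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (features, label) rows per class so both subsets keep the class ratio.
    Stratification is an assumption of this sketch, not a detail from the abstract."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({y for _, y in rows}):
        group = [r for r in rows if r[1] == label]
        rng.shuffle(group)
        n_test = int(round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority sample
    and one of its k nearest minority neighbours (the core SMOTE idea).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy, randomly generated data (NOT BRFSS records): 90 negatives, 10 positives.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(90)]
data += [((rng.gauss(3, 1), rng.gauss(3, 1)), 1) for _ in range(10)]

train, test = stratified_split(data)  # 80 training rows, 20 test rows
minority = [x for x, y in train if y == 1]
majority = [x for x, y in train if y == 0]

# Oversample the training minority only; the test set stays untouched.
synthetic = smote(minority, n_new=len(majority) - len(minority))
balanced_train = train + [(x, 1) for x in synthetic]
```

Balancing is applied after the split and only to the training subset in this sketch, so that no synthetic points, which are derived from real samples, end up in the data used for evaluation. Recall on the positive class corresponds to the sensitivity that the abstract reports as improved by balancing.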

Details

Primary Language: English
Subjects: Software Engineering (Other)
Journal Section: Research Article
Publication Date: June 1, 2026
Submission Date: October 9, 2025
Acceptance Date: December 28, 2025
Published in Issue: Year 2026 Volume: 16 Number: 2

APA
Özsezer, G. (2026). Comparison of Different Machine Learning Models with Data Balancing for Prediction of Cardiovascular Disease Risks Based on Big Data. Journal of the Institute of Science and Technology, 16(2), 461-487. https://doi.org/10.21597/jist.1800624