Research Article

A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets

Volume: 22 Number: 2 June 30, 2026

Abstract

In healthcare datasets, imbalanced class distributions and missing data degrade the performance and stability of machine learning models, hindering accurate analysis and disease diagnosis. Addressing both problems is crucial for improving the precision and reliability of healthcare data analysis. This paper proposes a novel preprocessing framework designed specifically for healthcare datasets to mitigate incomplete data and class imbalance. The framework introduces a new imputation method, GA-MICE, which enhances the Multiple Imputation by Chained Equations (MICE) technique with a Genetic Algorithm (GA) to improve the accuracy of missing-value imputation. The framework also incorporates the GASMOTEPSO_ENN method, which combines the Synthetic Minority Oversampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms with GA and Particle Swarm Optimization (PSO) heuristics to address class imbalance. After preprocessing, six machine learning classifiers are used to categorize individuals as either patients or healthy subjects. Classifier performance is evaluated using accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and Area Under the Curve (AUC). Experimental results demonstrate the effectiveness of the proposed approach in handling missing data and class imbalance, with performance close to or exceeding that of existing methods reported in the literature.
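The threshold-based metrics named in the abstract can be illustrated with a minimal sketch that computes them from the four cells of a binary confusion matrix. This is not the authors' evaluation code: the function name `binary_metrics`, the convention of returning 0.0 when a denominator is zero, and the example counts are all assumptions made for illustration. AUC is omitted because it requires ranked prediction scores rather than confusion counts.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute threshold-based metrics from binary confusion-matrix counts.

    Hypothetical helper for illustration; a zero denominator yields 0.0
    (an assumed convention, not something stated in the article).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # MCC uses all four cells, so it stays informative under class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up counts, for illustration only: 40 true positives, 10 false
# positives, 5 false negatives, 45 true negatives.
example = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

Reporting MCC alongside accuracy is a common choice for imbalanced data, because accuracy alone can look high when a classifier mostly predicts the majority class.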

Keywords

Ethical Statement

There are no ethical issues regarding the publication of this manuscript.

Details

Primary Language: English
Subjects: Biomedical Diagnosis
Journal Section: Research Article
Publication Date: June 30, 2026
Submission Date: August 16, 2025
Acceptance Date: January 26, 2026
Published in Issue: Year 2026 Volume: 22 Number: 2

APA
Nizam Özoğur, H., & Orman, Z. (2026). A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets. Celal Bayar University Journal of Science, 22(2), 225-235. https://doi.org/10.18466/cbayarfbe.1766229
AMA
1. Nizam Özoğur H, Orman Z. A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets. CBUJOS. 2026;22(2):225-235. doi:10.18466/cbayarfbe.1766229
Chicago
Nizam Özoğur, Hatice, and Zeynep Orman. 2026. “A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets”. Celal Bayar University Journal of Science 22 (2): 225-35. https://doi.org/10.18466/cbayarfbe.1766229.
EndNote
Nizam Özoğur H, Orman Z (June 1, 2026) A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets. Celal Bayar University Journal of Science 22 2 225–235.
IEEE
[1] H. Nizam Özoğur and Z. Orman, “A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets”, CBUJOS, vol. 22, no. 2, pp. 225–235, June 2026, doi: 10.18466/cbayarfbe.1766229.
ISNAD
Nizam Özoğur, Hatice - Orman, Zeynep. “A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets”. Celal Bayar University Journal of Science 22/2 (June 1, 2026): 225-235. https://doi.org/10.18466/cbayarfbe.1766229.
JAMA
1. Nizam Özoğur H, Orman Z. A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets. CBUJOS. 2026;22:225–235.
MLA
Nizam Özoğur, Hatice, and Zeynep Orman. “A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets”. Celal Bayar University Journal of Science, vol. 22, no. 2, June 2026, pp. 225-35, doi:10.18466/cbayarfbe.1766229.
Vancouver
1. Hatice Nizam Özoğur, Zeynep Orman. A Genetic Algorithm-Enhanced Method for Missing Value Imputation in Healthcare Datasets. CBUJOS. 2026 Jun. 1;22(2):225-35. doi:10.18466/cbayarfbe.1766229