Research Article

Using the Fleming-Harrington Estimator Method to Process Censored Data in Machine Learning: A Methodological Study

Volume: 8 Number: 1 June 22, 2025

Abstract

The Cox regression method is generally used to model censored data. As the volume of available data has grown, alternative methods have been sought. This study reclassifies censored observations using the Fleming-Harrington method so that survival analysis can be conducted with machine learning classification methods. The application uses censored data from acute leukemia patients, together with datasets of four distinct sample sizes simulated from a correlation matrix obtained from this acute leukemia dataset. The data were adapted to the machine learning algorithms using the Fleming-Harrington method. Four classification algorithms were applied to the datasets: Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machines. The algorithms were compared using accuracy, the area under the ROC curve (AUC), and the F score. Random Forest performed best on the actual dataset, while Naïve Bayes produced the best outcomes on the simulated datasets. The machine learning algorithms yielded close values, with Naïve Bayes outperforming the other algorithms in all situations. Comparing the AUC values of the Cox regression method and the Naïve Bayes algorithm across these datasets revealed similar outcomes. However, as the sample size increased, the performance of the Cox regression method decreased, while the performance of the machine learning algorithms increased. Machine learning algorithms can therefore provide valuable insights into cancer patients' mortality status or the likelihood of disease recurrence in studies incorporating survival analyses, especially when the sample size is large.
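To make the reclassification idea concrete, the following minimal Python sketch computes the Fleming-Harrington survival estimate, S(t) = exp(-H(t)), where H(t) is the Nelson-Aalen cumulative hazard (the sum of d_i / n_i over event times up to t). The abstract does not state the exact rule used to relabel censored observations, so the `relabel` function and its `threshold` parameter are hypothetical and shown only for illustration; all function names are likewise assumptions.

```python
import math
from collections import Counter


def fleming_harrington(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the observation is censored
    S(t) = exp(-H(t)), with H(t) the Nelson-Aalen cumulative hazard.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events and censorings both leave the risk set
    at_risk = len(times)
    cum_hazard = 0.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            cum_hazard += d / at_risk
            curve.append((t, math.exp(-cum_hazard)))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Step-function lookup: the estimate at the last event time <= t."""
    s = 1.0
    for u, s_u in curve:
        if u > t:
            break
        s = s_u
    return s


def relabel(times, events, threshold=0.5):
    """HYPOTHETICAL rule, not taken from the study: a censored case is
    labelled as an event when its estimated survival falls below threshold."""
    curve = fleming_harrington(times, events)
    return [
        1 if e == 1 or survival_at(curve, t) < threshold else 0
        for t, e in zip(times, events)
    ]
```

With binary labels of this kind, the data could be passed to any standard classifier. The threshold rule is only one possible choice, and the study itself may assign classes differently.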

Keywords

Survival analysis, Naïve Bayes, Censored data, Fleming-Harrington Estimator

IEEE
[1] P. Akın and Y. Terzi, “Using the Fleming-Harrington Estimator Method to Process Censored Data in Machine Learning: A Methodological Study,” International Journal of Data Science and Applications, vol. 8, no. 1, pp. 11–27, June 2025. [Online]. Available: https://izlik.org/JA65NT37GJ