Using the Fleming-Harrington Estimator Method to Process Censored Data in Machine Learning: A Methodological Study

Pelin Akın; Yüksel Terzi

Abstract

Cox regression is the method generally used to model censored data. As datasets have grown, alternative methods have been sought. This study reclassifies censored observations using the Fleming-Harrington method so that machine learning classification techniques can be applied to survival analysis. The application used censored data from acute leukemia patients, together with data of four distinct sample sizes simulated from the correlation matrix of this acute leukemia dataset. The data were adapted for the machine learning algorithms using the Fleming-Harrington method. Four classification algorithms were applied to the datasets: Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machines. The algorithms were compared using accuracy, the area under the ROC curve (AUC), and the F score. The Random Forest algorithm performed best on the actual dataset, while the Naïve Bayes algorithm produced the best outcomes on the simulated data. The machine learning algorithms yielded close values, with Naïve Bayes outperforming the other algorithms in all situations. Comparing the AUC values of Cox regression and the Naïve Bayes algorithm on these datasets revealed similar outcomes. However, as the sample size increased, the performance of Cox regression decreased, while the performance of the machine learning algorithms increased. Therefore, machine learning algorithms can provide valuable insights into cancer patients' mortality status or the likelihood of disease recurrence in studies incorporating survival analyses, especially when the sample size is large.
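As a rough illustration of the preprocessing idea, the sketch below computes the Fleming-Harrington survival estimate, S(t) = exp(-H(t)), where H(t) is the Nelson-Aalen cumulative hazard, and then turns it into class labels. The abstract does not state the exact reclassification rule, so the `relabel_censored` function, its `threshold` parameter, and the toy data are assumptions made for illustration only, not the authors' procedure.

```python
import math
from collections import Counter


def fleming_harrington(times, events):
    """Return [(t, S(t))] at each distinct event time.

    S(t) = exp(-H(t)), with H(t) the Nelson-Aalen cumulative hazard:
    the sum over event times t_i <= t of d_i / n_i
    (d_i = events at t_i, n_i = subjects still at risk at t_i).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    curve = []
    cum_hazard = 0.0
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)
        cum_hazard += deaths[t] / at_risk
        curve.append((t, math.exp(-cum_hazard)))
    return curve


def survival_at(curve, t):
    """Evaluate the step function S(t); S = 1 before the first event."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


def relabel_censored(times, events, threshold=0.5):
    """HYPOTHETICAL rule (an assumption, not taken from the paper):
    observed events keep label 1; a censored subject gets label 1 when the
    estimated survival at its censoring time is below `threshold`, else 0.
    """
    curve = fleming_harrington(times, events)
    labels = []
    for t, e in zip(times, events):
        if e == 1:
            labels.append(1)
        else:
            labels.append(1 if survival_at(curve, t) < threshold else 0)
    return labels


# Toy data (made up): follow-up times and event indicators (1 = event, 0 = censored).
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = fleming_harrington(times, events)
labels = relabel_censored(times, events)
```

The resulting `labels` could then serve as the target for any standard classifier, such as Naïve Bayes or Random Forest, alongside the patients' covariates.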

Keywords

Survival analysis, Naïve Bayes, Censored data, Fleming-Harrington Estimator

Details

Primary Language

English

Subjects

Machine Learning Algorithms, Data Analysis

Journal Section

Research Article

Authors

Pelin Akın ^*
0000-0003-3798-4827
Türkiye

Yüksel Terzi
0000-0003-4966-8450
Türkiye

Early Pub Date

May 20, 2025

Publication Date

June 22, 2025

Submission Date

September 20, 2024

Acceptance Date

January 13, 2025

Published in Issue

Year 2025 Volume: 8 Number: 1

IZ

https://izlik.org/JA65NT37GJ

APA

Akın, P., & Terzi, Y. (2025). Using the Fleming-Harrington Estimator Method to Process Censored Data in Machine Learning: A Methodological Study. International Journal of Data Science and Applications, 8(1), 11-27. https://izlik.org/JA65NT37GJ

AI Research and Application Center, Sakarya University of Applied Sciences, Sakarya, Türkiye.
