Research Article

Year 2025, Volume 8, Issue 1, pp. 11-27, 22.06.2025

Using the Fleming-Harrington Estimator Method to Process Censored Data in Machine Learning: A Methodological Study

Abstract

Censored data are generally modeled with the Cox regression method. As the volume of available data has grown, alternative methods have been sought. This study reclassifies censored observations using the Fleming-Harrington method so that machine learning classification techniques can be applied, thereby conducting survival analysis through classification. The application used censored data from acute leukemia patients, together with simulated datasets at four distinct sample sizes generated from a correlation matrix obtained from the acute leukemia dataset. The data were adapted to the machine learning algorithms using the Fleming-Harrington method. Four classification algorithms were applied to the datasets: Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machines. The algorithms were compared using accuracy, the area under the ROC curve (AUC), and the F score. The Random Forest algorithm performed best on the actual dataset, while the Naïve Bayes algorithm produced the best outcomes on the simulated datasets. Across the machine learning algorithms, the results were close, with Naïve Bayes outperforming the other algorithms in all situations. A comparison of the AUC values of the Cox regression method and the Naïve Bayes algorithm on these datasets revealed similar outcomes. However, as the sample size increased, the performance of the Cox regression method decreased, while that of the machine learning algorithms increased. Machine learning algorithms can therefore provide valuable insights into cancer patients' mortality status or the likelihood of disease recurrence in studies incorporating survival analyses, especially when the sample size is large.
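
For reference, the Fleming-Harrington estimator of the survival function is commonly defined as S(t) = exp(-H(t)), where H(t) is the Nelson-Aalen cumulative hazard: the running sum of d_i / n_i (events divided by subjects at risk) over the event times up to t. A minimal Python sketch of that estimator follows. The function name `fleming_harrington` and the toy data are illustrative assumptions, not taken from the study, and the abstract does not state how the estimated survival values are converted into class labels, so that step is not shown.

```python
import math
from collections import Counter


def fleming_harrington(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed follow-up times
    events : 1 if the event occurred at that time, 0 if censored
    (Hypothetical helper for illustration only.)
    """
    # Number of events observed at each distinct event time.
    deaths = Counter(t for t, e in zip(times, events) if e == 1)

    cum_hazard = 0.0  # Nelson-Aalen cumulative hazard H(t)
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        cum_hazard += deaths[t] / at_risk
        # Fleming-Harrington survival estimate: exp(-H(t)).
        curve.append((t, math.exp(-cum_hazard)))
    return curve


# Toy data (made up): five subjects, two of them censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = fleming_harrington(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.4f}")
```

Censored subjects contribute to the at-risk counts until their censoring time but never add to the hazard, which is how the estimator uses incomplete observations without discarding them.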


Details

Primary Language: English
Subjects: Machine Learning Algorithms, Data Analysis
Journal Section: Research Article
Authors:

Pelin Akın (ORCID: 0000-0003-3798-4827)

Yüksel Terzi (ORCID: 0000-0003-4966-8450)

Early Pub Date: May 20, 2025
Publication Date: June 22, 2025
Submission Date: September 20, 2024
Acceptance Date: January 13, 2025
Published in Issue: Year 2025, Volume 8, Issue 1

Cite

IEEE: P. Akın and Y. Terzi, "Using the Fleming-Harrington Estimator Method to Process Censored Data in Machine Learning: A Methodological Study," International Journal of Data Science and Applications, vol. 8, no. 1, pp. 11-27, 2025.

AI Research and Application Center, Sakarya University of Applied Sciences, Sakarya, Türkiye.