A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters

Gizem Zorlu Görgülügil; Volkan Karakuş; Ayşegül Kurtoğlu; Erdal Kurtoğlu

doi:10.46310/tjim.1852407

Abstract

Objective: Thalassemia is a hereditary hemoglobinopathy and remains a significant public health problem, particularly in Mediterranean regions. Although genetic testing represents the gold standard for subtype classification, access to such testing is limited in many clinical settings. This pilot study aimed to explore the feasibility of using machine learning models based on routinely available clinical and laboratory parameters to support the differentiation of thalassemia subtypes in the absence of genetic testing.

Methods: This retrospective cross-sectional study included 83 individuals (57 thalassemia major, 11 thalassemia intermedia, and 15 healthy controls). Demographic, clinical, and laboratory variables were analyzed using the R programming language. A supervised Random Forest algorithm was applied for multiclass classification. Model performance was assessed using accuracy, class-specific sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). To further evaluate the distinction between thalassemia major and intermedia, a simplified logistic regression model was constructed, and Firth logistic regression was applied to address the small sample size and class imbalance.
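As an illustration of the evaluation metrics named above, the following is a minimal sketch of how overall accuracy and class-specific (one-vs-rest) sensitivity and specificity can be computed for a three-class problem. It is written in Python for illustration only; the study itself used R, and the function names, class labels, and toy predictions here are assumptions rather than the authors' code or data.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of cases whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """One-vs-rest sensitivity and specificity for each class label."""
    results: Dict[str, Dict[str, float]] = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        tn = sum(1 for t, p in pairs if t != label and p != label)
        results[label] = {
            # Sensitivity: share of true members of the class that were found.
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            # Specificity: share of non-members correctly kept out of the class.
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        }
    return results


# Hypothetical toy labels, NOT the study data: one "major" case is
# misclassified as "intermedia"; every other case is predicted correctly.
LABELS = ("major", "intermedia", "control")
y_true = ["major"] * 5 + ["intermedia"] * 2 + ["control"] * 2
y_pred = ["major"] * 4 + ["intermedia"] * 3 + ["control"] * 2

metrics = per_class_metrics(y_true, y_pred, LABELS)
for label in LABELS:
    print(label, metrics[label])
print("accuracy", accuracy(y_true, y_pred))
```

In this toy example a single misclassified case lowers sensitivity for one class and specificity for another, which shows why class-specific metrics are reported alongside overall accuracy when classes are imbalanced.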

Results: The Random Forest model demonstrated an overall test-set accuracy of 85.7%. Sensitivity was 80% for thalassemia major and 100% for both thalassemia intermedia and healthy controls. Variable importance analysis identified red cell distribution width (RDW), hematocrit, ferritin, and hemoglobin as the most influential predictors. In the simplified logistic regression model distinguishing thalassemia major from intermedia, RDW was the only variable reaching statistical significance (p = 0.0476). Model performance metrics, including high AUC values, should be interpreted cautiously given the limited sample size.
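To make the caution about limited sample size concrete, the sketch below computes a Wilson score confidence interval for a proportion such as test-set accuracy. It is a Python illustration under stated assumptions: the abstract does not report the size of the test set, so the counts used (12 correct out of 14) are purely hypothetical and were chosen only because their ratio is approximately 85.7%. The function name `wilson_interval` is likewise an illustrative choice.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts (assumption): 12 correct predictions in a test set of 14.
low, high = wilson_interval(12, 14)
print(f"point estimate: {12 / 14:.3f}, approx. 95% interval: {low:.3f} to {high:.3f}")
```

With counts this small the interval spans several tens of percentage points, which is consistent with the statement that the reported metrics should be interpreted cautiously.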

Details

Primary Language

English

Subjects

Cardiovascular Medicine and Haematology (Other)

Journal Section

Research Article

Authors

Gizem Zorlu Görgülügil ^*
0000-0002-0773-7000
Türkiye

Volkan Karakuş
0000-0001-9178-2850
Türkiye

Ayşegül Kurtoğlu
0000-0002-6033-4139
Türkiye

Erdal Kurtoğlu
0000-0002-6867-6053
Türkiye

Publication Date

March 6, 2026

Submission Date

December 30, 2025

Acceptance Date

January 25, 2026

Published in Issue

Year: 2026, Volume: 8

DOI

https://doi.org/10.46310/tjim.1852407

IZ

https://izlik.org/JA55YA64UH

Cite

APA

Zorlu Görgülügil, G., Karakuş, V., Kurtoğlu, A., & Kurtoğlu, E. (2026). A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters. Turkish Journal of Internal Medicine, 8, 28-33. https://doi.org/10.46310/tjim.1852407

AMA

1. Zorlu Görgülügil G, Karakuş V, Kurtoğlu A, Kurtoğlu E. A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters. Turk J Int Med. 2026;8:28-33. doi:10.46310/tjim.1852407

Chicago

Zorlu Görgülügil, Gizem, Volkan Karakuş, Ayşegül Kurtoğlu, and Erdal Kurtoğlu. 2026. “A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters”. Turkish Journal of Internal Medicine 8 (March): 28-33. https://doi.org/10.46310/tjim.1852407.

EndNote

Zorlu Görgülügil G, Karakuş V, Kurtoğlu A, Kurtoğlu E (March 1, 2026) A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters. Turkish Journal of Internal Medicine 8 28–33.

IEEE

[1] G. Zorlu Görgülügil, V. Karakuş, A. Kurtoğlu, and E. Kurtoğlu, “A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters”, Turk J Int Med, vol. 8, pp. 28–33, Mar. 2026, doi: 10.46310/tjim.1852407.

ISNAD

Zorlu Görgülügil, Gizem - Karakuş, Volkan - Kurtoğlu, Ayşegül - Kurtoğlu, Erdal. “A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters”. Turkish Journal of Internal Medicine 8 (March 1, 2026): 28-33. https://doi.org/10.46310/tjim.1852407.

JAMA

1. Zorlu Görgülügil G, Karakuş V, Kurtoğlu A, Kurtoğlu E. A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters. Turk J Int Med. 2026;8:28–33.

MLA

Zorlu Görgülügil, Gizem, et al. “A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters”. Turkish Journal of Internal Medicine, vol. 8, Mar. 2026, pp. 28-33, doi:10.46310/tjim.1852407.

Vancouver

1. Gizem Zorlu Görgülügil, Volkan Karakuş, Ayşegül Kurtoğlu, Erdal Kurtoğlu. A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters. Turk J Int Med. 2026 Mar. 1;8:28-33. doi:10.46310/tjim.1852407