Research Article

Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis with Machine Learning Models and Performance Comparison

Volume: 15 Number: 3 September 15, 2025

Abstract

Hematologic cancers are often diagnosed only after symptoms become apparent, which makes the disease harder to control and limits effective treatment. Studying gene expression profiles is therefore important for early diagnosis and for developing treatment strategies for hematologic cancers such as T-cell leukemia. This study aims to reveal the molecular mechanisms underlying the pathogenesis of Adult T-cell Leukemia (ATL) by comparing the whole gene expression profiles of ATL cells with those of CD4+ T cells from healthy individuals. To this end, eight machine learning algorithms were used: Naive Bayes, K-Nearest Neighbor, Support Vector Machine (SVM), Random Forest (RF), C4.5, Logistic Regression, Linear Discriminant Analysis, and Artificial Neural Network (ANN). Their performance was compared on the GSE33615 dataset using stratified 5-fold cross-validation. Among these, ANN stood out with an AUC of 0.98 and an F1 score of 0.93, followed by SVM with an AUC of 0.97 and an F1 score of 0.957. In addition to the performance comparison, information gain ratio, Shapley values, and correlation values were calculated to identify genes linked to ATL. The three best-performing models (ANN, SVM, RF) were selected, and the ten most significant genes were identified for each. The intersection of these gene sets contained ZSCAN18, PLK3, and NELL2, indicating an association with ATL. These genes may contribute to ATL pathogenesis through their roles in cell cycle regulation, transcriptional control, and oncogenic signaling. Further investigation is needed to clarify their precise molecular mechanisms in this disease.
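Two steps summarized in the abstract, stratified 5-fold splitting and intersecting the top-ranked genes of several models, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stratified_folds` and `top_k_intersection`, the sample counts, the gene identifiers (`G1` to `G4`), and the importance scores are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def top_k_intersection(scores_by_model, k=10):
    """Return the genes that appear in the top-k ranking of every model."""
    top_sets = []
    for scores in scores_by_model.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        top_sets.append(set(ranked[:k]))
    return set.intersection(*top_sets)


# Hypothetical class counts, not the real composition of GSE33615.
labels = ["ATL"] * 50 + ["control"] * 20
folds = stratified_folds(labels, k=5)

# Hypothetical importance scores per model (toy values, toy gene names).
importance = {
    "ann": {"G1": 0.9, "G2": 0.8, "G3": 0.1, "G4": 0.7},
    "svm": {"G1": 0.8, "G2": 0.1, "G3": 0.9, "G4": 0.6},
    "rf": {"G1": 0.7, "G2": 0.2, "G3": 0.3, "G4": 0.9},
}
shared = top_k_intersection(importance, k=2)
print(shared)
```

In this toy setup each fold receives the same number of samples from each class, and only the gene ranked in the top two by all three models survives the intersection. In the study itself the ranking would come from the reported importance measures and `k` would be 10.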

Keywords

Adult T-cell Leukemia (ATL), Microarray study, Machine learning, Variable importance

APA
Kiliçarslan, S., & Yücebaş, S. C. (2025). Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis with Machine Learning Models and Performance Comparison. Karadeniz Fen Bilimleri Dergisi, 15(3), 1046-1069. https://doi.org/10.31466/kfbd.1597865
AMA
1. Kiliçarslan S, Yücebaş SC. Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis with Machine Learning Models and Performance Comparison. KFBD. 2025;15(3):1046-1069. doi:10.31466/kfbd.1597865
Chicago
Kiliçarslan, Sabire, and Sait Can Yücebaş. 2025. “Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis With Machine Learning Models and Performance Comparison”. Karadeniz Fen Bilimleri Dergisi 15 (3): 1046-69. https://doi.org/10.31466/kfbd.1597865.
EndNote
Kiliçarslan S, Yücebaş SC (September 1, 2025) Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis with Machine Learning Models and Performance Comparison. Karadeniz Fen Bilimleri Dergisi 15 3 1046–1069.
IEEE
[1] S. Kiliçarslan and S. C. Yücebaş, “Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis with Machine Learning Models and Performance Comparison”, KFBD, vol. 15, no. 3, pp. 1046–1069, Sept. 2025, doi: 10.31466/kfbd.1597865.
ISNAD
Kiliçarslan, Sabire - Yücebaş, Sait Can. “Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis With Machine Learning Models and Performance Comparison”. Karadeniz Fen Bilimleri Dergisi 15/3 (September 1, 2025): 1046-1069. https://doi.org/10.31466/kfbd.1597865.
JAMA
1. Kiliçarslan S, Yücebaş SC. Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis with Machine Learning Models and Performance Comparison. KFBD. 2025;15:1046–1069.
MLA
Kiliçarslan, Sabire, and Sait Can Yücebaş. “Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis With Machine Learning Models and Performance Comparison”. Karadeniz Fen Bilimleri Dergisi, vol. 15, no. 3, Sept. 2025, pp. 1046-69, doi:10.31466/kfbd.1597865.
Vancouver
1. Sabire Kiliçarslan, Sait Can Yücebaş. Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis with Machine Learning Models and Performance Comparison. KFBD. 2025 Sep. 1;15(3):1046-69. doi:10.31466/kfbd.1597865