Research Article

Gene Discovery in MALT Lymphoma Using Explainable Machine Learning on scRNA-seq Data

Volume: 9 Number: 3 May 15, 2026

Abstract

Mucosa-associated lymphoid tissue (MALT) lymphoma is an indolent and clinically heterogeneous B-cell malignancy. The lack of robust molecular biomarkers and the limited availability of high-resolution single-cell studies hinder the development of precise diagnostic and therapeutic strategies. To address this gap, an interpretable machine learning pipeline was developed and applied to single-cell RNA sequencing (scRNA-seq) data derived from MALT lymphoma patients and healthy donors. After quality control, normalization, and dimensionality reduction via truncated singular value decomposition (SVD), multiple tree-based classifiers were trained using stratified cross-validation and evaluated on a held-out balanced test set. Feature importance scores and SHapley Additive exPlanations (SHAP) values were integrated with Wilcoxon rank-sum testing to identify statistically supported gene markers. Among the classifiers, CatBoost, LightGBM, and Random Forest achieved the highest performance. Genes such as RPS4Y1, RGS1, XIST, CREM, and HSPH1 were consistently prioritized by both SHAP and statistical testing, indicating their biological relevance and differential expression. Notably, a divergence was observed between impurity-based feature importance rankings and the SHAP/Wilcoxon consensus, reflecting the complementary nature of these analytical approaches. This integrative framework provides a transparent and reproducible approach for gene discovery in scRNA-seq data and contributes computationally prioritized candidate genes warranting experimental validation for understanding MALT lymphoma pathogenesis and improving future molecular diagnostics.
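The consensus step described above, in which genes ranked highly by a model attribution score are retained only when a Wilcoxon rank-sum test also supports differential expression, can be illustrated with a minimal sketch. This is not the author's implementation. The gene labels, attribution scores, expression values, `top_k`, and `alpha` below are hypothetical placeholders; in practice the attribution scores would be mean absolute SHAP values from a fitted classifier, and a library routine such as `scipy.stats.ranksums` would likely replace the hand-written test. The sketch uses the normal approximation with average ranks for ties and applies no tie or multiple-testing correction.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard deviation under H0
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def consensus_genes(attribution, expr_a, expr_b, top_k=3, alpha=0.05):
    """Keep the top_k genes by attribution that also pass the rank-sum test."""
    top = sorted(attribution, key=attribution.get, reverse=True)[:top_k]
    return [g for g in top if rank_sum_pvalue(expr_a[g], expr_b[g]) < alpha]


# Hypothetical toy data: placeholder gene labels, not results from the study.
attribution = {"gene_A": 0.9, "gene_B": 0.7, "gene_C": 0.5, "gene_D": 0.1}
patient_expr = {
    "gene_A": [5, 6, 7, 8, 9, 10],
    "gene_B": [1, 4, 2, 6, 3, 5],
    "gene_C": [0, 1, 2, 3, 4, 4.5],
    "gene_D": [5, 6, 7, 8, 9, 10],
}
healthy_expr = {
    "gene_A": [0, 1, 2, 3, 4, 4.5],
    "gene_B": [2.5, 0.5, 5.5, 3.5, 1.5, 4.5],
    "gene_C": [5, 6, 7, 8, 9, 10],
    "gene_D": [0, 1, 2, 3, 4, 4.5],
}

selected = consensus_genes(attribution, patient_expr, healthy_expr)
print(selected)
```

In this toy input, `gene_B` has a high attribution score but no rank-sum support, while `gene_D` is differentially expressed but falls outside the top attribution ranks, so neither is selected. This mirrors the kind of divergence between ranking methods that the abstract reports.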

Ethical Statement

This study did not involve human participants or animal experiments requiring ethical approval. All data were obtained from publicly available repositories.

Thanks

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. The single-cell RNA sequencing (scRNA-seq) datasets analyzed in this study are publicly available from the 10x Genomics repository. Processed data and analysis scripts are available from the corresponding author upon reasonable request.

Details

Primary Language: English
Subjects: Computational Statistics
Journal Section: Research Article
Publication Date: May 15, 2026
Submission Date: February 20, 2026
Acceptance Date: April 7, 2026
Published in Issue: Year 2026 Volume: 9 Number: 3

APA
Özlüer Başer, B. (2026). Gene Discovery in MALT Lymphoma Using Explainable Machine Learning on scRNA-seq Data. Black Sea Journal of Engineering and Science, 9(3), 1150-1162. https://doi.org/10.34248/bsengineering.1893922
