A Multi-Task Transformer Ensemble for Explainable English Language Learning: Unifying Grammatical Correction, Tense Prediction, and CEFR Proficiency Grading

Dung Ho; Lien Le Thi Quynh; Le Hung; Vu Tran

doi:10.35377/saucis...1830952

Abstract

This paper presents a multi-task transformer ensemble that unifies Grammatical Error Correction (GEC), Tense Prediction, and CEFR-level Proficiency Grading within a single explainable framework for English language learning. The system combines four fine-tuned transformer models (Flan-T5, BERT, RoBERTa, and DistilBERT) through a confidence-weighted ensemble to generate grammatical corrections, verify tense consistency, and predict learner proficiency levels. The framework was trained and validated on large-scale linguistic datasets, including BEA-2019, C4, PTB, BNC, EFCAMDAT, and CLC, to improve robustness across writing domains. Experimental results show that the Flan-T5 GEC module achieved BLEU = 32.4 with a 36.8% reduction in training time, while the RoBERTa tense classifier reached F1 = 0.99. For CEFR grading, the model achieved κ = 0.90, indicating strong aggregate agreement with human ratings, although performance at the most advanced proficiency level remained more challenging. Compared with the non-integrated module outputs, the ensemble further improved overall linguistic accuracy and interpretability, with statistically significant gains in key aggregate metrics (p < 0.05). These findings suggest a hierarchical dependency among grammar, tense, and proficiency, and they provide an interpretable AI mechanism for delivering personalized and pedagogically meaningful feedback.
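The abstract does not spell out how the confidence-weighted ensemble combines its member models, so the following is only a minimal sketch of one common approach: each model contributes a (label, confidence) pair, confidences are summed per label, and the label with the largest total wins. The function name `confidence_weighted_vote`, the example labels, and the confidence values are hypothetical illustrations, not the authors' implementation or reported data.

```python
from collections import defaultdict


def confidence_weighted_vote(predictions):
    """Combine (label, confidence) pairs from several models.

    Assumption: each confidence lies in [0, 1] and acts as that model's
    vote weight. Returns the winning label and its share of total weight.
    """
    if not predictions:
        raise ValueError("need at least one prediction")

    scores = defaultdict(float)
    for label, confidence in predictions:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        scores[label] += confidence  # accumulate weight per label

    total = sum(scores.values())
    best = max(scores, key=scores.get)
    share = scores[best] / total if total > 0 else 0.0
    return best, share


# Hypothetical CEFR predictions from three classifiers (illustrative only).
votes = [("B2", 0.5), ("B1", 0.25), ("B1", 0.5)]
label, share = confidence_weighted_vote(votes)
print(label, round(share, 2))
```

In this toy input the single most confident model predicts B2, but the two B1 votes together carry more weight, which is the behavior that distinguishes weighted voting from simply trusting the top-scoring model.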

Keywords

Supporting Institution

None. This research did not receive support from any institution, funding agency, or organization.

Ethical Statement

This study did not involve human participants, personal data, or animal experiments; therefore, ethical committee approval was not required.

Thanks

We thank our colleagues for their valuable comments.

Details

Primary Language

English

Subjects

Natural Language Processing, Artificial Intelligence (Other)

Journal Section

Research Article

Authors

Dung Ho
0000-0003-3731-3783
Vietnam

Lien Le Thi Quynh*
0000-0003-4448-3115
Vietnam

Le Hung
0009-0000-2209-1247
Vietnam

Vu Tran
0009-0005-9386-8603
Vietnam

Early Pub Date

June 19, 2026

Publication Date

June 30, 2026

Submission Date

December 1, 2025

Acceptance Date

March 30, 2026

Published in Issue

Year 2026 Volume: 9 Number: 3

DOI

https://doi.org/10.35377/saucis...1830952

IZ

https://izlik.org/JA44CK99DS

Cite

APA

Ho, D., Le Thi Quynh, L., Hung, L., & Tran, V. (2026). A Multi-Task Transformer Ensemble for Explainable English Language Learning: Unifying Grammatical Correction, Tense Prediction, and CEFR Proficiency Grading. Sakarya University Journal of Computer and Information Sciences, 9(3), 700-711. https://doi.org/10.35377/saucis...1830952
