A Multi-Task Transformer Ensemble for Explainable English Language Learning: Unifying Grammatical Correction, Tense Prediction, and CEFR Proficiency Grading
Abstract
This paper presents a multi-task transformer ensemble that unifies Grammatical Error Correction (GEC), Tense Prediction, and CEFR-level Proficiency Grading within a single explainable framework for English language learning. The system integrates four fine-tuned transformer models (Flan-T5, BERT, RoBERTa, and DistilBERT) through a confidence-weighted ensemble to generate grammatical corrections, verify tense consistency, and predict learner proficiency levels. The framework was trained and validated on large-scale linguistic datasets, including BEA-2019, C4, PTB, BNC, EFCAMDAT, and CLC, to improve robustness across writing domains. Experimental results show that the Flan-T5 GEC module achieved BLEU = 32.4 with a 36.8% reduction in training time, while the RoBERTa tense classifier reached F1 = 0.99. For CEFR grading, the model achieved κ = 0.90, indicating strong aggregate agreement with human ratings, although performance at the most advanced proficiency level remained more challenging. Compared with non-integrated module outputs, the ensemble further improved overall linguistic accuracy and interpretability, with statistically significant gains in key aggregate metrics (p < 0.05). These findings suggest a hierarchical dependency among grammar, tense, and proficiency, while providing an interpretable AI mechanism for delivering personalized and pedagogically meaningful feedback.
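The abstract does not specify how the confidence-weighted ensemble combines model outputs. As a minimal sketch only, assuming each model emits a single label together with a confidence in [0, 1], one common scheme sums confidences per label and picks the label with the highest total. The function name `confidence_weighted_vote` and the example values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def confidence_weighted_vote(predictions):
    """Combine (label, confidence) pairs from several models.

    Assumption (not from the paper): each model's confidence is added to the
    score of the label it predicts, and the label with the highest total wins.
    Returns the winning label and the per-label scores normalized to sum to 1.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    scores = defaultdict(float)
    for label, confidence in predictions:
        scores[label] += confidence

    total = sum(scores.values())
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")

    normalized = {label: score / total for label, score in scores.items()}
    best = max(normalized, key=normalized.get)
    return best, normalized


# Hypothetical CEFR predictions from three classifiers (illustrative values).
cefr_votes = [("B1", 0.62), ("B2", 0.55), ("B1", 0.48)]
label, scores = confidence_weighted_vote(cefr_votes)
print(label, scores)
```

The normalized scores double as a simple interpretability signal: a narrow margin between the top two labels indicates that the models disagree and the prediction may deserve review.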
Keywords
- Natural language processing
- Transformer models
- Grammatical error correction
- Tense prediction
- CEFR proficiency grading
- Educational AI
- Multi-task learning
Details
Primary Language
English
Subjects
Natural Language Processing, Artificial Intelligence (Other)
Journal Section
Research Article
Authors
Dung Ho
0000-0003-3731-3783
Vietnam
Le Hung
0009-0000-2209-1247
Vietnam
Vu Tran
0009-0005-9386-8603
Vietnam
Early Pub Date
June 19, 2026
Publication Date
June 30, 2026
Submission Date
December 1, 2025
Acceptance Date
March 30, 2026
Published in Issue
Year 2026 Volume: 9 Number: 3
