Research Article

Evaluating Scene Text Recognition Models Through Multidimensional Error Analysis: Insights from Turkish Scene Text Data

Volume: 11 Number: 3 December 31, 2025

Abstract

Scene Text Recognition (STR) has emerged as a critical research area in computer vision, enabling machines to interpret text embedded in natural scenes. Despite significant progress with deep learning, current Optical Character Recognition (OCR) systems still struggle to generalize across languages, fonts, distortions, and environmental conditions. This study evaluates six state-of-the-art OCR models on eight English benchmark datasets and a newly introduced Turkish scene text dataset (TS-TR). Beyond conventional metrics such as accuracy and F1-score, the analysis examines character-level error types and substitution patterns to reveal systematic weaknesses in model behavior. The comparative results show that Transformer-based architectures, particularly MGP-STR, perform best across diverse scene conditions, while performance degrades on morphologically rich and non-English languages. These findings underline the need for multilingual adaptation, linguistically informed modeling, and hybrid visual-language approaches to achieve robust, language-aware OCR systems for real-world scenarios.
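As an illustration of what a character-level error analysis can look like, the following minimal Python sketch aligns a ground-truth word with a model prediction using a standard Levenshtein alignment and tallies substitutions, deletions, and insertions. It is not the authors' implementation: the function name `align_errors` and the sample word pair are hypothetical and are not drawn from TS-TR.

```python
import unicodedata
from collections import Counter


def align_errors(reference: str, prediction: str):
    """Tally character errors from a Levenshtein alignment.

    Returns three Counters: substitutions keyed by (expected, predicted),
    deletions keyed by the missed character, and insertions keyed by the
    spurious character. Hypothetical helper for illustration only.
    """
    # Normalize so that characters such as 'ş' are single code points.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", prediction)
    n, m = len(ref), len(hyp)

    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
            )

    # Walk back through the table to recover one optimal alignment.
    subs, dels, ins = Counter(), Counter(), Counter()
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                if cost:
                    subs[(ref[i - 1], hyp[j - 1])] += 1
                i -= 1
                j -= 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels[ref[i - 1]] += 1
            i -= 1
        else:
            ins[hyp[j - 1]] += 1
            j -= 1
    return subs, dels, ins


# Illustrative pair: a model that drops Turkish diacritics.
subs, dels, ins = align_errors("ışık", "isik")
print(subs.most_common())
```

For this made-up pair, the alignment records two ı→i substitutions and one ş→s substitution, with no insertions or deletions. Summing such counters over a whole test set would give a substitution table of the kind the abstract refers to, although the study's exact procedure may differ.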

Details

Primary Language

English

Subjects

Computer Software

Journal Section

Research Article

Publication Date

December 31, 2025

Submission Date

October 30, 2025

Acceptance Date

December 5, 2025

Published in Issue

Year 2025 Volume: 11 Number: 3

APA
Dörterler, S., Şahin, E., & Özdemir, D. (2025). Evaluating Scene Text Recognition Models Through Multidimensional Error Analysis: Insights from Turkish Scene Text Data. Gazi Journal of Engineering Sciences, 11(3), 495-521. https://izlik.org/JA42PM93ZD

Gazi Journal of Engineering Sciences (GJES) publishes open access articles under a Creative Commons Attribution 4.0 International License (CC BY).