Research Article

Evaluating Scene Text Recognition Models Through Multidimensional Error Analysis: Insights from Turkish Scene Text Data

Year 2025, Volume 11, Issue 3, pp. 495–521, 31.12.2025

Abstract

Scene Text Recognition (STR) has emerged as a critical research area in computer vision, enabling machines to interpret textual information embedded in natural scenes. Despite significant progress driven by deep learning, current Optical Character Recognition (OCR) systems still struggle to generalize across languages, fonts, distortions, and environmental conditions. This study presents a comprehensive evaluation of six state-of-the-art OCR models on eight English benchmark datasets and on a newly introduced Turkish scene text dataset (TS-TR). Beyond conventional metrics such as accuracy and F1-score, the analysis examines character-level error types and substitution patterns to reveal systematic weaknesses in model behavior. The comparative results show that Transformer-based architectures, particularly MGP-STR, perform best across diverse scene conditions, and they also reveal performance degradation on morphologically rich and non-English languages. These findings underline the need for multilingual adaptation, linguistically informed modeling, and hybrid vision-language approaches to achieve robust, language-aware OCR systems for real-world applications.
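To make the idea of going beyond accuracy and F1-score concrete, the following is a minimal sketch of character-level error analysis. It assumes the three standard edit operations (substitution, insertion, deletion) recovered from a Levenshtein alignment, plus a character-level F1 in which precision and recall are matched characters divided by prediction and ground-truth lengths. The abstract does not state the paper's exact definitions, so these formulas, the function names `align` and `evaluate`, and the sample words are hypothetical illustrations, not the authors' implementation.

```python
import unicodedata
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ErrorStats:
    """Accumulated character-level edit operations over (ground truth, prediction) pairs."""
    matches: int = 0
    substitutions: int = 0
    insertions: int = 0  # characters present in the prediction but not the ground truth
    deletions: int = 0   # ground-truth characters the model failed to output
    sub_pairs: Counter = field(default_factory=Counter)  # (gt_char, pred_char) -> count


def align(gt: str, pred: str, stats: ErrorStats) -> None:
    """Levenshtein alignment with a backtrace; adds one optimal edit script to `stats`."""
    n, m = len(gt), len(pred)
    # dp[i][j] = edit distance between gt[:i] and pred[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if gt[i - 1] == pred[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Walk back from the bottom-right cell, preferring the diagonal on ties.
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (gt[i - 1] != pred[j - 1]):
            if gt[i - 1] == pred[j - 1]:
                stats.matches += 1
            else:
                stats.substitutions += 1
                stats.sub_pairs[(gt[i - 1], pred[j - 1])] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            stats.deletions += 1
            i -= 1
        else:
            stats.insertions += 1
            j -= 1


def evaluate(pairs: list[tuple[str, str]]) -> dict:
    """Word accuracy, character error rate (CER), character-level F1, and edit statistics."""
    if not pairs:
        raise ValueError("need at least one (ground truth, prediction) pair")
    stats = ErrorStats()
    exact = gt_chars = pred_chars = 0
    for gt, pred in pairs:
        # NFC so that e.g. 'Ş' is one code point even if a label file stored it decomposed.
        gt, pred = unicodedata.normalize("NFC", gt), unicodedata.normalize("NFC", pred)
        exact += gt == pred
        gt_chars += len(gt)
        pred_chars += len(pred)
        align(gt, pred, stats)
    edits = stats.substitutions + stats.insertions + stats.deletions
    precision = stats.matches / pred_chars if pred_chars else 0.0
    recall = stats.matches / gt_chars if gt_chars else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "word_accuracy": exact / len(pairs),
        "cer": edits / gt_chars if gt_chars else 0.0,
        "char_f1": f1,
        "stats": stats,
    }


if __name__ == "__main__":
    # Hypothetical Turkish sign words where a model drops the diacritics.
    result = evaluate([("ŞİŞLİ", "SISLI"), ("ÇARŞI", "ÇARŞI"), ("GÜNEŞ", "GUNES")])
    print(round(result["word_accuracy"], 3), round(result["cer"], 3), round(result["char_f1"], 3))
    # → 0.333 0.4 0.6
    print(result["stats"].sub_pairs.most_common(3))
    # → [(('Ş', 'S'), 3), (('İ', 'I'), 2), (('Ü', 'U'), 1)]
```

The substitution counter is what turns aggregate scores into a pattern analysis: in this toy example, two of three words are wrong only because Turkish-specific characters collapse to their ASCII look-alikes, which word accuracy alone would not reveal. Because the backtrace breaks ties toward the diagonal, an equally short alignment could attribute some errors to insertions or deletions instead; any real analysis should fix one tie-breaking rule and apply it consistently across models.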

Details

Primary Language English
Subjects Computer Software
Journal Section Research Article
Authors

Safa Dörterler 0000-0001-8778-081X

Emrullah Şahin 0000-0002-3390-6285

Durmuş Özdemir 0000-0002-9543-4076

Submission Date October 30, 2025
Acceptance Date December 5, 2025
Publication Date December 31, 2025
Published in Issue Year 2025 Volume: 11 Issue: 3

Cite

IEEE: S. Dörterler, E. Şahin, and D. Özdemir, “Evaluating Scene Text Recognition Models Through Multidimensional Error Analysis: Insights from Turkish Scene Text Data,” GJES, vol. 11, no. 3, pp. 495–521, 2025.

Gazi Journal of Engineering Sciences (GJES) publishes open access articles under a Creative Commons Attribution 4.0 International License (CC BY).