Scene Text Recognition (STR) has emerged as a critical research area in computer vision, enabling machines to interpret textual information embedded in natural scenes. Despite significant progress with deep learning, current Optical Character Recognition (OCR) systems still face challenges in generalizing across varying languages, fonts, distortions, and environmental conditions. This study provides a comprehensive evaluation of six state-of-the-art OCR models across eight English benchmark datasets and a newly introduced Turkish scene text dataset (TS-TR). Beyond conventional metrics such as accuracy and F1-score, the analysis incorporates character-level error types and substitution patterns to reveal systematic weaknesses in model behavior. The comparative results emphasize the superiority of Transformer-based architectures, particularly MGP-STR, in diverse scene conditions, while also highlighting performance degradation on morphologically rich and non-English languages. These findings underline the need for multilingual adaptation, linguistically informed modeling, and hybrid vision-language approaches to achieve robust and language-aware OCR systems applicable to real-world scenarios.
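As a rough illustration of the character-level error analysis mentioned in the abstract, the sketch below (an assumption for clarity, not the paper's actual pipeline) aligns a predicted string against its ground truth with Python's difflib and tallies insertions, deletions, and substitution pairs; the function name char_error_profile and the example strings are hypothetical.

```python
# Minimal sketch (not the paper's code): classify character-level OCR errors by
# aligning a prediction against its ground truth and counting edit operations,
# including which character pairs get confused (e.g. Turkish 'I' read as 'İ').
from collections import Counter
from difflib import SequenceMatcher

def char_error_profile(ground_truth: str, prediction: str):
    """Return per-type error counts and a Counter of (gt_char, pred_char) substitutions."""
    errors = {"insert": 0, "delete": 0, "substitute": 0}
    substitutions = Counter()
    matcher = SequenceMatcher(a=ground_truth, b=prediction, autojunk=False)
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op == "insert":        # characters present only in the prediction
            errors["insert"] += b1 - b0
        elif op == "delete":      # ground-truth characters the model dropped
            errors["delete"] += a1 - a0
        elif op == "replace":     # misread characters; record each confused pair
            errors["substitute"] += max(a1 - a0, b1 - b0)
            for g, p in zip(ground_truth[a0:a1], prediction[b0:b1]):
                substitutions[(g, p)] += 1
    return errors, substitutions

# Hypothetical example: a Turkish word where dotless 'I' is misread as dotted 'İ'.
errs, subs = char_error_profile("KAPALI", "KAPALİ")
print(errs, subs.most_common(3))
```

Aggregating such substitution counters over a test set would surface the systematic confusions (for instance between diacritic and non-diacritic letter forms) that simple accuracy or F1 scores hide.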
Keywords: Optical Character Recognition, scene text recognition, OCR model comparison, Turkish scene text, character substitution errors
| Field | Value |
|---|---|
| Primary Language | English |
| Subjects | Computer Software |
| Journal Section | Research Article |
| Authors | |
| Submission Date | October 30, 2025 |
| Acceptance Date | December 5, 2025 |
| Publication Date | December 31, 2025 |
| Published in Issue | Year 2025, Volume 11, Issue 3 |