Augmenting parametric data synthesis with 3D simulation for OCR on Old Turkic runiform inscriptions: A case study of the Kül Tegin inscription

Mehmet Oğuz Derin; Erdem Uçar

doi:10.35236/jots.1501797

Research Article

Year 2024, Volume: 8, Issue: 2, 278-301, 21.07.2024

Cited By: 1

https://izlik.org/JA96YE94LG

Abstract

Optical character recognition (OCR) for historical scripts such as the Old Turkic runiform script is challenging because it requires abundant annotated data and must cope with varying writing styles, materials, and degradation. This paper proposes a novel data synthesis pipeline that augments parametric generation with 3D rendering to build realistic and diverse training data for Old Turkic runiform grapheme classification. The approach synthesizes distance field variations of graphemes, applies parametric randomization, and renders them in simulated 3D scenes with varying textures, lighting, and environments. We train a Vision Transformer model on the synthesized data and evaluate its performance on photographs of the Kül Tegin inscription. Experimental results demonstrate the effectiveness of the approach: the model achieves high accuracy without seeing any real-world data during training. Finally, we discuss avenues for future research. Our work offers a promising direction for overcoming data scarcity for the Old Turkic runiform script.
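To make the first two stages of the pipeline concrete, the following is a minimal sketch of distance-field glyph synthesis with parametric randomization. It is not the authors' implementation: the stroke list `EXAMPLE_STROKES`, the function names, and every parameter range are assumptions chosen for illustration, and the 3D rendering stage (textures, lighting, environments) is not covered here.

```python
import numpy as np

# Hypothetical glyph: an arrow-like shape built from three straight strokes,
# given as ((x0, y0), (x1, y1)) in a [-1, 1] x [-1, 1] square. Not a real grapheme outline.
EXAMPLE_STROKES = [
    ((0.0, -0.7), (0.0, 0.7)),
    ((0.0, -0.7), (-0.4, -0.2)),
    ((0.0, -0.7), (0.4, -0.2)),
]


def segment_distance(px, py, a, b):
    """Distance from every grid point (px, py) to the line segment a-b."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    # Projection parameter of each point onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = np.clip(t, 0.0, 1.0)
    return np.hypot(px - (ax + t * dx), py - (ay + t * dy))


def glyph_distance_field(strokes, size=64):
    """Distance to the nearest stroke centreline on a size x size grid."""
    coords = np.linspace(-1.0, 1.0, size)
    px, py = np.meshgrid(coords, coords)
    dist = np.full((size, size), np.inf)
    for a, b in strokes:
        dist = np.minimum(dist, segment_distance(px, py, a, b))
    return dist


def randomize_strokes(strokes, rng, max_angle=0.2, max_jitter=0.03):
    """Apply one random rotation (radians) plus per-endpoint jitter."""
    angle = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    out = []
    for a, b in strokes:
        pts = np.array([a, b]) @ rot.T
        pts = pts + rng.uniform(-max_jitter, max_jitter, size=(2, 2))
        out.append((tuple(pts[0]), tuple(pts[1])))
    return out


def synthesize(strokes, rng, size=64):
    """Return one randomized grayscale sample with values in [0, 1]."""
    dist = glyph_distance_field(randomize_strokes(strokes, rng), size)
    half_width = rng.uniform(0.04, 0.10)  # random stroke thickness
    softness = rng.uniform(0.01, 0.04)    # random edge blur
    signed = dist - half_width            # negative inside the stroke
    ink = np.clip(0.5 - signed / (2.0 * softness), 0.0, 1.0)
    noise = rng.normal(0.0, 0.05, size=ink.shape)
    return np.clip(ink + noise, 0.0, 1.0)


rng = np.random.default_rng(0)
batch = np.stack([synthesize(EXAMPLE_STROKES, rng) for _ in range(8)])
print(batch.shape)
```

Working from a distance field rather than a fixed bitmap means stroke thickness and edge softness become continuous parameters that can be sampled per image, which is one plausible reading of "distance field variations" in the abstract.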

Keywords

optical character recognition, Old Turkic runiform script, data synthesis

Title in Turkish: Eski Türk Runik Yazıtlarında OCR için Parametrik Veri Sentezini 3D Simülasyon ile Arttırma: Kül Tégin Yazıtı Üzerinde Bir Örnekleme

The article cites 43 references in total.

Details

Primary Language	Turkish
Subjects	Linguistics (Other)
Section	Research Article
Authors	Mehmet Oğuz Derin (ORCID 0000-0002-6264-3509); Erdem Uçar (ORCID 0000-0002-0039-9619)
Submission Date	15 June 2024
Acceptance Date	14 July 2024
Publication Date	21 July 2024
DOI	https://doi.org/10.35236/jots.1501797
IZ	https://izlik.org/JA96YE94LG
Published in Issue	Year 2024, Volume: 8, Issue: 2

Cite

APA	Derin, M. O., & Uçar, E. (2024). Augmenting parametric data synthesis with 3D simulation for OCR on Old Turkic runiform inscriptions: A case study of the Kül Tegin inscription. Journal of Old Turkic Studies, 8(2), 278-301. https://doi.org/10.35236/jots.1501797
