OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING

Ali Alper Demir; Ufuk Ozkaya

doi:10.21923/jesd.1383926

Abstract

In this study, a deep learning-based method is developed for detecting and recognizing characters in printed Ottoman documents. Character detection and recognition are treated jointly as an object detection problem, and an Ottoman character recognition model is built on YOLO, one of the most successful object detection methods. A dataset of Ottoman document images, in which every character is annotated, is also created for this purpose. Data augmentation techniques are applied to improve recognition accuracy and the robustness of the method. The Ottoman character recognition network is then trained on this dataset and evaluated on its test images. Performance is measured with average precision (AP), a metric widely used in the literature. AP is computed for each of the 34 character classes in the dataset, and the results are discussed in terms of the strengths and weaknesses of the method. The results show that the proposed method detects and recognizes characters in printed Ottoman documents with high accuracy, reaching a weighted average precision of 98.71%.
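The abstract reports per-class average precision and a weighted average over the 34 classes, but it does not state the matching protocol or the weighting scheme. The sketch below shows one common way to compute these numbers. It assumes PASCAL VOC-style all-point interpolated AP at an IoU threshold of 0.5 and weights proportional to the number of ground-truth instances per class. These choices, the function names, and the class labels `elif` and `be` are all hypothetical and are not the authors' evaluation code.

```python
from typing import Dict, List, Tuple

# Hypothetical types: a box is (x1, y1, x2, y2) in pixel coordinates;
# a detection is (image_id, confidence, box).
Box = Tuple[float, float, float, float]
Detection = Tuple[str, float, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Detection],
                      ground_truth: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """All-point interpolated AP for one character class (VOC-style, assumed)."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    # Each ground-truth box may be matched by at most one detection.
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    recalls: List[float] = []
    precisions: List[float] = []
    tp = fp = 0
    # Process detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True
            tp += 1
        else:
            fp += 1  # low overlap, no ground truth, or duplicate detection
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Make precision monotonically non-increasing, then sum the area
    # under the resulting precision-recall step curve.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def weighted_average_precision(ap: Dict[str, float],
                               gt_counts: Dict[str, int]) -> float:
    """Mean of per-class AP weighted by ground-truth instance counts (assumed)."""
    total = sum(gt_counts[c] for c in ap)
    return sum(ap[c] * gt_counts[c] for c in ap) / total


if __name__ == "__main__":
    # Toy example with hypothetical classes; numbers are illustrative only.
    gt = {"page1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
    dets = [("page1", 0.9, (0, 0, 10, 10)),    # true positive
            ("page1", 0.8, (50, 50, 60, 60)),  # false positive
            ("page1", 0.7, (20, 20, 30, 30))]  # true positive
    ap_be = average_precision(dets, gt)
    print(round(ap_be, 4))  # 0.8333
    print(round(weighted_average_precision({"elif": 1.0, "be": ap_be},
                                           {"elif": 3, "be": 2}), 4))  # 0.9333
```

Weighting by instance count lets frequent characters dominate the summary figure. An unweighted mean (mAP) would instead treat rare and common letters equally, so the two numbers can differ noticeably on an imbalanced character dataset.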

Details

Primary Language

English

Subjects

Computer Software, Software Engineering (Other)

Journal Section

Research Article

Authors

Ali Alper Demir*
ORCID: 0000-0001-5250-0590
Türkiye

Ufuk Ozkaya
ORCID: 0000-0002-3520-1975
Türkiye

Publication Date

June 30, 2024

Submission Date

October 31, 2023

Acceptance Date

June 5, 2024

Published in Issue

Year 2024 Volume: 12 Number: 2

DOI

https://doi.org/10.21923/jesd.1383926

IZ

https://izlik.org/JA92TM83EA

Cite

APA

Demir, A. A., & Ozkaya, U. (2024). OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING. Mühendislik Bilimleri Ve Tasarım Dergisi, 12(2), 392-402. https://doi.org/10.21923/jesd.1383926

AMA

1. Demir AA, Ozkaya U. OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING. JESD. 2024;12(2):392-402. doi:10.21923/jesd.1383926

Chicago

Demir, Ali Alper, and Ufuk Ozkaya. 2024. “OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING.” Mühendislik Bilimleri Ve Tasarım Dergisi 12 (2): 392-402. https://doi.org/10.21923/jesd.1383926.

EndNote

Demir AA, Ozkaya U (June 1, 2024) OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING. Mühendislik Bilimleri ve Tasarım Dergisi 12 2 392–402.

IEEE

[1] A. A. Demir and U. Ozkaya, “OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING,” JESD, vol. 12, no. 2, pp. 392–402, Jun. 2024, doi: 10.21923/jesd.1383926.

ISNAD

Demir, Ali Alper - Ozkaya, Ufuk. “OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING”. Mühendislik Bilimleri ve Tasarım Dergisi 12/2 (June 1, 2024): 392-402. https://doi.org/10.21923/jesd.1383926.

JAMA

1. Demir AA, Ozkaya U. OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING. JESD. 2024;12(2):392-402. doi:10.21923/jesd.1383926

MLA

Demir, Ali Alper, and Ufuk Ozkaya. “OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING.” Mühendislik Bilimleri Ve Tasarım Dergisi, vol. 12, no. 2, June 2024, pp. 392-402, doi:10.21923/jesd.1383926.

Vancouver

1. Demir AA, Ozkaya U. OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING. JESD. 2024 Jun 1;12(2):392-402. doi:10.21923/jesd.1383926

Cited By

An Object Detection-Based Character Recognition Method for Ottoman Handwritten Documents

International Journal on Document Analysis and Recognition (IJDAR)

https://doi.org/10.1007/s10032-025-00529-7