OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING

Ali Alper Demir; Ufuk Ozkaya

doi:10.21923/jesd.1383926

Abstract

This study develops a deep learning-based method for detecting and recognizing characters in printed Ottoman documents. Character detection and recognition are treated as a single object detection problem, and an Ottoman character recognition model is built on YOLO, one of the most successful object detection methods. A dataset of Ottoman document images was also created, in which every character in each image is annotated. Data augmentation techniques were applied to improve recognition accuracy and the robustness of the method. The Ottoman character recognition network was trained on this dataset and evaluated on its test images. Performance was measured with the average precision metric, which is widely used in the literature. Average precision was calculated for the 34 character classes in the dataset, and the results were interpreted in terms of the strengths and weaknesses of the method. The results show that the proposed method detects and recognizes characters in printed Ottoman documents with high accuracy, reaching a weighted average precision of 98.71%.
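The evaluation described in the abstract can be illustrated with a minimal sketch of per-class average precision and its weighted mean. The abstract does not state which interpolation scheme or which class weights were used, so the all-point interpolation below and the weighting by number of ground-truth instances per class are assumptions. The function names and the class labels "elif" and "be" are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_ground_truth: int) -> float:
    """Average precision for one class (all-point interpolation, an assumed variant).

    detections: one (confidence, is_true_positive) pair per predicted box.
    num_ground_truth: number of annotated instances of this class.
    """
    if num_ground_truth == 0 or not detections:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_hit) in enumerate(ranked, start=1):
        if is_hit:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Replace each precision by the maximum precision at any higher recall,
    # which makes the precision-recall curve non-increasing.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the area under the stepwise curve.
    area = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


def weighted_average_precision(ap_per_class: Dict[str, float],
                               support_per_class: Dict[str, int]) -> float:
    """Mean of per-class AP weighted by instance count (assumed weighting)."""
    total = sum(support_per_class[name] for name in ap_per_class)
    if total == 0:
        return 0.0
    return sum(ap_per_class[name] * support_per_class[name] for name in ap_per_class) / total


if __name__ == "__main__":
    # Toy data: three detections of one class, two annotated instances.
    toy_detections = [(0.9, True), (0.8, False), (0.7, True)]
    print(average_precision(toy_detections, num_ground_truth=2))

    # Toy per-class AP values and instance counts for two hypothetical classes.
    print(weighted_average_precision({"elif": 1.0, "be": 0.5}, {"elif": 3, "be": 1}))
```

Weighting by instance count means that frequent characters dominate the summary figure, so a high weighted value can coexist with weaker results on rare classes. That is one reason to inspect the per-class values alongside it.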

Details

Primary Language

English

Subjects

Computer Software, Software Engineering (Other)

Section

Research Article

Authors

Ali Alper Demir *
0000-0001-5250-0590
Türkiye

Ufuk Ozkaya
0000-0002-3520-1975
Türkiye

Publication Date

30 June 2024

Submission Date

31 October 2023

Acceptance Date

5 June 2024

Published in Issue

Year: 2024, Volume: 12, Issue: 2

DOI

https://doi.org/10.21923/jesd.1383926

IZ

https://izlik.org/JA92TM83EA

Cite

APA

Demir, A. A., & Ozkaya, U. (2024). OTTOMAN CHARACTER RECOGNITION ON PRINTED DOCUMENTS USING DEEP LEARNING. Mühendislik Bilimleri ve Tasarım Dergisi, 12(2), 392-402. https://doi.org/10.21923/jesd.1383926

Cited By

An Object Detection-Based Character Recognition Method for Ottoman Handwritten Documents

International Journal on Document Analysis and Recognition (IJDAR)

https://doi.org/10.1007/s10032-025-00529-7