Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts
Abstract
Digitizing Arabic manuscripts remains a critical challenge in the Digital Humanities because of the morphological complexity of Arabic scripts. This study evaluates the performance of large language models (LLMs), specifically Gemini, in digitizing Arabic manuscripts. Using manuscripts written in naskh, ruq‘ah, and ta‘līq scripts, the model's outputs were analyzed with Levenshtein distance, Word Error Rate (WER), and Character Error Rate (CER). Recognition accuracy varies by script: raw character accuracy was 97.91% for naskh, 94.32% for ruq‘ah, and 93.51% for ta‘līq. A substantial share of the errors in ruq‘ah and ta‘līq arose from orthographic variants of the letter yā and of the hamza; when these variants are excluded, accuracy for all three scripts falls consistently in the 97–98% range. The study concludes that AI models perform robustly in deciphering manuscript text and have strong potential to assist professional publication, but that integrating them into classical text edition (tahqīq) requires careful oversight of paleographic nuances. The experiment was conducted with the Gemini 3.1 Pro model under a fixed-prompt protocol. Because only the opening page of a single manuscript was used for each script type, the findings should be interpreted as preliminary indicators rather than definitive conclusions.
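The abstract scores model output with Levenshtein distance, CER, and WER, and reports accuracy both raw and with yā/hamza variants excluded. The sketch below shows one common way to compute these measures: CER is the character-level edit distance (substitutions + deletions + insertions) divided by the reference length, and WER is the same computation over whitespace-separated words. It assumes that "character accuracy" means 1 − CER and that variants are excluded by folding them with a simple character map before scoring. The names `levenshtein`, `cer`, `wer`, `char_accuracy`, `normalize`, and the `VARIANT_MAP` table are hypothetical illustrations, not the authors' documented procedure.

```python
"""Minimal sketch: Levenshtein-based CER/WER with an optional yā/hamza folding pass.

Assumptions (not taken from the study): accuracy = 1 - CER, and the
variant-folding table below stands in for however the authors excluded
yā/hamza variants.
"""
from typing import Sequence, TypeVar

T = TypeVar("T")


def levenshtein(ref: Sequence[T], hyp: Sequence[T]) -> int:
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 when equal)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: character edit distance / reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word edit distance / number of reference words."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / max(len(ref_words), 1)


def char_accuracy(ref: str, hyp: str) -> float:
    """Assumed definition: (1 - CER) as a percentage; can go negative if CER > 1."""
    return 100.0 * (1.0 - cer(ref, hyp))


# Hypothetical folding table for orthographic variants of yā and hamza.
VARIANT_MAP = str.maketrans({
    "\u0649": "\u064A",  # alif maqṣūra ى -> yā ي
    "\u06CC": "\u064A",  # Persian-style yā ی -> Arabic yā ي
    "\u0623": "\u0627",  # alif with hamza above أ -> bare alif ا
    "\u0625": "\u0627",  # alif with hamza below إ -> bare alif ا
    "\u0622": "\u0627",  # alif madda آ -> bare alif ا
    "\u0624": "\u0648",  # wāw with hamza ؤ -> wāw و
    "\u0626": "\u064A",  # yā with hamza ئ -> yā ي
})


def normalize(text: str) -> str:
    """Fold the variant characters above so they no longer count as errors."""
    return text.translate(VARIANT_MAP)


if __name__ == "__main__":
    reference = "قال أبو هريرة"  # illustrative ground-truth line
    model_out = "قال ابو هريرة"  # hypothetical output that drops the hamza
    print("raw CER:", cer(reference, model_out))
    print("raw WER:", wer(reference, model_out))
    print("folded CER:", cer(normalize(reference), normalize(model_out)))
```

Because the computation works on Unicode code points, results depend on preprocessing: vocalization marks (harakāt) are separate code points, and the same letter may be stored precomposed or decomposed, so reference and output should be normalized the same way (for example with `unicodedata.normalize("NFC", ...)`) before scoring.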
Details
Primary Language: English
Subjects: Hadith, Tafsir
Journal Section: Research Article
Authors: Abdulcabbar Adıgüzel, Muhammed Sadık Özbek
Publication Date: June 26, 2026
Submission Date: March 23, 2026
Acceptance Date: June 1, 2026
Published in Issue: Year 2026, Volume 26, Number 2
APA
Adıgüzel, A., & Özbek, M. S. (2026). Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts. Dinbilimleri Akademik Araştırma Dergisi, 26(2), 909-941. https://doi.org/10.33415/daad.1914124
AMA
1. Adıgüzel A, Özbek MS. Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts. Dinbilimleri Akademik Araştırma Dergisi. 2026;26(2):909-941. doi:10.33415/daad.1914124
Chicago
Adıgüzel, Abdulcabbar, and Muhammed Sadık Özbek. 2026. “Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts”. Dinbilimleri Akademik Araştırma Dergisi 26 (2): 909-41. https://doi.org/10.33415/daad.1914124.
EndNote
Adıgüzel A, Özbek MS (June 1, 2026) Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts. Dinbilimleri Akademik Araştırma Dergisi 26 2 909–941.
IEEE
[1] A. Adıgüzel and M. S. Özbek, “Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts”, Dinbilimleri Akademik Araştırma Dergisi, vol. 26, no. 2, pp. 909–941, June 2026, doi: 10.33415/daad.1914124.
ISNAD
Adıgüzel, Abdulcabbar - Özbek, Muhammed Sadık. “Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts”. Dinbilimleri Akademik Araştırma Dergisi 26/2 (June 1, 2026): 909-941. https://doi.org/10.33415/daad.1914124.
JAMA
1. Adıgüzel A, Özbek MS. Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts. Dinbilimleri Akademik Araştırma Dergisi. 2026;26(2):909–941. doi:10.33415/daad.1914124
MLA
Adıgüzel, Abdulcabbar, and Muhammed Sadık Özbek. “Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts”. Dinbilimleri Akademik Araştırma Dergisi, vol. 26, no. 2, June 2026, pp. 909-41, doi:10.33415/daad.1914124.
Vancouver
1. Abdulcabbar Adıgüzel, Muhammed Sadık Özbek. Digital Paleography in Arabic Manuscripts: The Performance Levels Of The Gemini Model in Recognizing Naskh, Ta‘liq, And Ruq‘ah Scripts. Dinbilimleri Akademik Araştırma Dergisi. 2026 Jun. 1;26(2):909-41. doi:10.33415/daad.1914124