A Survey on Text-Line Segmentation in Arab Historical Manuscripts

Soumia Djaghbellou; Abdelouahab Attia; Abderraouf Bouziane

doi:10.53508/ijiam.1407236


Abstract

Segmentation divides a document image into its constituent segments, and it is a fundamental step in building any handwriting or optical character recognition system. However, many existing segmentation schemes struggle with specific script styles, such as the historical Arabic writing found in ancient manuscripts, which has distinctive characteristics: inclined text lines, overlapping letters, diacritic marks, decorative elements, variable letter forms, and ligatures (two or more letters merged into a single connected shape). In this paper, we present a thorough survey of the field in two parts. The first part gives a concise overview of historical Arabic documents. The second and main part focuses on segmentation, a crucial step in handwritten document recognition. It gives a detailed and systematic overview of the segmentation approaches used at different levels to extract handwritten Arabic text lines, followed by a review of the works proposed in the literature in this area.

Details

Primary Language

English

Subjects

Artificial Intelligence (Other)

Journal Section

Review Article

Authors

Soumia Djaghbellou*
Algeria

Abdelouahab Attia
Algeria

Abderraouf Bouziane
Algeria

Early Pub Date

May 28, 2024

Publication Date

June 13, 2024

Submission Date

December 20, 2023

Acceptance Date

April 3, 2024

Published in Issue

Year 2024 Volume: 7 Number: 1

DOI

https://doi.org/10.53508/ijiam.1407236

IZ

https://izlik.org/JA37YG88CW

Cite

APA

Djaghbellou, S., Attia, A., & Bouziane, A. (2024). A Survey on Text-Line Segmentation in Arab Historical Manuscripts. International Journal of Informatics and Applied Mathematics, 7(1), 14-32. https://doi.org/10.53508/ijiam.1407236

Cited By

Recent advances in text line segmentation and baseline detection in historical document images: a systematic review

International Journal on Document Analysis and Recognition (IJDAR)

https://doi.org/10.1007/s10032-025-00526-w

A Systematic Literature Review of Deep Learning Methods for Handwritten Text Recognition in Historical Arabic Manuscripts

Engineering, Technology & Applied Science Research

https://doi.org/10.48084/etasr.12123