Segmentation is the process of decomposing a document image into segments, and it is a fundamental step in building any handwriting recognition or optical character recognition system. However, many existing segmentation schemes struggle with particular script styles, such as the historical Arabic writing found in ancient manuscripts, which has distinctive characteristics: inclined text lines, overlapping letters, diacritic marks, decorative elements, variable letter forms, and ligatures (two or more letters merged into a single connected shape). In this paper, we therefore present a thorough survey of the field. The survey has two parts. The first gives a concise overview of historical Arabic documents. The second, which is the main part, focuses on segmentation, a crucial step in handwritten document recognition. It gives a detailed and systematic overview of segmentation approaches at different levels for extracting handwritten Arabic text lines, followed by a literature study analyzing the works proposed in this area.
Primary Language | English |
---|---|
Subjects | Artificial Intelligence (Other) |
Journal Section | Articles |
Early Pub Date | May 28, 2024 |
Publication Date | June 13, 2024 |
Submission Date | December 20, 2023 |
Acceptance Date | April 3, 2024 |
Published in Issue | Year 2024 |
International Journal of Informatics and Applied Mathematics