Physical structure extraction of Algerian baccalaureate transcripts
Abstract
In recent years, Algerian universities have recognized the value of electronic archiving and digitization for managing their documents, which has made systems that analyze and understand archival documents a necessity. This paper follows that trend: it proposes a system for analyzing the physical structure of Algerian baccalaureate transcripts stored in university archives. The system proceeds in two phases. 1) Preprocessing applies several operations to reduce the noise present in the input images. 2) Segmentation first removes the transcript border, then extracts text lines and blocks using the RLSA (run-length smoothing algorithm) and projection profile analysis, then classifies each block into one of three classes (textual block, table, or graphic), and finally recovers the textual content from the textual blocks and tables.
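To illustrate the two segmentation tools named in the abstract, here is a minimal sketch of horizontal run-length smoothing and of text-line detection from a horizontal projection profile. It is not the authors' implementation: the function names `rlsa_horizontal` and `text_line_bands`, the `threshold` parameter, and the convention that the binarized image uses 1 for ink and 0 for background are all assumptions made for this example.

```python
import numpy as np


def rlsa_horizontal(binary, threshold):
    """Horizontal run-length smoothing (illustrative sketch).

    Fills every background run of at most `threshold` pixels that lies
    between two ink pixels on the same row. Assumes 1 = ink, 0 = background.
    """
    out = binary.copy()
    for row in out:  # each row is a view, so writes land in `out`
        ink = np.flatnonzero(row)
        # A gap of k background pixels separates ink columns k + 1 apart.
        for left, right in zip(ink[:-1], ink[1:]):
            if 1 < right - left <= threshold + 1:
                row[left + 1:right] = 1
    return out


def text_line_bands(binary):
    """Return (start, end) row ranges, end exclusive, that contain ink.

    Uses the horizontal projection profile: the ink count of each row.
    """
    profile = binary.sum(axis=1)
    has_ink = profile > 0
    # Pad with False so bands touching the top or bottom edge are closed.
    padded = np.concatenate(([False], has_ink, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    # Changes alternate: band start, band end, band start, ...
    return list(zip(changes[0::2].tolist(), changes[1::2].tolist()))
```

In a pipeline of this kind, smoothing would typically be applied first so that the characters of a word or line merge into connected regions, and the projection profile would then separate those regions into lines or blocks. The threshold would need tuning to the scan resolution.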
Details
Primary Language
English
Subjects
Software Engineering (Other)
Journal Section
Research Article
Authors
Abderrahmane Kefali
*
Algeria
Ahlem Obeizi
Algeria
Chokri Ferkous
Algeria
Publication Date
September 23, 2019
Submission Date
July 19, 2019
Acceptance Date
September 8, 2019
Published in Issue
Year 2019 Volume: 2 Number: 1