Physical structure extraction of Algerian baccalaureate transcripts

Abderrahmane Kefali; Ahlem Obeizi; Chokri Ferkous

Research Article

Year 2019, Volume: 2 Issue: 1, 48 - 72, 23.09.2019



Abstract

In recent years, Algerian universities have recognized the value of electronic archiving and digitization for better management of their documents. Systems that can analyze and understand archival documents have therefore become a necessity. This paper follows that trend: it proposes a system for analyzing the physical structure of Algerian baccalaureate transcripts stored in university archives. The proposed system proceeds in two phases: 1) preprocessing, in which several operations are applied to reduce the noise present in the input images; 2) segmentation, which starts by eliminating the transcript border, then extracts the text lines and blocks using the run-length smoothing algorithm (RLSA) and projection profile analysis, and then classifies the blocks into three types: textual block, table, and graphic. Finally, the system recovers the textual content from the textual blocks and tables.
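
The segmentation step mentioned above can be illustrated with a minimal sketch of horizontal run-length smoothing followed by a horizontal projection profile. This is not the authors' implementation: the function names, the toy binary image, and the smoothing threshold of 2 are assumptions chosen only to show the general idea (1 marks an ink pixel, 0 marks background).

```python
from typing import List, Tuple

Row = List[int]  # one image row; 1 = ink (foreground), 0 = background


def rlsa_row(row: Row, threshold: int) -> Row:
    """Horizontal RLSA on one row: fill background runs of length
    <= threshold that lie between two foreground pixels."""
    out = list(row)
    last_ink = -1  # index of the most recent foreground pixel, -1 if none yet
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_ink - 1
            if last_ink >= 0 and 0 < gap <= threshold:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


def rlsa_horizontal(image: List[Row], threshold: int) -> List[Row]:
    """Apply the row-wise smoothing to every row of a binary image."""
    return [rlsa_row(row, threshold) for row in image]


def horizontal_profile(image: List[Row]) -> List[int]:
    """Count the foreground pixels in each row."""
    return [sum(row) for row in image]


def line_bands(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each maximal run of rows
    whose profile value is at least min_ink."""
    bands: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y
        elif count < min_ink and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands


# Hypothetical 6x8 binary image: two "text lines" separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
smeared = rlsa_horizontal(page, threshold=2)
bands = line_bands(horizontal_profile(smeared))
print(bands)  # [(1, 2), (4, 4)]
```

In this sketch, smoothing merges nearby ink pixels into solid runs, and the rows where the profile drops to zero separate the candidate text lines. A real system would pick the threshold from the scan resolution and character spacing, and would typically apply the same idea vertically to delimit blocks.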

Keywords

structure of document, document understanding, segmentation, document image


Details

Primary Language	English
Subjects	Software Engineering (Other)
Journal Section	Articles
Authors	Abderrahmane Kefali, Ahlem Obeizi, Chokri Ferkous
Publication Date	September 23, 2019
Acceptance Date	September 8, 2019
Published in Issue	Year 2019 Volume: 2 Issue: 1

Cite

APA	Kefali, A., Obeizi, A., & Ferkous, C. (2019). Physical structure extraction of Algerian baccalaureate transcripts. International Journal of Informatics and Applied Mathematics, 2(1), 48-72.


International Journal of Informatics and Applied Mathematics