Research Article

Physical structure extraction of Algerian baccalaureate transcripts

Year 2019, Volume 2, Issue 1, 48 - 72, 23.09.2019

Abstract

In recent years, Algerian universities have recognized the value of electronic archiving and digitization for managing their documents, and systems that can analyze and understand archival documents have become a necessity. This paper follows that trend: it proposes a system for analyzing the physical structure of Algerian baccalaureate transcripts stored in university archives. The proposed system proceeds in two phases: 1) preprocessing, in which several operations are applied to reduce the noise present in the input images; and 2) segmentation. Segmentation starts by eliminating the transcript border. It then extracts the text lines and blocks using the run-length smoothing algorithm (RLSA) and projection profile analysis. Next, it classifies each block into one of three categories: textual block, table, or graphic. Finally, it recovers the textual content from the textual blocks and tables.
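
The abstract names RLSA and projection profile analysis but gives no parameters or implementation details. The following is a minimal Python sketch of the two techniques in their textbook form, not the authors' implementation. It assumes a binarized image stored as a list of rows with 1 for ink and 0 for background. The function names, the smoothing threshold, and the toy page are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: 1 = ink (foreground), 0 = background


def rlsa_horizontal(image: Image, threshold: int) -> Image:
    """Horizontal RLSA: fill background runs of length <= threshold
    that lie between two ink pixels on the same row."""
    smoothed = [row[:] for row in image]  # work on a copy
    for row in smoothed:
        last_ink = None  # column of the most recent ink pixel on this row
        for x, pixel in enumerate(row):
            if pixel == 1:
                if last_ink is not None:
                    gap = x - last_ink - 1
                    if 0 < gap <= threshold:
                        for k in range(last_ink + 1, x):
                            row[k] = 1
                last_ink = x
    return smoothed


def horizontal_profile(image: Image) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in image]


def line_bands(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each maximal run of rows whose
    ink count is at least min_ink; each run is a candidate text line."""
    bands = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y
        elif count < min_ink and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands


# Toy 6x8 page (hypothetical data): two "text lines" separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

smoothed = rlsa_horizontal(page, threshold=2)
print(line_bands(horizontal_profile(smoothed)))  # [(1, 2), (4, 4)]
```

In this sketch, smoothing merges nearby characters into solid horizontal runs, so the projection profile shows clear valleys between text lines. A real system would also need a vertical pass and thresholds tuned to the scan resolution, which the abstract does not specify.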

Details

Primary Language English
Subjects Computer Science, Interdisciplinary Application
Journal Section Articles
Authors

Abderrahmane KEFALI (Primary Author)
Département d’Informatique, Université 8 Mai 1945 - Guelma
Algeria

Ahlem OBEIZI
Département d’Informatique, Université 8 Mai 1945 - Guelma
Algeria

Chokri FERKOUS
Département d’Informatique, Université 8 Mai 1945 - Guelma
Algeria

Publication Date September 23, 2019
Published in Issue Year 2019, Volume 2, Issue 1

Cite

Bibtex
@article{ijiam594016,
  author = {Kefali, Abderrahmane and Obeizi, Ahlem and Ferkous, Chokri},
  title = {Physical structure extraction of Algerian baccalaureate transcripts},
  journal = {International Journal of Informatics and Applied Mathematics},
  publisher = {International Society of Academicians},
  eissn = {2667-6990},
  year = {2019},
  volume = {2},
  pages = {48--72}
}
APA Kefali, A., Obeizi, A., & Ferkous, C. (2019). Physical structure extraction of Algerian baccalaureate transcripts. International Journal of Informatics and Applied Mathematics, 2(1), 48-72. Retrieved from https://dergipark.org.tr/en/pub/ijiam/issue/48898/594016
MLA Kefali, A., Obeizi, A., Ferkous, C. "Physical structure extraction of Algerian baccalaureate transcripts". International Journal of Informatics and Applied Mathematics 2 (2019): 48-72 <https://dergipark.org.tr/en/pub/ijiam/issue/48898/594016>
Chicago Kefali, A., Obeizi, A., Ferkous, C. "Physical structure extraction of Algerian baccalaureate transcripts". International Journal of Informatics and Applied Mathematics 2 (2019): 48-72
ISNAD Kefali, Abderrahmane, Obeizi, Ahlem, Ferkous, Chokri. "Physical structure extraction of Algerian baccalaureate transcripts". International Journal of Informatics and Applied Mathematics 2/1 (September 2019): 48-72.
AMA Kefali A., Obeizi A., Ferkous C. Physical structure extraction of Algerian baccalaureate transcripts. IJIAM. 2019; 2(1): 48-72.
Vancouver Kefali A., Obeizi A., Ferkous C. Physical structure extraction of Algerian baccalaureate transcripts. International Journal of Informatics and Applied Mathematics. 2019; 2(1): 48-72.
IEEE A. Kefali, A. Obeizi and C. Ferkous, "Physical structure extraction of Algerian baccalaureate transcripts", International Journal of Informatics and Applied Mathematics, vol. 2, no. 1, pp. 48-72, Sep. 2019

International Journal of Informatics and Applied Mathematics