Research Article

Original (Turkish) title: Resim Formatındaki Dokümanların Bilgisayarlı Görü, Doğal Dil İşleme ve Makine Öğrenmesi Kullanılarak Latex Formatına Dönüştürülmesi

Year 2023, Volume 7, Issue 2, pp. 253-266, 29.12.2023
https://doi.org/10.26650/acin.1258719

Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning

Abstract

A few decades ago, people relied on printed resources such as books and magazines to obtain information. With the development of technology, digital documents have replaced printed resources. These documents may be images or files in various text formats. Many applications exist for preparing digital documents; one of them is LaTeX, a document preparation system and typesetting software used especially in scientific publishing and mathematics to produce high-quality documents. When preparing a document with LaTeX, the content is written in a markup language, which some users find difficult. However, one of the main advantages of LaTeX is that it separates a document's content from its formatting, so once the content has been written, the formatting can easily be changed. Generating LaTeX code from a document in image format requires combining computer vision and natural language processing (NLP). This study first detects the boundaries (blocks) of the regions of an image that contain text, tables, and figures, and then performs text classification on these blocks using NLP methods. The next stage determines the reading order so that the flow of meaning is preserved. The final stage generates LaTeX code from the information obtained.
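The abstract does not say how the reading order is determined or how the LaTeX code is emitted (the reference list points to learned approaches such as LayoutReader). As a minimal illustration of those last two stages only, the sketch below assumes that block detection and text classification have already produced labelled bounding boxes, and it swaps in a simple column-band heuristic instead of a learned model. The `Block` class, the label names, the `full_width_ratio` threshold, and the functions `reading_order`, `escape_latex`, and `to_latex` are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A detected page region (bounding box in pixels, origin at top-left)."""
    label: str          # hypothetical label set: "title", "section", "text", "figure", "table"
    x0: float
    y0: float
    x1: float
    y1: float
    content: str = ""   # recognized text, or an image file name for figures


def reading_order(blocks: List[Block], page_width: float,
                  full_width_ratio: float = 0.6) -> List[Block]:
    """Heuristic reading order for one- or two-column pages.

    Blocks wider than full_width_ratio * page_width (titles, wide figures) split
    the page into horizontal bands; inside a band, the left column is read
    top to bottom before the right column.
    """
    ordered: List[Block] = []
    band: List[Block] = []

    def flush() -> None:
        # Column 0 = left half, column 1 = right half, decided by the block's centre.
        band.sort(key=lambda b: (0 if (b.x0 + b.x1) / 2 < page_width / 2 else 1, b.y0))
        ordered.extend(band)
        band.clear()

    for block in sorted(blocks, key=lambda b: b.y0):
        if block.x1 - block.x0 > full_width_ratio * page_width:
            flush()                 # close the current band before the wide block
            ordered.append(block)
        else:
            band.append(block)
    flush()
    return ordered


_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape characters that have a special meaning in LaTeX, one character at a time."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def to_latex(blocks: List[Block], page_width: float) -> str:
    """Emit a minimal LaTeX document from classified blocks, in reading order."""
    out = [r"\documentclass{article}", r"\usepackage{graphicx}", r"\begin{document}", ""]
    for b in reading_order(blocks, page_width):
        if b.label == "title":
            out += [r"\title{" + escape_latex(b.content) + "}", r"\maketitle"]
        elif b.label == "section":
            out.append(r"\section{" + escape_latex(b.content) + "}")
        elif b.label == "figure":
            out += [r"\begin{figure}[h]", r"\centering",
                    r"\includegraphics[width=\linewidth]{" + b.content + "}",
                    r"\end{figure}"]
        elif b.label == "table":
            # Table structure recognition is a separate problem; keep a placeholder comment.
            out.append(f"% table region at x={b.x0:.0f}, y={b.y0:.0f}")
        else:
            out.append(escape_latex(b.content))  # plain paragraph text
        out.append("")  # a blank line is a paragraph break in LaTeX
    out.append(r"\end{document}")
    return "\n".join(out)


# Hypothetical two-column page, deliberately listed out of order.
SAMPLE_PAGE = [
    Block("text", 320, 120, 580, 400, "Right column paragraph."),
    Block("title", 40, 20, 580, 60, "Sample Title"),
    Block("text", 40, 120, 300, 400, "Left column: 50% more & symbols."),
    Block("section", 40, 90, 300, 110, "Introduction"),
]

if __name__ == "__main__":
    print(to_latex(SAMPLE_PAGE, page_width=600))
```

Running the module prints a document in which the title comes first, followed by the left-column section heading and paragraph, and then the right-column paragraph, with `%` and `&` escaped. A band heuristic like this breaks on irregular layouts (sidebars, three columns, single-column figures in the middle of a band), which is the kind of case learned reading-order models such as LayoutReader target.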

References

  • Akpan, U. I., & Starkey, A. (2021). Review of classification algorithms with changing inter-class distances. Machine Learning with Applications, 4, 100031. https://doi.org/10.1016/j.mlwa.2021.100031
  • Ali, F., Kwak, K.-S., & Kim, Y.-G. (2016). Opinion mining based on fuzzy domain ontology and Support Vector Machine: A proposal to automate online review classification. Applied Soft Computing, 47, 235-250. https://doi.org/10.1016/j.asoc.2016.06.003
  • Clark, C., & Divvala, S. (2016). PDFFigures 2.0. Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, 143-152. https://doi.org/10.1145/2910896.2910904
  • CTAN Team. (n.d.). What are TEX and its friends? Retrieved May 8, 2022, from https://www.ctan.org/tex
  • Deivalakshmi, S., Palanisamy, P., & Vishwanathan, G. (2013). A novel method for text and non-text segmentation in document images. 2013 International Conference on Communication and Signal Processing, 255-259. https://doi.org/10.1109/iccsp.2013.6577054
  • Deng, Y., Rosenberg, D., & Mann, G. (2019). Challenges in End-to-End Neural Scientific Table Recognition. 2019 International Conference on Document Analysis and Recognition (ICDAR), 894-901. https://doi.org/10.1109/ICDAR.2019.00148
  • Ding, H., Chen, K., & Huo, Q. (2019). Compressing CNN-DBLSTM models for OCR with teacher-student learning and Tucker decomposition. Pattern Recognition, 96, 106957. https://doi.org/10.1016/j.patcog.2019.07.002
  • Doğan, M. İ., Orman, A., Örkcü, M., & Örkcü, H. H. (2019). Çok gruplu sınıflandırma problemlerine regresyon analizi ve matematiksel programlama tabanlı yeni bir yaklaşım. Gazi Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi. https://doi.org/10.17341/gazimmfd.571643
  • Kavasidis, I., Pino, C., Palazzo, S., Rundo, F., Giordano, D., Messina, P., & Spampinato, C. (2019). A Saliency-Based Convolutional Neural Network for Table and Chart Detection in Digitized Documents. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11752 LNCS, 292-302. https://doi.org/10.1007/978-3-030-30645-8_27
  • Klatsky, S. (2003). WYSIWYG. Aesthetic Surgery Journal, 23(4), 274-275. https://doi.org/10.1016/S1090-820X(03)00150-X
  • Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., & Li, Z. (2019). TableBank: A Benchmark Dataset for Table Detection and Recognition. http://arxiv.org/abs/1903.01949
  • Navada, A., Ansari, A. N., Patil, S., & Sonkamble, B. A. (2011). Overview of use of decision tree algorithms in machine learning. 2011 IEEE Control and System Graduate Research Colloquium, 37-42. https://doi.org/10.1109/ICSGRC.2011.5991826
  • Otsu, N. (1979). A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62-66. https://doi.org/10.1109/TSMC.1979.4310076
  • Pang, N., Yang, C., Zhu, X., Li, J., & Yin, X.-C. (2021). Global Context-Based Network with Transformer for Image2latex. 2020 25th International Conference on Pattern Recognition (ICPR), 4650-4656. https://doi.org/10.1109/ICPR48806.2021.9412072
  • PRImA. (n.d.). Retrieved May 22, 2022, from https://www.primaresearch.org/
  • Recommendation ITU-R BT.601-7. (2011, March). https://www.itu.int/dmspubrec/itu-r/rec/bt/R-REC-BT.601-7-201103-I!!PDF-E.pdf
  • Safnuk, B., & Hu, G. (2018). Reconstructing LaTeX Source Files from Generated PDFs - a Neural Network Approach. 2018 IEEE 16th International Conference on Industrial Informatics (INDIN), 890-895. https://doi.org/10.1109/INDIN.2018.8472050
  • Shen, Z., Zhang, R., Dell, M., Lee, B. C. G., Carlson, J., & Li, W. (2021). LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis. http://arxiv.org/abs/2103.15348
  • Uğuz, H. (2011). A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowledge-Based Systems, 24(7), 1024-1032. https://doi.org/10.1016/j.knosys.2011.04.014
  • Wang, Z., & Liu, J. C. (2021). Translating math formula images to LaTeX sequences using deep neural networks with sequence-level training. International Journal on Document Analysis and Recognition, 24(1-2), 63-75. https://doi.org/10.1007/s10032-020-00360-2
  • Wang, Z., Xu, Y., Cui, L., Shang, J., & Wei, F. (2021). LayoutReader: Pre-training of Text and Layout for Reading Order Detection. http://arxiv.org/abs/2108.11591
  • Wang, Z., Yang, J., Jin, H., Shechtman, E., Agarwala, A., Brandt, J., & Huang, T. S. (2015). DeepFont: Identify Your Font from An Image. Proceedings of the 23rd ACM International Conference on Multimedia, 451-459. https://doi.org/10.1145/2733373
  • Xu, C., Shi, C., Bi, H., Liu, C., Yuan, Y., Guo, H., & Chen, Y. (2021). A Page Object Detection Method Based on Mask R-CNN. IEEE Access, 9, 143448-143457. https://doi.org/10.1109/ACCESS.2021.3121152
  • Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020). LayoutLM: Pre-training of Text and Layout for Document Image Understanding. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 20, 1192-1200. https://doi.org/10.1145/3394486.3403172
  • Zhong, X., Tang, J., & Yepes, A. J. (2019). PubLayNet: largest dataset ever for document layout analysis. http://arxiv.org/abs/1908.07836
There are 25 references in total.

Details

Primary Language: English
Subjects: Computer Software
Journal Section: Research Article
Authors:
Murat Kazanç (ORCID 0000-0002-8405-0181)
Tolga Ensari (ORCID 0000-0003-0896-3058)
Mustafa Dağtekin (ORCID 0000-0002-0797-9392)
Publication Date: December 29, 2023
Submission Date: March 1, 2023
Published in Issue: Year 2023

Cite

APA: Kazanç, M., Ensari, T., & Dağtekin, M. (2023). Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. Acta Infologica, 7(2), 253-266. https://doi.org/10.26650/acin.1258719
AMA: Kazanç M, Ensari T, Dağtekin M. Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. ACIN. December 2023;7(2):253-266. doi:10.26650/acin.1258719
Chicago: Kazanç, Murat, Tolga Ensari, and Mustafa Dağtekin. “Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning”. Acta Infologica 7, no. 2 (December 2023): 253-66. https://doi.org/10.26650/acin.1258719.
EndNote: Kazanç M, Ensari T, Dağtekin M (December 1, 2023) Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. Acta Infologica 7 2 253–266.
IEEE: M. Kazanç, T. Ensari, and M. Dağtekin, “Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning”, ACIN, vol. 7, no. 2, pp. 253–266, 2023, doi: 10.26650/acin.1258719.
ISNAD: Kazanç, Murat et al. “Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning”. Acta Infologica 7/2 (December 2023), 253-266. https://doi.org/10.26650/acin.1258719.
JAMA: Kazanç M, Ensari T, Dağtekin M. Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. ACIN. 2023;7:253–266.
MLA: Kazanç, Murat et al. “Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning”. Acta Infologica, vol. 7, no. 2, 2023, pp. 253-66, doi:10.26650/acin.1258719.
Vancouver: Kazanç M, Ensari T, Dağtekin M. Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. ACIN. 2023;7(2):253-66.