Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning

Murat Kazanç; Tolga Ensari; Mustafa Dağtekin

doi:10.26650/acin.1258719

Research Article

Year 2023, Volume: 7 Issue: 2, 253 - 266, 29.12.2023

Abstract

A few decades ago, people relied on printed resources such as books and magazines to obtain information. With the development of technology, digital documents have replaced printed resources. These documents may be stored as images or in various text formats. Many applications exist for preparing digital documents, one of which is LaTeX. LaTeX is a document preparation system and typesetting software used to prepare high-quality documents, especially for scientific publications and mathematics. In LaTeX, content is written in a markup language, which some users find difficult. However, one of the main advantages of LaTeX is that it separates a document’s content from its formatting: once the content has been created, the formatting can easily be changed. Generating LaTeX code from a document in image format requires the combined use of computer vision and natural language processing (NLP). This study first detects the boundaries (blocks) of the regions of the image that contain text, tables, and figures. The detected blocks are then classified using NLP text classification methods. Next, the reading order is determined so that the flow of meaning is preserved. Finally, LaTeX code is generated from the information obtained.
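
The last two stages described above (reading order, then LaTeX generation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Block` structure, the block labels, the column-based ordering heuristic, and all function names are assumptions made for illustration, and block detection and text classification are taken as already done.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical container for one detected page region.
    kind: str      # assumed labels: "title", "paragraph", or "figure"
    x: float       # left edge of the bounding box (pixels)
    y: float       # top edge of the bounding box (pixels)
    w: float       # width of the bounding box (pixels)
    h: float       # height of the bounding box (pixels)
    content: str   # recognized text, or an image file name for figures


# Characters that have a special meaning in LaTeX source.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters in recognized text."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def reading_order(blocks, page_width, n_columns=1):
    """Assumed heuristic: order by column (left to right), then top to bottom."""
    col_width = page_width / n_columns

    def key(b):
        center_x = b.x + b.w / 2
        column = min(int(center_x // col_width), n_columns - 1)
        return (column, b.y)

    return sorted(blocks, key=key)


def block_to_latex(b):
    """Map one classified block to a LaTeX fragment."""
    if b.kind == "title":
        return "\\section*{" + escape_latex(b.content) + "}"
    if b.kind == "paragraph":
        return escape_latex(b.content)
    if b.kind == "figure":
        return ("\\begin{figure}[h]\n\\centering\n"
                "\\includegraphics[width=\\linewidth]{" + b.content + "}\n"
                "\\end{figure}")
    # Other block types (for example tables) are left as a comment in this sketch.
    return "% unsupported block type: " + b.kind


def to_latex(blocks, page_width, n_columns=1):
    """Assemble a complete LaTeX document from blocks in reading order."""
    ordered = reading_order(blocks, page_width, n_columns)
    body = "\n\n".join(block_to_latex(b) for b in ordered)
    return ("\\documentclass{article}\n\\usepackage{graphicx}\n"
            "\\begin{document}\n\n" + body + "\n\n\\end{document}\n")


# Usage: three blocks from a hypothetical two-column page, 600 pixels wide.
blocks = [
    Block("paragraph", 320, 50, 250, 100, "Right column"),
    Block("paragraph", 20, 200, 250, 100, "Left bottom 50% done"),
    Block("title", 20, 50, 250, 30, "Intro"),
]
tex = to_latex(blocks, page_width=600, n_columns=2)
```

Sorting by column before vertical position keeps the left column together on multi-column pages. A real system would probably need a more robust ordering method for irregular layouts.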

Keywords

Computer vision, text classification, reading order, machine learning

There are 24 citations in total.

Details

Primary Language	English
Subjects	Computer Software
Journal Section	Research Article
Authors	Murat Kazanç (0000-0002-8405-0181); Tolga Ensari (0000-0003-0896-3058); Mustafa Dağtekin (0000-0002-0797-9392)
Publication Date	December 29, 2023
Submission Date	March 1, 2023
Published in Issue	Year 2023 Volume: 7 Issue: 2

Cite

APA	Kazanç, M., Ensari, T., & Dağtekin, M. (2023). Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. Acta Infologica, 7(2), 253-266. https://doi.org/10.26650/acin.1258719
AMA	Kazanç M, Ensari T, Dağtekin M. Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. ACIN. December 2023;7(2):253-266. doi:10.26650/acin.1258719
Chicago	Kazanç, Murat, Tolga Ensari, and Mustafa Dağtekin. “Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning”. Acta Infologica 7, no. 2 (December 2023): 253-66. https://doi.org/10.26650/acin.1258719.
EndNote	Kazanç M, Ensari T, Dağtekin M (December 1, 2023) Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. Acta Infologica 7 2 253–266.
IEEE	M. Kazanç, T. Ensari, and M. Dağtekin, “Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning”, ACIN, vol. 7, no. 2, pp. 253–266, 2023, doi: 10.26650/acin.1258719.
ISNAD	Kazanç, Murat et al. “Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning”. Acta Infologica 7/2 (December 2023), 253-266. https://doi.org/10.26650/acin.1258719.
JAMA	Kazanç M, Ensari T, Dağtekin M. Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. ACIN. 2023;7:253–266.
MLA	Kazanç, Murat et al. “Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning”. Acta Infologica, vol. 7, no. 2, 2023, pp. 253-66, doi:10.26650/acin.1258719.
Vancouver	Kazanç M, Ensari T, Dağtekin M. Converting Image Files to LaTeX Format Using Computer Vision, Natural Language Processing, and Machine Learning. ACIN. 2023;7(2):253-66.
