From Pixels to Paragraphs: Exploring Enhanced Image-to-Text Generation using Inception v3 and Attention Mechanisms
Abstract
Processing visual data and converting it into text plays a crucial role in fields such as information retrieval and data analysis. Image-to-text generation, which bridges the gap between visual and textual data, has therefore attracted significant interest from researchers and industry practitioners. This article presents a study on generating text from images. The study aims to measure the contribution of adding an attention mechanism to an encoder-decoder image-to-text architecture built on the Inception v3 deep learning model. In the model, Inception v3 is trained on the Flickr8k dataset to extract image features. An encoder-decoder structure with an attention mechanism is then used for next-word prediction, and the model is trained on the Flickr8k training images before its performance is evaluated. Experimental results show that the model perceives the objects in images with satisfactory accuracy.
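The abstract describes an encoder-decoder in which an attention mechanism weighs image features at each decoding step before the next word is predicted. The sketch below illustrates one such step in plain Python, using additive (Bahdanau-style) attention followed by a softmax over a vocabulary. It is not the authors' implementation: the abstract does not state the attention variant, layer sizes, or decoder type, so the region count, dimensions, random weights, toy vocabulary, and the `next_word_probs` output layer are all hypothetical. In a real system, `features` would be the spatial cells of an Inception v3 feature map (for a 299x299 input, the final convolutional block yields an 8x8x2048 map, i.e. 64 regions of 2048 values each), and every weight would be learned.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(features, hidden, W1, W2, v):
    """Bahdanau-style additive attention over image regions.

    features: L region vectors of size D (assumed: cells of a CNN feature map)
    hidden:   decoder hidden state of size H
    W1 (A x D), W2 (A x H), v (A): parameters, random in this sketch
    Returns the context vector (size D) and the L attention weights.
    """
    proj_h = matvec(W2, hidden)
    scores = []
    for f in features:
        proj_f = matvec(W1, f)
        # score_i = v . tanh(W1 f_i + W2 h)
        scores.append(sum(vj * math.tanh(a + b) for vj, a, b in zip(v, proj_f, proj_h)))
    alpha = softmax(scores)
    # Context vector = attention-weighted average of the region features.
    context = [sum(a * f[d] for a, f in zip(alpha, features)) for d in range(len(features[0]))]
    return context, alpha


def next_word_probs(context, hidden, W_out, vocab):
    """Hypothetical output layer: score each word from [context; hidden]."""
    logits = matvec(W_out, context + hidden)
    return dict(zip(vocab, softmax(logits)))


def rand_matrix(rng, rows, cols):
    """Random weights in [-1, 1], standing in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumptions): 4 regions, D=3 features, H=2 hidden units, A=5 attention units.
rng = random.Random(0)
L, D, H, A = 4, 3, 2, 5
features = rand_matrix(rng, L, D)
hidden = [rng.uniform(-1.0, 1.0) for _ in range(H)]
W1, W2 = rand_matrix(rng, A, D), rand_matrix(rng, A, H)
v = [rng.uniform(-1.0, 1.0) for _ in range(A)]
vocab = ["<start>", "a", "dog", "runs", "<end>"]  # hypothetical toy vocabulary
W_out = rand_matrix(rng, len(vocab), D + H)

context, alpha = additive_attention(features, hidden, W1, W2, v)
probs = next_word_probs(context, hidden, W_out, vocab)
print("attention weights:", [round(a, 3) for a in alpha])
print("predicted next word:", max(probs, key=probs.get))
```

The attention weights sum to one, so the context vector is a convex combination of the region features and each of its components stays within the range of that component across regions. In a full captioner, a recurrent decoder would update `hidden` after each predicted word and repeat this step until an end token is produced.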
Details
Primary Language: English
Subjects: Natural Language Processing
Journal Section: Research Article
Early Pub Date: December 31, 2023
Publication Date: December 31, 2023
Submission Date: August 10, 2023
Acceptance Date: November 11, 2023
Published in Issue: Year 2023, Volume 14, Number 4
APA
Karaca, Z., & Daş, B. (2023). From Pixels to Paragraphs: Exploring Enhanced Image-to-Text Generation using Inception v3 and Attention Mechanisms. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi, 14(4), 603-610. https://doi.org/10.24012/dumf.1340656