Research Article

Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation

Volume: 5, Issue: 1, 27 June 2025

Abstract

Recent advances in artificial intelligence (AI) have brought generative AI models for text-to-image and image-to-text transformation to the forefront. Although these models offer significant potential, their effectiveness depends on the proper use of prompts, the user-provided instructions that guide the generation process. It is therefore important to assess how successfully an image generated from a prompt reflects that prompt. To this end, producing a descriptive text for the image after text-to-image generation is also important: by comparing the input prompt with the output text, the degree of success of the generated image can be determined. This study examines the consistency between the natural language used by humans and the prompt language used by generative AI models. We propose a novel approach in which a text-to-image model generates an image from natural language, an image-to-text model then describes that image in text, and this text is subsequently used as a prompt. A comparison module then retrieves, from a pre-built database, the prompt and image pair with the highest similarity to a human-generated description. This approach aims to make the most of existing AI models and to promote explainability, a key principle in AI. It addresses two problems: improving an AI model's ability to generate human-like descriptions, and improving how the accuracy of these descriptions is evaluated.
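The abstract does not state how the comparison module measures similarity, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It swaps in a simple bag-of-words cosine similarity and a hypothetical in-memory database of `Entry` records, each holding the original prompt, a hypothetical image path, and the caption produced by the image-to-text model. `round_trip_consistency` scores how well a caption preserves its input prompt, and `best_match` returns the record whose caption-derived prompt is most similar to a human-written description (comparing against the caption rather than the original prompt is itself an assumption). All names and sample data are illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the bag-of-words term-count vectors of two texts.

    Stand-in measure (assumption): the abstract does not name the metric.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no terms with anything
    return dot / (norm_a * norm_b)


@dataclass
class Entry:
    """Hypothetical record from one text-to-image / image-to-text round trip."""
    prompt: str      # natural-language prompt given to the text-to-image model
    image_path: str  # hypothetical location of the generated image
    caption: str     # image-to-text description, reused as a prompt


def round_trip_consistency(entry: Entry) -> float:
    """How closely the generated image's caption matches the original prompt."""
    return cosine_similarity(entry.prompt, entry.caption)


def best_match(description: str, database: list[Entry]) -> Entry:
    """Return the record whose caption-derived prompt best matches a human description."""
    if not database:
        raise ValueError("database is empty")
    return max(database, key=lambda e: cosine_similarity(description, e.caption))


if __name__ == "__main__":
    db = [
        Entry("a red car parked on a street", "img/001.png",
              "a red car parked next to a street"),
        Entry("a cat sleeping on a sofa", "img/002.png",
              "a gray cat asleep on a couch"),
    ]
    for e in db:
        # prints "img/001.png 0.84", then "img/002.png 0.71"
        print(e.image_path, round(round_trip_consistency(e), 2))
    print(best_match("a red car on the street", db).image_path)  # img/001.png
```

Bag-of-words overlap keeps the sketch dependency-free, but it cannot tell that "sofa" and "couch" mean the same thing, which is why the second record scores lower. A practical system would more likely compare sentence embeddings, while the retrieval logic in `best_match` would stay the same.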

Details

Primary Language

English

Subjects

Multimodal Analysis and Synthesis, Deep Learning, Natural Language Processing

Section

Research Article

Publication Date

27 June 2025

Submission Date

31 May 2025

Acceptance Date

23 June 2025

Issue

Year 2025, Volume: 5, Issue: 1

Cite

APA
Doğan, O., Altıntaş, A., Yücetürk, B., Aydın, D., Soygazi, F., & Kılıçaslan, Y. (2025). Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science, 5(1), 53-62. https://izlik.org/JA86FE93UF
AMA
1. Doğan O, Altıntaş A, Yücetürk B, Aydın D, Soygazi F, Kılıçaslan Y. Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science. 2025;5(1):53-62. https://izlik.org/JA86FE93UF
Chicago
Doğan, Onur, Almila Altıntaş, Buse Yücetürk, Doğa Aydın, Fatih Soygazi, and Yılmaz Kılıçaslan. 2025. “Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation”. Journal of Artificial Intelligence and Data Science 5 (1): 53-62. https://izlik.org/JA86FE93UF.
EndNote
Doğan O, Altıntaş A, Yücetürk B, Aydın D, Soygazi F, Kılıçaslan Y (01 June 2025) Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science 5 1 53–62.
IEEE
[1] O. Doğan, A. Altıntaş, B. Yücetürk, D. Aydın, F. Soygazi, and Y. Kılıçaslan, “Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation”, Journal of Artificial Intelligence and Data Science, vol. 5, no. 1, pp. 53–62, Jun. 2025, [Online]. Available: https://izlik.org/JA86FE93UF
ISNAD
Doğan, Onur - Altıntaş, Almila - Yücetürk, Buse - Aydın, Doğa - Soygazi, Fatih - Kılıçaslan, Yılmaz. “Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation”. Journal of Artificial Intelligence and Data Science 5/1 (01 June 2025): 53-62. https://izlik.org/JA86FE93UF.
JAMA
1. Doğan O, Altıntaş A, Yücetürk B, Aydın D, Soygazi F, Kılıçaslan Y. Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science. 2025;5:53–62.
MLA
Doğan, Onur, et al. “Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation”. Journal of Artificial Intelligence and Data Science, vol. 5, no. 1, June 2025, pp. 53-62, https://izlik.org/JA86FE93UF.
Vancouver
1. Onur Doğan, Almila Altıntaş, Buse Yücetürk, Doğa Aydın, Fatih Soygazi, Yılmaz Kılıçaslan. Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science [Internet]. 01 June 2025;5(1):53-62. Available from: https://izlik.org/JA86FE93UF