Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation

Onur Doğan; Almila Altıntaş; Buse Yücetürk; Doğa Aydın; Fatih Soygazi; Yılmaz Kılıçaslan

Abstract

Recent advances in artificial intelligence (AI) have brought generative AI models for text-to-image and image-to-text transformation to the forefront. Although these models offer significant potential, their effectiveness depends largely on the proper use of prompts, the user-provided instructions that guide the generation process. It is therefore important to determine how faithfully a generated image reflects its prompt. To this end, once an image has been generated from text, producing a descriptive text for that image is equally important: comparing the input prompt with the output text makes it possible to measure how successful the generative model was. This study addresses the consistency between the natural language used by humans and the prompt language used by generative AI models. We propose a novel approach: a text-to-image model that takes natural language input generates an image, an image-to-text model then describes that image in text, and the resulting description is subsequently used as a new prompt. Finally, a comparison module searches a pre-built database of prompt-image pairs and identifies the pair most similar to a human-generated description. This approach aims to maximize the benefit gained from existing AI models and to promote explainability, a crucial principle in AI. The proposed solution aims to improve the ability of models to generate human-like descriptions and to enhance the process of evaluating the accuracy of those descriptions.
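
The comparison step described above can be illustrated with a minimal sketch. The abstract does not state which similarity measure or data layout the comparison module uses, so everything here is an assumption made for illustration: similarity is approximated by cosine similarity over bag-of-words counts, and the function names (`cosine_similarity`, `best_match`), the sample prompts, and the image identifiers are all hypothetical. A real system would more plausibly compare learned text embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the token-count vectors of two texts.

    Assumption: a bag-of-words measure stands in for whatever
    similarity measure the comparison module actually uses.
    """
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # Texts with no tokens have no direction; treat them as dissimilar.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(description: str, database: list[tuple[str, str]]) -> tuple[str, str, float]:
    """Return (prompt, image_id, score) for the pair most similar to the description."""
    if not database:
        raise ValueError("database must contain at least one prompt-image pair")
    scored = [
        (prompt, image_id, cosine_similarity(description, prompt))
        for prompt, image_id in database
    ]
    return max(scored, key=lambda row: row[2])


# Hypothetical pre-built database of (prompt, image identifier) pairs.
database = [
    ("a red apple on a wooden table", "img_001.png"),
    ("a dog running on the beach", "img_002.png"),
    ("a city skyline at night", "img_003.png"),
]

# Hypothetical human-written description to be matched against the database.
human_description = "a dog is running along the beach"

prompt, image_id, score = best_match(human_description, database)
print(f"best match: {image_id!r} ({prompt!r}), similarity = {score:.3f}")
```

The same `cosine_similarity` function could also score round-trip consistency by comparing an input prompt with the caption that an image-to-text model produces for the generated image.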

Details

Primary Language

English

Subjects

Multimodal Analysis and Synthesis, Deep Learning, Natural Language Processing

Section

Research Article

Authors

Onur Doğan
0009-0001-5083-0163
Türkiye

Almila Altıntaş
0009-0008-7955-3789
Türkiye

Buse Yücetürk
0009-0003-3078-4352
Türkiye

Doğa Aydın
0009-0000-7782-0830
Türkiye

Fatih Soygazi*
0000-0001-8426-2283
Türkiye

Yılmaz Kılıçaslan
0000-0002-5020-6547
Türkiye

Publication Date

June 27, 2025

Submission Date

May 31, 2025

Acceptance Date

June 23, 2025

Published in Issue

Year 2025, Volume: 5, Issue: 1

IZ

https://izlik.org/JA86FE93UF

Cite

APA

Doğan, O., Altıntaş, A., Yücetürk, B., Aydın, D., Soygazi, F., & Kılıçaslan, Y. (2025). Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science, 5(1), 53-62. https://izlik.org/JA86FE93UF

AMA

1. Doğan O, Altıntaş A, Yücetürk B, Aydın D, Soygazi F, Kılıçaslan Y. Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science. 2025;5(1):53-62. https://izlik.org/JA86FE93UF

Chicago

Doğan, Onur, Almila Altıntaş, Buse Yücetürk, Doğa Aydın, Fatih Soygazi, and Yılmaz Kılıçaslan. 2025. “Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation.” Journal of Artificial Intelligence and Data Science 5 (1): 53-62. https://izlik.org/JA86FE93UF.

EndNote

Doğan O, Altıntaş A, Yücetürk B, Aydın D, Soygazi F, Kılıçaslan Y (June 1, 2025) Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science 5 1 53–62.

IEEE

[1] O. Doğan, A. Altıntaş, B. Yücetürk, D. Aydın, F. Soygazi, and Y. Kılıçaslan, “Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation,” Journal of Artificial Intelligence and Data Science, vol. 5, no. 1, pp. 53–62, Jun. 2025. [Online]. Available: https://izlik.org/JA86FE93UF

ISNAD

Doğan, Onur - Altıntaş, Almila - Yücetürk, Buse - Aydın, Doğa - Soygazi, Fatih - Kılıçaslan, Yılmaz. “Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation”. Journal of Artificial Intelligence and Data Science 5/1 (June 1, 2025): 53-62. https://izlik.org/JA86FE93UF.

JAMA

1. Doğan O, Altıntaş A, Yücetürk B, Aydın D, Soygazi F, Kılıçaslan Y. Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science. 2025;5:53–62.

MLA

Doğan, Onur, et al. “Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation.” Journal of Artificial Intelligence and Data Science, vol. 5, no. 1, June 2025, pp. 53-62, https://izlik.org/JA86FE93UF.

Vancouver

1. Onur Doğan, Almila Altıntaş, Buse Yücetürk, Doğa Aydın, Fatih Soygazi, Yılmaz Kılıçaslan. Exploring Semantic Consistency in Generative Artificial Intelligence via Text-to-Image and Image-to-Text Transformation. Journal of Artificial Intelligence and Data Science [Internet]. June 1, 2025;5(1):53-62. Available from: https://izlik.org/JA86FE93UF