Research Article

Fusion of High-Level Visual Attributes for Image Captioning

Issue: 52, 15 December 2023

Abstract

Image captioning aims to generate a natural language description that accurately conveys the content of an image. Recently, deep learning models have been used to extract visual attributes from images, improving caption accuracy. However, these visual attributes must be assessed to ensure optimal performance and to avoid incorporating redundant or misleading information. In this study, we employ visual attributes obtained from semantic segmentation, object detection, instance segmentation, and keypoint detection, as well as their fusion. Experimental evaluations on the commonly used VizWiz and MSCOCO Captions datasets demonstrate that fusing visual attributes improves the accuracy of caption generation. Furthermore, the image captioning model that uses the fused visual attributes has been embedded into our custom-designed Android application, NObstacle, enabling captioning without an internet connection.

Keywords

Supporting Institution

TUBITAK and İKCU BAP

Project Number

120N995, 2021-ÖDL-MÜMF-0006, 2022-TYL-FEBE-0012

Acknowledgments

This research was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) and the British Council (Newton Katip Celebi Fund Institutional Links, Turkey-UK project: 120N995) and by the Scientific Research Projects Coordination Unit of Izmir Katip Celebi University (project nos. 2021-ÖDL-MÜMF-0006 and 2022-TYL-FEBE-0012).

Details

Primary Language

English

Subjects

Computer Vision, Natural Language Processing

Section

Research Article

Early Publication Date

5 December 2023

Publication Date

15 December 2023

Submission Date

18 August 2023

Acceptance Date

10 September 2023

Published in Issue

Year 2023, Issue: 52

Cite

APA
Kılcı, M., Çaylı, Ö., & Kılıç, V. (2023). Fusion of High-Level Visual Attributes for Image Captioning. Avrupa Bilim ve Teknoloji Dergisi, 52, 161-168. https://izlik.org/JA87DT39DN
AMA
1. Kılcı M, Çaylı Ö, Kılıç V. Fusion of High-Level Visual Attributes for Image Captioning. EJOSAT. 2023;(52):161-168. https://izlik.org/JA87DT39DN
Chicago
Kılcı, Murat, Özkan Çaylı, and Volkan Kılıç. 2023. “Fusion of High-Level Visual Attributes for Image Captioning”. Avrupa Bilim ve Teknoloji Dergisi, no. 52: 161-68. https://izlik.org/JA87DT39DN.
EndNote
Kılcı M, Çaylı Ö, Kılıç V (01 December 2023) Fusion of High-Level Visual Attributes for Image Captioning. Avrupa Bilim ve Teknoloji Dergisi 52 161–168.
IEEE
[1] M. Kılcı, Ö. Çaylı, and V. Kılıç, “Fusion of High-Level Visual Attributes for Image Captioning”, EJOSAT, no. 52, pp. 161–168, Dec. 2023, [Online]. Available: https://izlik.org/JA87DT39DN
ISNAD
Kılcı, Murat - Çaylı, Özkan - Kılıç, Volkan. “Fusion of High-Level Visual Attributes for Image Captioning”. Avrupa Bilim ve Teknoloji Dergisi. 52 (01 December 2023): 161-168. https://izlik.org/JA87DT39DN.
JAMA
1. Kılcı M, Çaylı Ö, Kılıç V. Fusion of High-Level Visual Attributes for Image Captioning. EJOSAT. 2023;(52):161–168.
MLA
Kılcı, Murat, et al. “Fusion of High-Level Visual Attributes for Image Captioning”. Avrupa Bilim ve Teknoloji Dergisi, no. 52, December 2023, pp. 161-8, https://izlik.org/JA87DT39DN.
Vancouver
1. Murat Kılcı, Özkan Çaylı, Volkan Kılıç. Fusion of High-Level Visual Attributes for Image Captioning. EJOSAT [Internet]. 01 December 2023;(52):161-8. Available from: https://izlik.org/JA87DT39DN