Research Article

Fusion of High-Level Visual Attributes for Image Captioning

Issue: 52, 15 December 2023

Abstract

Image captioning aims to generate a natural language description that accurately conveys the content of an image. Recently, deep learning models have been used to extract visual attributes from images, enhancing the accuracy of captions. However, it is essential to assess these visual attributes to ensure optimal performance and avoid incorporating redundant or misleading information. In this study, we employ the visual attributes of semantic segmentation, object detection, instance segmentation, keypoint detection, and their fusion. Experimental evaluations on the commonly used datasets VizWiz and MSCOCO Captions demonstrate that the fusion of visual attributes improves the accuracy of caption generation. Furthermore, the image captioning model, which utilizes the fusion of visual attributes, has been embedded into our custom-designed Android application, named NObstacle, enabling captioning without the need for an internet connection.
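The abstract does not specify the fusion operator used to combine the four attribute streams. As a minimal illustrative sketch (not the authors' implementation), fusion is often realized by pooling each task's features to a fixed-length vector and concatenating them into a single visual input for the caption decoder; the extractor below is a hypothetical stand-in for the pretrained segmentation/detection networks:

```python
import numpy as np

# Hypothetical stand-in for a pretrained network head: in practice each vector
# would be pooled features from semantic segmentation, object detection,
# instance segmentation, or keypoint detection.
def attribute_vector(seed: int, dim: int = 4) -> np.ndarray:
    rng = np.random.default_rng(seed)
    return rng.random(dim)

def fuse_attributes(vectors: list[np.ndarray]) -> np.ndarray:
    """Fuse per-task attribute vectors by simple concatenation (one common choice)."""
    return np.concatenate(vectors)

sem_seg = attribute_vector(0)
obj_det = attribute_vector(1)
inst_seg = attribute_vector(2)
keypoints = attribute_vector(3)

# The fused vector would serve as the visual input to the caption decoder.
fused = fuse_attributes([sem_seg, obj_det, inst_seg, keypoints])
print(fused.shape)  # four 4-dim vectors -> (16,)
```

Concatenation preserves each stream's information and lets the decoder learn its own weighting; weighted summation or attention over the streams are common alternatives.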

Keywords

Supporting Institutions

TUBITAK ve İKCU BAP

Project Numbers

120N995, 2021-ÖDL-MÜMF-0006, 2022-TYL-FEBE-0012

Acknowledgements

This research was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) and the British Council (Newton-Katip Celebi Fund Institutional Links, Turkey-UK project: 120N995), and by the Scientific Research Projects Coordination Unit of Izmir Katip Celebi University (project nos. 2021-ÖDL-MÜMF-0006 and 2022-TYL-FEBE-0012).


Details

Primary Language

English

Subjects

Computer Vision, Natural Language Processing

Section

Research Article

Early View Date

5 December 2023

Publication Date

15 December 2023

Submission Date

18 August 2023

Acceptance Date

10 September 2023

Published Issue

Year 2023, Issue: 52

Cite

APA
Kılcı, M., Çaylı, Ö., & Kılıç, V. (2023). Fusion of High-Level Visual Attributes for Image Captioning. Avrupa Bilim ve Teknoloji Dergisi, 52, 161-168. https://izlik.org/JA87DT39DN