Research Article

A Benchmark for Feature-injection Architectures in Image Captioning

Issue: 31, 31 December 2021

Abstract

Image captioning, the task of describing an image with a grammatically and semantically correct sentence, has improved significantly with recent advances in the computer vision (CV) and natural language processing (NLP) communities. The integration of these fields has led to the development of feature-injection architectures, which define how extracted image features are used in captioning. In this paper, a benchmark of feature-injection architectures that utilize CV and NLP techniques is reported for encoder-decoder based captioning. In the benchmark, the encoder uses the Inception-v3 convolutional neural network to extract image features, while the decoder applies feature-injection architectures such as init-inject, pre-inject, par-inject and merge with a multi-layer gated recurrent unit (GRU) to generate captions. The architectures have been evaluated extensively on the MSCOCO dataset across eight performance metrics. The results show that the init-inject architecture with a 3-layer GRU outperforms the other architectures in terms of captioning accuracy.
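The abstract names the four feature-injection architectures without defining them. The sketch below illustrates where the image feature enters the decoder in each case, assuming the definitions commonly used in the captioning literature: init-inject uses the image vector as the initial recurrent state, pre-inject feeds it as the first input token, par-inject attaches it to every word input, and merge keeps it out of the recurrent network and combines it with the recurrent output afterwards. The function name `route_image`, the use of plain lists for vectors, and the choice of a zero initial state of the same size as the image vector are hypothetical simplifications and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def route_image(
    mode: str, image: Vector, words: List[Vector]
) -> Tuple[Vector, List[Vector], Optional[Vector]]:
    """Return (initial_state, rnn_inputs, late_fusion) for one architecture.

    initial_state: vector the recurrent decoder would start from
    rnn_inputs:    per-step inputs the recurrent decoder would consume
    late_fusion:   vector combined with the decoder output (merge only)
    """
    zeros = [0.0] * len(image)  # assumption: state size equals image size
    if mode == "init-inject":
        # Image vector becomes the initial recurrent state.
        return list(image), [list(w) for w in words], None
    if mode == "pre-inject":
        # Image vector is treated as the first "word" of the sequence.
        return zeros, [list(image)] + [list(w) for w in words], None
    if mode == "par-inject":
        # Image vector is concatenated to every word embedding.
        return zeros, [list(w) + list(image) for w in words], None
    if mode == "merge":
        # Image vector bypasses the recurrent network entirely.
        return zeros, [list(w) for w in words], list(image)
    raise ValueError(f"unknown architecture: {mode}")


if __name__ == "__main__":
    image_vec = [1.0, 2.0]                  # stand-in for an encoder feature
    word_vecs = [[0.1, 0.2], [0.3, 0.4]]    # stand-in for word embeddings
    for name in ("init-inject", "pre-inject", "par-inject", "merge"):
        print(name, route_image(name, image_vec, word_vecs))
```

In a real decoder the returned pieces would be passed to a multi-layer GRU and a word-prediction layer; the sketch only shows the routing, which is the part that distinguishes the four architectures.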

Supporting Institution

TÜBİTAK

Project Number

120N995

Acknowledgements

This research was supported by the Scientific and Technological Research Council of Turkey (TUBITAK)-British Council (The Newton-Katip Celebi Fund Institutional Links, Turkey-UK project: 120N995).

Details

Primary Language

English

Subjects

Engineering

Section

Research Article

Publication Date

31 December 2021

Submission Date

22 October 2021

Acceptance Date

6 December 2021

Published in Issue

Year 2021, Issue: 31

Cite As

APA
Keskin, R., Çaylı, Ö., Moral, Ö. T., Kılıç, V., & Onan, A. (2021). A Benchmark for Feature-injection Architectures in Image Captioning. Avrupa Bilim ve Teknoloji Dergisi, 31, 461-468. https://doi.org/10.31590/ejosat.1013329