Research Article

Resnet based Deep Gated Recurrent Unit for Image Captioning on Smartphone

Issue: 35, 7 May 2022

Abstract

Image captioning aims to generate grammatically and semantically acceptable natural language sentences for visual content. Gated recurrent unit (GRU) based approaches have recently attracted much attention due to their performance in caption generation. The challenges with GRUs are the vanishing gradient problem and modulating the flow of the most relevant information in deep networks. In this paper, we propose a ResNet-based deep GRU approach that overcomes the vanishing gradient problem with residual connections, while multiple GRU layers ensure that the most relevant information flows through the network. Residual connections are employed between consecutive layers of the deep GRU, which improves the gradient flow from lower to upper layers. Experimental investigations on the publicly available MSCOCO dataset show that the proposed approach achieves performance comparable to several state-of-the-art approaches. Moreover, the approach is embedded into our custom-designed Android application, CaptionEye, which can generate captions without an internet connection through a voice user interface.
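The core architectural idea in the abstract, residual (skip) connections between consecutive GRU layers, can be sketched as follows. This is a minimal illustrative NumPy implementation under assumed details (random weights, equal input/hidden dimensions, no bias terms), not the paper's trained model; the class and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (update gate z, reset gate r, candidate state).
    Weights are random; this is a sketch, not a trained model."""
    def __init__(self, dim, rng):
        self.Wz = rng.standard_normal((dim, 2 * dim)) * 0.1
        self.Wr = rng.standard_normal((dim, 2 * dim)) * 0.1
        self.Wh = rng.standard_normal((dim, 2 * dim)) * 0.1

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                             # update gate
        r = sigmoid(self.Wr @ xh)                             # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

class ResidualDeepGRU:
    """Stack of GRU layers with residual connections between consecutive
    layers, as the abstract describes: each layer's output is added to
    its input before being passed up, easing gradient flow from lower
    to upper layers."""
    def __init__(self, dim, num_layers, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [GRUCell(dim, rng) for _ in range(num_layers)]

    def forward(self, xs):
        # xs: sequence of input vectors (e.g. word/image-feature embeddings).
        states = [np.zeros_like(xs[0]) for _ in self.layers]
        outputs = []
        for x in xs:
            inp = x
            for k, cell in enumerate(self.layers):
                states[k] = cell.step(inp, states[k])
                inp = states[k] + inp                         # residual connection
            outputs.append(inp)
        return outputs
```

Because the identity path bypasses each GRU layer, gradients can reach the lower layers without being repeatedly attenuated by the gate nonlinearities, which is the motivation the abstract gives for the residual design.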


Details

Primary Language

English

Subjects

Engineering

Section

Research Article

Publication Date

7 May 2022

Submission Date

21 April 2022

Acceptance Date

27 April 2022

Published Issue

Year 2022, Issue: 35

How to Cite

APA
Uslu, B., Çaylı, Ö., Kılıç, V., & Onan, A. (2022). Resnet based Deep Gated Recurrent Unit for Image Captioning on Smartphone. Avrupa Bilim ve Teknoloji Dergisi, 35, 610-615. https://doi.org/10.31590/ejosat.1107035
