Research Article

Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones

Issue 32, 31 December 2021

Abstract

Video captioning is the visual understanding task of generating grammatically and semantically meaningful descriptions, and it is of interest in both computer vision (CV) and natural language processing (NLP). Recent advances in the computing power of mobile platforms have led to many video captioning applications that use CV and NLP techniques. These applications mainly depend on an encoder-decoder approach that runs over an internet connection, employing convolutional neural networks (CNNs) in the encoder and recurrent neural networks (RNNs) in the decoder. However, this approach is not powerful enough to produce accurate captions, and the online data transfer slows the response. In this paper, therefore, the encoder-decoder approach is extended with a sequence-to-sequence model built on a multi-layer gated recurrent unit (GRU) to generate semantically more coherent captions. In the encoder, visual features of each video frame are extracted with a ResNet-101 CNN and fed to the multi-layer GRU-based decoder for caption generation. The proposed approach is compared with state-of-the-art approaches in experiments on the MSVD dataset under eight performance metrics. In addition, the proposed approach is embedded into our custom-designed Android application, called WeCap, which generates captions faster and without an internet connection.
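The encoder-decoder pipeline summarized above can be sketched as follows. This is a minimal illustration of the decoder side, assuming two stacked GRU layers, a 2048-dimensional pooled CNN feature per frame (the usual ResNet-101 output size), and a small hidden state; the dimensions, random initialization, and stand-in frame features are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU update: gates decide how much of the old state survives."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)                # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)                # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)    # candidate state
    return (1.0 - z) * h + z * h_tilde

def init_params(rng, in_dim, hid_dim):
    """Small random weights for one GRU layer (illustrative, untrained)."""
    def mat(rows, cols):
        return rng.standard_normal((rows, cols)) * 0.1
    return [mat(hid_dim, in_dim), mat(hid_dim, hid_dim), np.zeros(hid_dim),
            mat(hid_dim, in_dim), mat(hid_dim, hid_dim), np.zeros(hid_dim),
            mat(hid_dim, in_dim), mat(hid_dim, hid_dim), np.zeros(hid_dim)]

# Two stacked GRU layers: layer 1 consumes the frame feature,
# layer 2 consumes layer 1's hidden state.
rng = np.random.default_rng(0)
feat_dim, hid_dim, n_frames = 2048, 64, 5   # 2048 = ResNet-101 pooled feature size
layer1 = init_params(rng, feat_dim, hid_dim)
layer2 = init_params(rng, hid_dim, hid_dim)

h1 = np.zeros(hid_dim)
h2 = np.zeros(hid_dim)
for _ in range(n_frames):
    frame_feat = rng.standard_normal(feat_dim)   # stand-in for a CNN feature
    h1 = gru_step(frame_feat, h1, layer1)
    h2 = gru_step(h1, h2, layer2)

# In a full captioner, h2 would condition a softmax over the vocabulary
# to emit the next caption word at each decoding step.
print(h2.shape)
```

Stacking a second GRU layer lets the upper layer model longer-range structure in the caption while the lower layer tracks the raw frame features, which is the intuition behind using a multi-layer GRU decoder.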

Keywords

Supporting Institution

TÜBİTAK, BAP

Project Number

120N995, 2021-ÖDL-MÜMF-0006


Details

Primary Language

English

Subjects

Engineering

Section

Research Article

Publication Date

31 December 2021

Submission Date

21 December 2021

Acceptance Date

2 January 2022

Published Issue

Year 2021, Issue 32

How to Cite

APA
Fetiler, B., Çaylı, Ö., Moral, Ö. T., Kılıç, V., & Onan, A. (2021). Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones. Avrupa Bilim ve Teknoloji Dergisi, 32, 221-226. https://doi.org/10.31590/ejosat.1039242
AMA
Fetiler B, Çaylı Ö, Moral ÖT, Kılıç V, Onan A. Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones. EJOSAT. 2021;(32):221-226. doi:10.31590/ejosat.1039242
Chicago
Fetiler, Bengü, Özkan Çaylı, Özge Taylan Moral, Volkan Kılıç, and Aytuğ Onan. 2021. “Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones”. Avrupa Bilim ve Teknoloji Dergisi, no. 32: 221-26. https://doi.org/10.31590/ejosat.1039242.
EndNote
Fetiler B, Çaylı Ö, Moral ÖT, Kılıç V, Onan A (01 December 2021) Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones. Avrupa Bilim ve Teknoloji Dergisi 32 221–226.
IEEE
[1]B. Fetiler, Ö. Çaylı, Ö. T. Moral, V. Kılıç, and A. Onan, “Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones”, EJOSAT, no. 32, pp. 221–226, Dec. 2021, doi: 10.31590/ejosat.1039242.
ISNAD
Fetiler, Bengü - Çaylı, Özkan - Moral, Özge Taylan - Kılıç, Volkan - Onan, Aytuğ. “Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones”. Avrupa Bilim ve Teknoloji Dergisi. 32 (01 December 2021): 221-226. https://doi.org/10.31590/ejosat.1039242.
JAMA
Fetiler B, Çaylı Ö, Moral ÖT, Kılıç V, Onan A. Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones. EJOSAT. 2021;(32):221–226.
MLA
Fetiler, Bengü, et al. “Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones”. Avrupa Bilim ve Teknoloji Dergisi, no. 32, December 2021, pp. 221-6, doi:10.31590/ejosat.1039242.
Vancouver
Bengü Fetiler, Özkan Çaylı, Özge Taylan Moral, Volkan Kılıç, Aytuğ Onan. Video Captioning Based on Multi-layer Gated Recurrent Unit for Smartphones. EJOSAT. 01 December 2021;(32):221-6. doi:10.31590/ejosat.1039242
