Merge Model Based Image Captioning for Smartphones (Akıllı Telefonlar için Birleştirme Modeli Tabanlı Görüntü Altyazılama)

Muharrem Baran; Özge Taylan Moral; Volkan Kılıç

doi:10.31590/ejosat.950924


Abstract

Image captioning is the process of generating a textual description of an image using both natural language processing and computer vision. Having a machine describe the visual content of an image has attracted growing attention in recent years because of its potential applications. This study proposes an image captioning system that runs on smartphones and is built on a merge model within the encoder-decoder approach. On the encoder side of the proposed merge model, a VGG16 convolutional neural network extracts the image features and a long short-term memory (LSTM) network extracts the word features. The encoded forms of the image features and the word features are then merged. A very simple decoder model uses the combination of these two encoded inputs to generate the next word in the sequence, which successfully yields natural-language captions for the images. The proposed system was tested on the Flickr8k and Flickr30k datasets using the BLEU-n metric, and comparison with studies in the literature showed its advantage. Unlike similar studies, the proposed system also runs within ImCap, an Android application we developed to generate image captions without an internet connection. The application is intended to bring image captioning to more users.
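The data flow described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the element-wise addition used for merging, the greedy word selection, the `<start>`/`<end>` markers, and every name below (`merge`, `generate_caption`, `toy_text_encoder`, `toy_image_code`) are assumptions made for illustration. The toy encoder and the hand-written image vector merely stand in for the trained LSTM and the encoded VGG16 features, and a bare softmax stands in for the decoder.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]

START, END = "<start>", "<end>"  # hypothetical sequence markers


def merge(image_code: Sequence[float], text_code: Sequence[float]) -> Vector:
    """Combine the two encodings; element-wise addition is an assumed choice."""
    if len(image_code) != len(text_code):
        raise ValueError("encodings must have the same length to be added")
    return [a + b for a, b in zip(image_code, text_code)]


def softmax(scores: Sequence[float]) -> Vector:
    """Turn raw scores into a probability distribution over the vocabulary."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_caption(
    image_code: Sequence[float],
    encode_text: Callable[[List[str]], Vector],
    decode: Callable[[Vector], Vector],
    vocab: Sequence[str],
    max_len: int = 20,
) -> List[str]:
    """Greedy generation: append the most probable next word until END."""
    words = [START]
    for _ in range(max_len):
        # The image encoding stays outside the text encoder and is merged late.
        probs = decode(merge(image_code, encode_text(words)))
        best = max(range(len(vocab)), key=lambda i: probs[i])
        if vocab[best] == END:
            break
        words.append(vocab[best])
    return words[1:]  # drop the START marker


# Toy stand-ins so the sketch runs without trained networks.
VOCAB = [START, "a", "dog", "runs", END]


def toy_text_encoder(words: List[str]) -> Vector:
    """Stand-in for the LSTM: favours the vocabulary entry after the last word."""
    nxt = min(VOCAB.index(words[-1]) + 1, len(VOCAB) - 1)
    return [1.0 if i == nxt else 0.0 for i in range(len(VOCAB))]


toy_image_code = [0.0, 0.5, 0.5, 0.5, 0.0]  # stand-in for encoded VGG16 features

caption = generate_caption(toy_image_code, toy_text_encoder, softmax, VOCAB)
print(" ".join(caption))  # prints: a dog runs
```

In a trained system the image encoding would be computed once per image and reused at every step, while only the text encoding changes as the caption grows; that separation is what keeps the decoder small in a merge design.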



Details

Primary Language

Turkish

Subjects

Engineering

Section

Conference Paper

Authors

Muharrem Baran
0000-0001-7394-3649
Türkiye

Özge Taylan Moral
0000-0003-0482-267X
Türkiye

Volkan Kılıç*
0000-0002-3164-1981
Türkiye

Publication Date

July 31, 2021

Submission Date

June 11, 2021

Acceptance Date

June 25, 2021

Published in Issue

Year 2021, Issue 26

DOI

https://doi.org/10.31590/ejosat.950924

IZ

https://izlik.org/JA75RD73XR

Cite


APA

Baran, M., Moral, Ö. T., & Kılıç, V. (2021). Akıllı Telefonlar için Birleştirme Modeli Tabanlı Görüntü Altyazılama. Avrupa Bilim ve Teknoloji Dergisi, 26, 191-196. https://doi.org/10.31590/ejosat.950924



Cited By

Sequence-to-Sequence Video Captioning with Residual Connected Gated Recurrent Units

A Benchmark for Feature-injection Architectures in Image Captioning

Resnet based Deep Gated Recurrent Unit for Image Captioning on Smartphone