TY - JOUR
T1 - Türkçe dilinde görüntü altyazısı: veritabanı ve model
TT - Image captioning in Turkish language: database and model
AU - Battini Sonmez, Elena
AU - Yıldız, Tuğba
AU - Yılmaz, Berk Dursun
AU - Demir, Ali Emre
PY - 2020
DA - July
Y2 - 2020
DO - 10.17341/gazimmfd.597089
JF - Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi
JO - GUMMFD
PB - Gazi Üniversitesi
WT - DergiPark
SN - 1300-1884
SP - 2089
EP - 2100
VL - 35
IS - 4
LA - tr
AB - Otomatik görüntü altyazısı, yapay zekânın hem bilgisayarla görme hem de doğal dil işleme alanlarını kapsayan bir konudur. Makine çevirisi alanındaki gelişmelerden ilham alan ve bu alanda başarılı sonuçlar veren kodlayıcı-kod çözücü tekniği, şu anda İngilizce görüntü altyazısı konusunda kullanılan mevcut yöntemlerden biridir. Bu çalışmada, Türkçe dili için otomatik görüntü altyazısı oluşturan bir model sunulmaktadır. Bu çalışma, verilen görüntülerin özelliklerini çıkarmaktan sorumlu olan bir CNN kodlayıcıyı, altyazı oluşturmaktan sorumlu olan bir RNN kod çözücüsü ile birleştirerek, Türkçe MS COCO veritabanı üzerinde Türkçe görüntü altyazısı kodlayıcı-kod çözücü modelini test etmektedir. Üretken modelin performansı yeni oluşturulan veri tabanında hem BLEU, METEOR, ROUGE ve CIDEr gibi en yaygın değerlendirme ölçütleri hem de insan tabanlı yöntemler kullanılarak değerlendirilmiştir. Sonuçlar, önerilen modelin performansının hem niteliksel hem de niceliksel olarak tatmin edici olduğunu göstermektedir. Çalışma sonunda hazırlanan, herkesin kullanımına açık bir Web uygulaması [1] sayesinde Türkçe dili için MS COCO görüntülerine ait Türkçe girişlerin yapıldığı bir ortam kullanıcıya sunulmuştur.
Tüm görüntüler tamamlandığında, Türkçe'ye özgü ve karşılaştırmalı çalışmaların yapıldığı bir veri kümesi tamamlanmış olacaktır. [1] http://mscococontributor.herokuapp.com/website/
KW - Türkçe görüntü altyazısı
KW - Türkçe MS COCO
KW - Bilgisayarlı görme
KW - Doğal dil işleme
KW - CNN
KW - RNN
N2 - Automatic image captioning is a challenging issue in artificial intelligence, which covers both the fields of computer vision and natural language processing. Inspired by the latest advances in machine translation, a successful encoder-decoder technique is currently the state of the art in English-language captioning. In this study, we propose an image captioning model for the Turkish language. This paper evaluates the encoder-decoder model on the MS COCO database by coupling an encoder CNN, the component responsible for extracting the features of the given images, with a decoder RNN, the component responsible for generating captions from the given inputs, to generate Turkish captions. We conducted the experiments using the most common evaluation metrics, such as BLEU, METEOR, ROUGE and CIDEr. Results show that the performance of the proposed model is satisfactory in both qualitative and quantitative evaluations. Finally, this study introduces a free-to-use Web platform proposed to improve the dataset via crowd-sourcing. The Turkish MS COCO database is available for research purposes. When all the images are completed, a Turkish dataset will be available for comparative studies.
CR - Yang, Y., Teo, C.L., Daume, H. ve Aloimonos, Y., Corpus-Guided Sentence Generation of Natural Images, Conference on Empirical Methods in Natural Language Processing, Edinburgh - United Kingdom, 444–454, July 27-31, 2011.
CR - Mitchell, M., Dodge, J., Goyal, A., Yamaguchi, K., Stratos, K., Han, X., Mensch, A., Berg, A., Berg, T.
ve Daume, H., Generating Image Descriptions from Computer Vision Detections, 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon - France, 747–756, April 2012.
CR - Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., Berg, A.C. ve Berg, T.L., Baby Talk: Understanding and Generating Simple Image Descriptions, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2891–2903, 2013.
CR - Ushiku, Y., Yamaguchi, M., Mukuta, Y. ve Harada, T., Common Subspace for Model and Similarity: Phrase Learning for Caption Generation from Images, IEEE International Conference on Computer Vision, Washington DC - USA, 2668–2676, December 07-13, 2015.
CR - Ordonez, V., Kulkarni, G. ve Berg, T.L., Im2Text: Describing Images Using 1 Million Captioned Photographs, Advances in Neural Information Processing Systems 24, 1143–1151, 2011.
CR - Gupta, A., Verma, Y. ve Jawahar, C.V., Choosing Linguistics over Vision to Describe Images, AAAI Conference on Artificial Intelligence, Toronto - Canada, 606–612, July 22-26, 2012.
CR - Farhadi, A. ve Sadeghi, M.A., Phrasal Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2854–2865, 2013.
CR - Mason, R. ve Charniak, E., Nonparametric Method for Data-Driven Image Captioning, 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore - Maryland, 592–598, June 2014.
CR - Kuznetsova, P., Ordonez, V., Berg, T. ve Choi, Y., TreeTalk: Composition and Compression of Trees for Image Descriptions, Transactions of the Association for Computational Linguistics, 2(10), 351–362, 2014.
CR - Kalchbrenner, N. ve Blunsom, P., Recurrent Continuous Translation Models, Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, 1700–1709, 2013.
CR - Cho, K., van Merrienboer, B., Gülçehre, Ç., Bougares, F., Schwenk, H. ve Bengio, Y., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, CoRR, abs/1406.1078, 2014.
CR - Sutskever, I., Vinyals, O. ve Le, Q.V., Sequence to Sequence Learning with Neural Networks, 27th International Conference on Neural Information Processing Systems (NIPS'14), 2, Editör: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D. ve Weinberger, K.Q., MIT Press, Cambridge, MA, USA, 3104–3112, 2014.
CR - Vinyals, O., Toshev, A., Bengio, S. ve Erhan, D., Show and Tell: A Neural Image Caption Generator, CoRR, 2014.
CR - Hochreiter, S. ve Schmidhuber, J., Long Short-Term Memory, Neural Computation, 9(8), 1735–1780, 1997.
CR - Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P. ve Zitnick, C.L., Microsoft COCO: Common Objects in Context, Computer Vision - ECCV 2014, Springer International Publishing, Zurich - Switzerland, 740–755, September 6-12, 2014.
CR - Kiros, R., Salakhutdinov, R. ve Zemel, R., Multimodal Neural Language Models, 31st International Conference on Machine Learning, Proceedings of Machine Learning Research (PMLR), 32(2), 595–603, 2014.
CR - Kiros, R., Salakhutdinov, R. ve Zemel, R.S., Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, CoRR, abs/1411.2539, 2014.
CR - Mao, J., Xu, W., Yang, Y., Wang, J. ve Yuille, A.L., Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN), 3rd International Conference on Learning Representations (ICLR), San Diego - CA - USA, May 7-9, 2015.
CR - Hodosh, M., Young, P. ve Hockenmaier, J., Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics, Journal of Artificial Intelligence Research, 47, 853–899, 2013.
CR - Young, P., Lai, A., Hodosh, M.
ve Hockenmaier, J., From Image Descriptions to Visual Denotations: New Similarity Metrics for Semantic Inference over Event Descriptions, TACL, 2, 67–78, 2014.
CR - Socher, R., Karpathy, A., Le, Q.V., Manning, C.D. ve Ng, A., Grounded Compositional Semantics for Finding and Describing Images with Sentences, Transactions of the Association for Computational Linguistics, 2, 207–218, 2014.
CR - Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K. ve Darrell, T., Long-Term Recurrent Convolutional Networks for Visual Recognition and Description, IEEE Conference on Computer Vision and Pattern Recognition, 2625–2634, 2015.
CR - Karpathy, A. ve Fei-Fei, L., Deep Visual-Semantic Alignments for Generating Image Descriptions, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 664–676, April 2017.
CR - Jia, X., Gavves, E., Fernando, B. ve Tuytelaars, T., Guiding the Long-Short Term Memory Model for Image Caption Generation, IEEE International Conference on Computer Vision, 2407–2415, 2015.
CR - Yang, Z., Yuan, Y., Wu, Y., Cohen, W.W. ve Salakhutdinov, R.R., Review Networks for Caption Generation, Advances in Neural Information Processing Systems 29 (NIPS 2016), Editör: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I. ve Garnett, R., 2361–2369, 2016.
CR - Xu, K., Ba, J.L., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R.S. ve Bengio, Y., Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 32nd International Conference on Machine Learning (ICML'15), 37, Editör: Bach, F. ve Blei, D., JMLR.org, 2048–2057, 2015.
CR - Park, C.C., Kim, B. ve Kim, G., Attend to You: Personalized Image Captioning with Context Sequence Memory Networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu - HI, 6432–6440, 2017.
CR - Tavakoli, H.R., Shetty, R., Borji, A.
ve Laaksonen, J., Paying Attention to Descriptions Generated by Image Captioning Models, IEEE Conference on Computer Vision and Pattern Recognition, 2506–2515, 2017.
CR - Liu, C., Mao, J., Sha, F. ve Yuille, A.L., Attention Correctness in Neural Image Captioning, 31st AAAI Conference on Artificial Intelligence (AAAI'17), AAAI Press, 4176–4182, 2017.
CR - Chen, L., Zhang, H., Xiao, J., Nie, L., Shao, J. ve Chua, T.S., SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6298–6306, 2017.
CR - Lu, J., Xiong, C., Parikh, D. ve Socher, R., Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3242–3250, 2017.
CR - You, Q., Jin, H., Wang, Z., Fang, C. ve Luo, J., Image Captioning with Semantic Attention, IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas - NV, 4651–4659, 2016.
CR - Yao, T., Pan, Y., Li, Y., Qiu, Z. ve Mei, T., Boosting Image Captioning with Attributes, IEEE International Conference on Computer Vision (ICCV), Venice - Italy, 4904–4912, 2017.
CR - Shetty, R., Rohrbach, M., Hendricks, L.A., Fritz, M. ve Schiele, B., Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training, IEEE International Conference on Computer Vision (ICCV), Venice - Italy, 4155–4164, 2017.
CR - Dai, B., Lin, D., Urtasun, R. ve Fidler, S., Towards Diverse and Natural Image Descriptions via a Conditional GAN, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu - HI, 2989–2998, 2017.
CR - Aneja, J., Deshpande, A. ve Schwing, A.G., Convolutional Image Captioning, IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City - UT, 5561–5570, 2018.
CR - Wang, Q. ve Chan, A.B., CNN+CNN: Convolutional Decoders for Image Captioning, CoRR, abs/1805.09019, 2018.
CR - Unal, M.E., Citamak, B., Yagcioglu, S., Erdem, A., Erdem, E., Cinbis, N.I. ve Cakici, R., TasvirEt: A Benchmark Dataset for Automatic Turkish Description Generation from Images, 24th Signal Processing and Communication Application Conference (SIU), Zonguldak - Turkey, 2016.
CR - Samet, N., Hiçsönmez, S., Duygulu, P. ve Akbas, E., Görüntü Altyazılama için Otomatik Tercümeyle Eğitim Kümesi Oluşturulabilir mi? (Could We Create a Training Set for Image Captioning Using Automatic Translation?), 25th Signal Processing and Communications Applications Conference (SIU), Antalya - Turkey, 2017.
CR - Kuyu, M., Erdem, A. ve Erdem, E., Image Captioning in Turkish with Subword Units, 26th Signal Processing and Communications Applications Conference (SIU), Izmir - Turkey, 1-4, 2018.
CR - Yüksek, Y. ve Karasulu, B., Çoklu Ortam Ontolojilerini Kullanan Anlamsal Video Analizi Üzerine Bir İnceleme (A Review on Semantic Video Analysis Using Multimedia Ontologies), Gazi Üniv. Müh. Mim. Fak. Dergisi, 25(4), 719-739, 2010.
UR - https://doi.org/10.17341/gazimmfd.597089
L1 - https://dergipark.org.tr/tr/download/article-file/1211769
ER -