Research Article


Video Emotion Analysis

Year 2021, Issue: Ejosat Ek Özel Sayı (HORA), 34 - 41, 28.02.2021
https://doi.org/10.31590/ejosat.1115837

Abstract

In this study, emotion analysis was performed with a CNN deep learning model on human faces detected in video frames. The results of this analysis were recorded second by second to produce an emotion analysis graph.
The study consists of 3 main stages: first, finding and labeling the emotion-laden images required for the CNN model; second, building a CNN deep learning model capable of emotion analysis; and third, detecting face images in the videos.
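The three stages above can be sketched as a minimal pipeline. This is an illustrative outline only: the function names are hypothetical placeholders, and the detector and classifier are stubbed out rather than being the authors' implementation.

```python
# Minimal sketch of the three-stage pipeline described in the abstract.
# All names are illustrative; the detector and classifier are stubs.

def detect_faces(frame):
    """Stage 3: return cropped face regions found in a video frame.
    A real implementation would use a Haar cascade or DNN detector."""
    # Stub: treat the whole frame as a single face region.
    return [frame]

def classify_emotion(face):
    """Stage 2: return an emotion label for a face crop.
    A real implementation would run the trained CNN."""
    # Stub: always predict a fixed label.
    return "happiness"

def analyze_video(frames):
    """Run detection and classification over all frames, collecting labels."""
    labels = []
    for frame in frames:
        for face in detect_faces(frame):
            labels.append(classify_emotion(face))
    return labels

print(analyze_video(["frame0", "frame1"]))  # → ['happiness', 'happiness']
```

In a real system the stubs would be replaced with the trained models; the surrounding loop structure stays the same.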
To create the training data set, thousands of face photographs from 61 selected films were analyzed. These include films dominated by different emotions, such as Yes Man, The Babadook, Scarface, and Seven Pounds. Faces were initially collected for 7 emotion types: boredom, fear, happiness, calmness, surprise, irritability, and sadness. The Haar cascade technique was used for face detection, and the Amazon Face Recognition web service was used to label the detected faces by emotion. About 50 thousand face samples were obtained in the study. However, later checks revealed many non-face images among the Haar cascade detections, and these were removed. In addition, roughly 40% of the emotion labels returned by the Amazon web service were found to be incorrect, and those samples were excluded from the training data set. After all cleaning steps, 20 thousand photographs labeled across the 7 emotions remained.
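The cleaning described above — dropping non-face crops and unreliable service labels — amounts to a simple filter over the collected samples. The sketch below is an assumption about how such a filter could look: the dictionary keys and the 0.9 confidence threshold are illustrative, not values from the paper.

```python
# Sketch of the data-set cleaning step: keep only samples that are
# verified faces and whose emotion label from the labeling service is
# trusted. Keys and threshold are illustrative assumptions.

def clean_dataset(samples, min_confidence=0.9):
    kept = []
    for s in samples:
        if not s["is_face"]:  # drop non-face Haar cascade detections
            continue
        if s["label_confidence"] < min_confidence:  # drop unreliable labels
            continue
        kept.append(s)
    return kept

raw = [
    {"is_face": True,  "emotion": "fear",      "label_confidence": 0.95},
    {"is_face": False, "emotion": "happiness", "label_confidence": 0.99},
    {"is_face": True,  "emotion": "sadness",   "label_confidence": 0.40},
]
print(len(clean_dataset(raw)))  # → 1
```

Only the first sample survives: the second is not a face, and the third falls below the confidence threshold.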

The deep learning tests showed that 2 of the 4 most frequently confused emotions were boredom and surprise: boredom is confused with calmness, and surprise with fearful facial expressions. In the analysis performed with the remaining 5 emotions, the proposed model reached 60% accuracy. In the software that extracts faces from the video, sends them to the model, and produces an emotion analysis graph from the results, a DNN model was used instead of the Haar cascade method for real-time analysis, to make face detection more accurate.
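The second-by-second graph described above can be sketched by aggregating frame-level predictions into one dominant emotion per second. The frame rate and the input layout here are assumptions for illustration, not details from the paper.

```python
# Sketch: turn per-frame emotion predictions into a per-second series,
# as the described software records results second by second.
# The 25 fps frame rate and input layout are illustrative assumptions.
from collections import Counter

def per_second_emotions(frame_labels, fps=25):
    """frame_labels[i] is the predicted emotion for frame i.
    Returns the dominant emotion in each one-second window."""
    series = []
    for start in range(0, len(frame_labels), fps):
        window = frame_labels[start:start + fps]
        series.append(Counter(window).most_common(1)[0][0])
    return series

labels = ["fear"] * 20 + ["sadness"] * 5 + ["happiness"] * 25
print(per_second_emotions(labels))  # → ['fear', 'happiness']
```

Plotting this series against time gives the kind of emotion analysis graph the study produces.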

References

  • OpenCV. 2017. http://www.opencv.org (Accessed: 8.7.2017).
  • D. C. Cireşan, U. Meier, J. Masci, and L. M. Gambardella, "Flexible, High Performance Convolutional Neural Networks for Image Classification," in Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, pp. 1237–1242, 2012.
  • P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol. 1, pp. I-511–I-518, 2001.
  • R. Hussin, M. R. Juhari, N. W. Kang, R. C. Ismail, and A. Kamarudin, "Digital image processing techniques for object detection from complex background image," Procedia Engineering, vol. 41, pp. 340–344, 2012.
  • G. Bradski and A. Kaehler, Learning OpenCV, O'Reilly Media Inc., USA, 2008.
  • Tuncer Ergin, "Convolutional Neural Network (ConvNet yada CNN) nedir, nasıl çalışır?" [What is a Convolutional Neural Network (ConvNet or CNN) and how does it work?], https://medium.com/@tuncerergin/convolutional-neural-network-convnet-yada-cnn-nedir-nasil-calisir-97a0f5d34cad
  • 3D Face Recognition with Image Processing Techniques, Maltepe University, Institute of Science, 2019.
  • M. Romero and N. Pears, "Landmark localisation in 3D face data," 6th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2009), pp. 73–78, 2009.
  • A. Garcia-Garcia, S. Orts, S. Oprea, V. Villena Martinez, P. Martinez-Gonzalez, and J. Rodríguez, "A Survey on Deep Learning Techniques for Image and Video Semantic Segmentation," Applied Soft Computing, vol. 70, pp. 41–65, 2018.
  • Tumor detection in MR images with regional convolutional neural networks, Journal of the Faculty of Engineering and Architecture of Gazi University, vol. 34, no. 3, pp. 1395–1408, 2019.
  • M. F. Valstar and M. Pantic, "Fully automatic recognition of the temporal phases of facial actions," IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 42, no. 1, pp. 28–43, 2012.
  • A. Gudi, H. E. Tasli, T. M. Den Uyl, and A. Maroulis, "Deep Learning based FACS Action Unit Occurrence and Intensity Estimation," 2015.
  • P. Khorrami, T. L. Paine, K. Brady, C. Dagli, and T. S. Huang, "How Deep Neural Networks Can Improve Emotion Recognition on Video Data," pp. 1–5, 2016.
  • H.-D. Nguyen, S. Yeom, G.-S. Lee, H.-J. Yang, I. Na, and S. H. Kim, "Facial Emotion Recognition Using an Ensemble of Multi-Level Convolutional Neural Networks," International Journal of Pattern Recognition and Artificial Intelligence, 2018.
  • T. Cao and M. Li, "Facial Expression Recognition Algorithm Based on the Combination of CNN and K-Means," in Proceedings of the 2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 2019.
  • Tuncer Ergin, "Convolutional Neural Network (ConvNet yada CNN) nedir, nasıl çalışır?" [What is a Convolutional Neural Network (ConvNet or CNN) and how does it work?], https://medium.com/@tuncerergin/convolutional-neural-network-convnet-yada-cnn-nedir-nasil-calisir-97a0f5d34cad
  • M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision, Cengage Learning, 2014.
  • Mesut Pişkin, https://github.com/mesutpiskin/computer-vision-guide/tree/master/code/yuz-tanima/python/dnn_yuz_tespiti
  • Emotion recognition and reaction prediction in videos (Conference date: 3–5 November 2017), DOI: 10.1109/ICRCICN.2017.8234476, INSPEC Accession No: 17467617.

Details

Primary Language Turkish
Subjects Engineering
Journal Section Articles
Authors

Metin Turan (ORCID: 0000-0002-1941-6693)

Emre Arığ (ORCID: 0000-0003-3645-8761)

Publication Date February 28, 2021
Published in Issue Year 2021 Issue: Ejosat Ek Özel Sayı (HORA)

Cite

APA Turan, M., & Arığ, E. (2021). Video Duygu Analizi. Avrupa Bilim Ve Teknoloji Dergisi (Ejosat Ek Özel Sayı (HORA)), 34-41. https://doi.org/10.31590/ejosat.1115837