Video Duygu Analizi

Emre Arığ; Metin Turan

doi:10.31590/ejosat.1115837

Research Article


Year 2021, Issue: Ejosat Special Supplementary Issue (HORA), pp. 34-41, 28.02.2021




Video Emotion Analysis


Abstract

In this study, emotion analysis was performed with a CNN (convolutional neural network) deep learning model on human faces detected in the frames of a video. The results of this analysis were recorded second by second and used to produce an emotion analysis graph.
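The second-by-second recording can be illustrated with a minimal Python sketch. It assumes the CNN has already produced one emotion label per detected face per frame and that the video's frame rate is known; the function names and the choice to plot each emotion's share per second are illustrative assumptions, not the authors' code.

```python
from collections import Counter, defaultdict


def per_second_distribution(face_labels, fps):
    """Group per-face emotion predictions into one-second bins.

    face_labels: iterable of (frame_index, label) pairs, one per detected
    face; label is None when no face was found in that frame.
    Returns {second: {label: share}}, with the shares in each second
    summing to 1.0 (the values an emotion graph would plot).
    """
    bins = defaultdict(Counter)
    for frame_index, label in face_labels:
        if label is None:  # frame without a detected face
            continue
        second = int(frame_index // fps)  # works for integer or fractional fps
        bins[second][label] += 1

    distribution = {}
    for second in sorted(bins):
        total = sum(bins[second].values())
        distribution[second] = {lab: n / total for lab, n in bins[second].items()}
    return distribution


def dominant_per_second(distribution):
    """Most frequent emotion in each second (ties go to the label seen first)."""
    return {second: max(shares, key=shares.get) for second, shares in distribution.items()}


if __name__ == "__main__":
    # Toy input: 3 frames per second, one face per frame.
    frames = [(0, "happiness"), (1, "happiness"), (2, "calmness"),
              (3, None), (4, "sadness"), (5, "sadness")]
    dist = per_second_distribution(frames, fps=3)
    print(dominant_per_second(dist))  # {0: 'happiness', 1: 'sadness'}
```

Plotting the per-second shares (one line per emotion, seconds on the x-axis) would give a graph of the kind the study describes; a majority vote per second is only one possible summary.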
The study consists of 3 main stages: first, finding and labeling the emotion-bearing images needed to train the CNN model; second, building a CNN deep learning model that can perform emotion analysis; and third, detecting face images in videos.
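For the third stage, the last paragraph of this abstract states that the real-time software detects faces with a DNN model rather than Haar cascades, but it does not name the model. The sketch below assumes OpenCV's ResNet-10 SSD face detector (loaded with `cv2.dnn.readNetFromCaffe` and fed through `cv2.dnn.blobFromImage`), whose `net.forward()` output has shape (1, 1, N, 7); the helper name and the 0.5 confidence threshold are hypothetical.

```python
def faces_from_ssd_output(detections, frame_width, frame_height, min_confidence=0.5):
    """Turn SSD face-detector output into pixel boxes for cropping.

    Assumed layout (OpenCV's ResNet-10 SSD face detector): detections[0][0]
    holds rows [image_id, class_id, confidence, x1, y1, x2, y2], with the
    corner coordinates normalised to [0, 1]. Works on nested lists or on the
    NumPy array returned by net.forward().
    Returns a list of (left, top, right, bottom, confidence) tuples.
    """
    boxes = []
    for row in detections[0][0]:
        confidence = float(row[2])
        if confidence < min_confidence:  # hypothetical threshold
            continue
        x1, y1, x2, y2 = (float(v) for v in row[3:7])
        # Scale to pixels and clamp to the frame so the face crop stays valid.
        left = max(0, int(x1 * frame_width))
        top = max(0, int(y1 * frame_height))
        right = min(frame_width, int(x2 * frame_width))
        bottom = min(frame_height, int(y2 * frame_height))
        if right > left and bottom > top:  # skip empty or inverted boxes
            boxes.append((left, top, right, bottom, confidence))
    return boxes
```

Each returned box can be cropped with `frame[top:bottom, left:right]` and passed to the emotion classifier; the clamping step matters because predicted boxes can extend past the image border.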
To build the training data set, thousands of face images from 61 selected films were analyzed, among them films in which different emotions predominate, such as Yes Man, The Babadook, Scarface and Seven Pounds. Faces were first collected for 7 emotion types: boredom, fear, happiness, calmness, confusion, irritability and sadness. The Haar cascade technique was used for face detection, and the detected faces were labeled by emotion with the help of the Amazon Face Recognition web service. About 50,000 face samples were obtained in this way. Subsequent checks, however, found many non-face images among the regions detected by the Haar cascade, and these were removed. In addition, about 40% of the emotion labels returned by the Amazon web service were found to be wrong, and those samples were also removed from the training data set. After all data-cleaning steps, 20,000 photos labeled with the 7 emotions remained.
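The automatic labeling step can be sketched as follows. The abstract only says an Amazon face recognition web service was used; this sketch assumes a response shaped like Amazon Rekognition's `DetectFaces` output (as returned by boto3's `detect_faces(Image={'Bytes': ...}, Attributes=['ALL'])`), where each entry in `FaceDetails` carries an `Emotions` list of `Type`/`Confidence` pairs. The confidence cut-off is a hypothetical pre-filter: the study identified the roughly 40% of wrong labels through later checks, and the mapping from Amazon's emotion types to the paper's 7 classes is not given.

```python
def top_emotion(face_detail, min_confidence=80.0):
    """Highest-confidence emotion Type for one face, or None if the face has
    no emotion data or its best score is below min_confidence (0-100 scale)."""
    emotions = face_detail.get("Emotions", [])
    if not emotions:
        return None
    best = max(emotions, key=lambda e: e["Confidence"])
    return best["Type"] if best["Confidence"] >= min_confidence else None


def label_faces(response, min_confidence=80.0):
    """Emotion labels for all faces in a DetectFaces-style response, keeping
    only faces whose top emotion clears the (hypothetical) cut-off."""
    labels = []
    for face in response.get("FaceDetails", []):
        label = top_emotion(face, min_confidence)
        if label is not None:
            labels.append(label)
    return labels
```

A threshold cannot replace the manual review the study describes, since a confident label can still be wrong; at most it reduces the number of samples that need checking.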

Tests of the trained model showed that boredom and confusion were 2 of the 4 most frequently confused emotions: boredom was confused with calmness, and confusion with fear. After these two emotions were excluded, the proposed model reached 60% accuracy on the remaining 5 emotions. In the software that extracts faces from the video, sends them to the model and plots an emotion analysis graph from the results, a DNN model was used for real-time face detection instead of the Haar cascade method, to make face detection more accurate.
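Which emotions are "most confused" can be read from a confusion matrix. The Python sketch below, with hypothetical function names, ranks unordered class pairs by how often they are mistaken for each other in either direction and computes accuracy as the diagonal sum over the total. It does not reproduce the paper's evaluation, which the abstract does not detail.

```python
from itertools import combinations


def accuracy(matrix):
    """Overall accuracy: correct predictions (the diagonal) over all predictions.
    matrix[t][p] counts samples of true class t predicted as class p."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def most_confused_pairs(matrix, labels, top=2):
    """Rank unordered class pairs by how often they are mistaken for each
    other in either direction; returns (label_a, label_b, count) tuples."""
    pairs = [
        (labels[i], labels[j], matrix[i][j] + matrix[j][i])
        for i, j in combinations(range(len(labels)), 2)
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)  # stable: ties keep index order
    return pairs[:top]
```

With `top=2` the result names two pairs, i.e. four emotions, which is the shape of the finding reported above (boredom with calmness, confusion with fear).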

Keywords

Emotion Analysis, Deep Learning, Video Processing, Artificial Neural Networks


The article has 19 references in total.

Details

Primary Language	Turkish
Subjects	Engineering
Section	Articles
Authors	Emre Arığ (ORCID: 0000-0003-3645-8761), Metin Turan (ORCID: 0000-0002-1941-6693)
Publication Date	28 February 2021
Issue	Year 2021, Issue: Ejosat Special Supplementary Issue (HORA)

Cite

APA	Arığ, E., & Turan, M. (2021). Video Duygu Analizi. Avrupa Bilim ve Teknoloji Dergisi, (Ejosat Ek Özel Sayı (HORA)), 34-41. https://doi.org/10.31590/ejosat.1115837
