Research Article

A Literature Review on Emotion Recognition in Speech

Year 2023, Volume: 03 Issue: 02, 46 - 52, 31.12.2023

Abstract

In our age, we are bombarded with multimedia content daily. Although face-to-face communication remains the most reliable basis for assessing the emotional state of our peers, new approaches that enable us to understand and distinguish emotions in recorded content or live media interaction (be it text, video, images, or speech) are becoming increasingly popular and increasingly complex. Two prominent topics in this regard are sentiment analysis and emotion detection. The advent and exponential growth of social networks, and the employment of speech bots for instance, have made it necessary to address the problem of reliable emotion recognition outside face-to-face, everyday conversations and interactions. A machine's capability to detect, express, and understand emotions through Machine Learning approaches is collectively known, as in humans, as emotional intelligence. Emotion recognition draws on different modes of input reflecting human behaviour, such as audio, image, and video sources, as well as signal interpretations such as brain-wave measurements processed through electroencephalography (EEG). The aim of this study is to examine and review recent approaches to Emotion Detection in Speech and, where possible, to establish links or differences between recent publications, since each paper focuses on one or a set of Machine Learning approaches employed in the field. This paper examines various relevant studies of Machine Learning methods tested for Speech Emotion Recognition (SER). The effectiveness of the methods and databases involved is discussed alongside the studies' findings.
Improvements across these studies are compared, though not chronologically, using simple tables that show the independent accuracies of several Machine Learning classifier combinations.
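As a minimal illustration of the kind of pipeline the reviewed studies compare (per-utterance acoustic features fed to a conventional classifier such as an SVM), the sketch below trains a support-vector classifier on synthetic feature vectors. The feature values and emotion labels are invented placeholders for illustration only, not data or results from any of the reviewed papers.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-utterance acoustic features (e.g. MFCC means):
# three "emotion" classes, 200 utterances each, 13 features per utterance,
# with class means shifted apart so the classes are separable.
n_per_class, n_features = 200, 13
labels = ["neutral", "happy", "angry"]
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
               for i in range(len(labels))])
y = np.repeat(labels, n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Standardize features, then fit an RBF-kernel SVM -- one of the classifier
# families that recurs throughout the reviewed studies.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf").fit(scaler.transform(X_train), y_train)

accuracy = clf.score(scaler.transform(X_test), y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Real SER systems differ mainly in the feature extraction step (prosodic, spectral, or modulation features computed from the speech signal) and in the classifier (HMM, GMM, SVM, or neural networks), which is precisely the axis along which the reviewed studies are compared.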

References

  • “Survey on Human Emotion Recognition: Speech Database, Features and Classification” – Y.B. Singh, S. Goel [1]
  • “Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition”, International Journal of Speech Technology, vol. 21, no. 1, pp. 167–183, 2018. – S.G. Koolagudi, Y.V.S. Murthy and S.P. Bhaskar [2]
  • “Speech emotion recognition based on HMM and SVM”, In proceedings of International Conference on Machine Learning and Cybernetics, pp. 4898–4901, 2005. – Y.L. Lin and G. Wei [3]
  • “A novel speech emotion recognition method via incomplete sparse least square regression”, IEEE Signal Processing Letters, vol. 21, no. 5, pp. 569–572, 2014. – W. Zheng, M. Xin, X. Wang and B. Wang [4]
  • “Speech emotion recognition using deep neural network and extreme learning machine”, In proceedings of Fifteenth Annual Conference of the International Speech Communication Association, pp. 223–227, 2014. – K. Han, D. Yu and I. Tashev [5]
  • “Speech emotion recognition”, International Journal of Soft Computing and Engineering, vol. 2, no. 1, pp. 235–238, 2012. – A.B. Ingale, D.S. Chaudhari [6]
  • “Automatic emotion recognition using prosodic parameters”, INTERSPEECH, pp. 493–496, 2005. – I. Luengo, E. Navas, I. Hernáez, J. Sánchez [7]
  • “Emotion recognition using LP residual”, In proceedings of IEEE Students’ Technology Symposium (TechSym), pp. 255–261, 2010. – A. Chauhan, S. G. Koolagudi, S. Kafley and K. S. Rao [8]
  • “Emotion recognition from speech signal using epoch parameters”, In proceedings of IEEE International Conference on Signal Processing and Communications, pp. 1–5, 2010. – S. G. Koolagudi, R. Reddy and K. S. Rao [9]
  • “EmoVoice – A framework for online recognition of emotions from voice”, Perception in Multimodal Dialogue Systems, pp. 188–199, Springer, 2008. – T. Vogt, E. André and N. Bee [10]
  • “Stress and emotion classification using jitter and shimmer features”, In proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol. IV, pp. 1081–1084, 2007. – X. Li, J. Tao, M. T. Johnson, J. Soltis, A. Savage, K. M. Leong and J. D. Newman [11]
  • “Emotion recognition in spontaneous speech using GMMs”, INTERSPEECH, pp. 1–4, 2006. – D. Neiberg, K. Elenius and K. Laskowski [12]
  • “Speech emotion recognition based on rough set and SVM”, In proceedings of Fifth IEEE International Conference on Cognitive Informatics, vol. 1, pp. 53–61, 2006. – J. Zhou, G. Wang, Y. Yang and P. Chen [13]
  • “Automatic speech emotion recognition using modulation spectral features”, Speech Communication, vol. 53, no. 5, pp. 768–785, 2011. – S. Wu, T. H. Falk, and W. Y. Chan [14]
  • “Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles”, In proceedings of Ninth European Conference on Speech Communication and Technology, pp. 1–4, 2005. – B. Schuller, R. Müller, M. Lang and G. Rigoll [15]
  • “Comparison between fuzzy and NN method for speech emotion recognition”, In proceedings of Third IEEE International Conference on Information Technology and Applications, pp. 297–302, 2005. – A. A. Razak, R. Komiya, M. Izani and Z. Abidin [16]
  • “A neural network approach for human emotion recognition in speech”, In proceedings of IEEE International Symposium on Circuits and Systems, vol. II, pp. 181–184, 2004. – M. W. Bhatti, Y. Wang and L. Guan [17]
  • “Emotion recognition based on phoneme classes”, INTERSPEECH, pp. 205–211, 2004. – C.M. Lee, S. Yildirim, M. Bulut, A. Kazemzadeh, C. Busso, Z. Deng, S. Lee and S. Narayanan [18]
  • “Hidden Markov model based speech emotion recognition”, In proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. II, pp. 1–4, 2003. – B. Schuller, G. Rigoll and M. Lang [19]
  • “Speech based emotion classification”, In proceedings of IEEE Region 10 International Conference on Electrical and Electronic Technology, pp. 297–301, 2001. – T. L. Nwe, F. S. Wei and L. C. De Silva [20]
  • “Speech emotion recognition using hidden Markov models”, INTERSPEECH, pp. 2679–2682, 2001. – A. Nogueiras, A. Moreno, A. Bonafonte and J.B. Mariño [21]


Supporting Institution

Ankara Bilim Üniversitesi

Thanks

I thank Prof. Dr. Nergiz Çağıltay for her motivational contribution to my work.

There are 21 citations in total.

Details

Primary Language English
Subjects Artificial Intelligence
Journal Section Research Article
Authors

Ömer Çağrı Dala 0000-0001-8202-7802

Publication Date December 31, 2023
Published in Issue Year 2023 Volume: 03 Issue: 02

Cite

IEEE Ö. Ç. Dala, “A Literature Review on Emotion Recognition in Speech”, Researcher, vol. 03, no. 02, pp. 46–52, 2023.

The journal "Researcher: Social Sciences Studies" (RSSS), which began publication in 2013, has continued its activities under the name "Researcher" since August 2020, under Ankara Bilim University.
It is an internationally indexed, nationally refereed, scientific electronic journal that, from 2021 onward, publishes original research articles aiming to contribute to the fields of Engineering and Science.
The journal is published twice a year, except for special issues.
Articles submitted for publication in the journal may be written in Turkish or English. Articles submitted to the journal must not have been previously published in another journal or be under consideration for publication elsewhere.