Research Article

A Deep Learning Based Software Tool for Audio Emotional State Classification Using Audio Features

Year 2021, Volume: 4, Issue: 3, 14 - 27, 30.12.2021

Abstract

In our study, a software tool was designed for audio emotional state analysis that builds deep learning architectural models which classify audio emotional states from audio data through a graphical user interface, without writing a single line of source code. The tool supports obtaining the desired datasets, extracting audio features from the audio data, creating the architecture, and training the deep learning model with the desired neural network layers and hyperparameters. While the model is being trained, the training metrics can be monitored in real time through the software tool. Throughout the study, the relevant steps were carried out both by pure source-code editing and by using the software tool. The code-editing-based hybrid model, built with long short-term memory and convolutional neural networks in its architecture, achieved an accuracy of 81.49%. In addition, the standalone model based on the graphical software tool, built with a convolutional neural network in its architecture without any coding intervention, achieved an accuracy of 75.76%. The main motivation for developing the software tool is to build a potential deep learning architectural model that can be used to classify different audio emotional states. Experimental results show that the software tool performs classification with high accuracy. A discussion of the obtained results is also included in our study.
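As a minimal illustration of the hybrid architecture the abstract describes, the sketch below builds a small CNN+LSTM classifier over MFCC-style audio feature sequences with Keras (Keras and librosa are both in the study's toolchain, per the reference list). All layer sizes, the input shape, and the eight-class output are hypothetical placeholders, not the authors' actual configuration.

```python
# Illustrative sketch only (not the authors' exact code): a hybrid
# CNN+LSTM audio emotion classifier over MFCC feature sequences.
import numpy as np
from tensorflow.keras import layers, models

def build_hybrid_model(n_frames=128, n_mfcc=40, n_classes=8):
    """CNN front-end for local spectral patterns, LSTM for temporal context."""
    model = models.Sequential([
        # Conv1D scans the MFCC sequence for local spectral patterns.
        layers.Conv1D(64, kernel_size=5, activation="relu",
                      padding="same", input_shape=(n_frames, n_mfcc)),
        layers.BatchNormalization(),   # cf. Ioffe & Szegedy in the references
        layers.MaxPooling1D(2),
        layers.LSTM(64),               # temporal modeling, cf. Sundermeyer et al.
        layers.Dropout(0.3),           # regularization, cf. Srivastava et al.
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_hybrid_model()
# A dummy batch of 2 "utterances" just to confirm the forward pass works.
probs = model.predict(np.zeros((2, 128, 40)), verbose=0)
```

In practice, the input sequences would come from a feature extractor such as `librosa.feature.mfcc` applied to each audio clip; the dropout and batch-normalization layers correspond to the regularization techniques cited in the reference list.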

References

  • Liu B. Sentiment Analysis and Opinion Mining. California, USA, Morgan & Claypool Publishers, 2012.
  • Neri F, Aliprandi C, Capeci F, Cuadros M, By T. “Sentiment Analysis on Social Media”. IEEE/ACM 2012 International Conference on Advances in Social Networks Analysis and Mining, 919-926, 2012.
  • Agarwal B, Mittal N. “Machine Learning Approach for Sentiment Analysis”. Prominent Feature Extraction for Sentiment Analysis, Springer, Cham, 21-45, 2016.
  • Aldeneh Z, Provost EM. “Using Regional Saliency for Speech Emotion Recognition”. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2741-2745, 2017.
  • Seehapoch T, Wongthanavasu S. “Speech Emotion Recognition Using Support Vector Machines”. 5th International Conference on Knowledge and Smart Technology (KST), 86-91, 2013.
  • Schuller B, Rigoll G, Lang M. “Hidden Markov Model-Based Speech Emotion Recognition”. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), II-1, 2003.
  • Lee CC, Mower E, Busso C, Lee S, Narayanan S. “Emotion Recognition Using a Hierarchical Binary Decision Tree Approach”. Speech Communication, 53(9-10), 1162-1171, 2011.
  • Bertero D, Fung P. “A First Look Into a Convolutional Neural Network for Speech Emotion Detection”. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5115-5119, 2017.
  • Badshah AM, Ahmad J, Rahim N, Baik SW. “Speech Emotion Recognition From Spectrograms With Deep Convolutional Neural Network”. IEEE International Conference on Platform Technology and Service (PlatCon), 1-5, 2017.
  • Yoon S, Byun S, Jung K. “Multimodal Speech Emotion Recognition Using Audio and Text”. IEEE Spoken Language Technology Workshop (SLT), 112-118, 2018.
  • Livingstone SR, Russo FA. “The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English”. PLoS ONE, 13(5), e0196391, 2018.
  • Cao H, Cooper DG, Keutmann MK, Gur RC, Nenkova A, Verma R. “CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset”. IEEE Transactions on Affective Computing, 5(4), 377-390, 2014.
  • Burkhardt F, Paeschke A, Rolfes M, Sendlmeier WF, Weiss B. “A Database of German Emotional Speech”. 9th European Conference on Speech Communication and Technology, 2005.
  • Haq S, Jackson PJB. “Speaker-Dependent Audio-Visual Emotion Recognition (SAVEE)”. AVSP, 53-58, 2009.
  • Google LLC. “Google Teachable Machine”. https://teachablemachine.withgoogle.com/ (18.04.2021).
  • Dey N, Borra S, Ashour AS, Shi F. “Medical Images Analysis Based on Multilabel Classification”. Machine Learning in Bio-Signal Analysis and Diagnostic Imaging, Academic Press, Chap. 9, 2018.
  • GitHub. “Keras Repository”. https://github.com/fchollet/keras
  • Sundermeyer M, Schlüter R, Ney H. “LSTM Neural Networks for Language Modeling”. 13th Annual Conference of the International Speech Communication Association, 2012.
  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. The Journal of Machine Learning Research, 15(1), 1929-1958, 2014.
  • Ioffe S, Szegedy C. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. International Conference on Machine Learning (ICML), PMLR, 448-456, 2015.
  • Salimans T, Kingma DP. “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks”. arXiv preprint arXiv:1602.07868, 2016.
  • Le X, Wang Y, Jo J. “Combining Deep and Handcrafted Image Features for Vehicle Classification in Drone Imagery”. Digital Image Computing: Techniques and Applications (DICTA), IEEE, 1-6, 2018.
  • Van Rossum G. Python Reference Manual. Amsterdam, Netherlands, Centrum voor Wiskunde en Informatica, 1995.
  • McFee B, Raffel C, Liang D, Ellis DPW, McVicar M, Battenberg E, Nieto O. “librosa: Audio and Music Signal Analysis in Python”. Proceedings of the 14th Python in Science Conference, 18-25, 2015.
  • Grinberg M. Flask Web Development: Developing Web Applications with Python. O’Reilly Media Inc., California, USA, 2018.
  • The Pallets Projects. “Werkzeug: The Python WSGI Utility Library”. https://werkzeug.palletsprojects.com (18.04.2021).
  • The Pallets Projects. “Click”. www.palletsprojects.com/p/click (18.04.2021).
  • The Pallets Projects. “Jinja”. www.palletsprojects.com/p/jinja (18.04.2021).
  • Allen G, Owens M. The Definitive Guide to SQLite. Apress, New York, USA, 2010.
  • Copeland R. Essential SQLAlchemy. O’Reilly Media Inc., California, USA, 2008.
  • Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray DG, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Zheng X, Google Brain. “TensorFlow: A System for Large-Scale Machine Learning”. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.
  • Start Bootstrap. “SB Admin 2”. https://startbootstrap.com/theme/sb-admin-2 (18.04.2021).
  • Bewick V, Cheek L, Ball J. “Statistics Review 13: Receiver Operating Characteristic Curves”. Critical Care, 8(6), 1-5, 2004.
  • Towards Data Science. “Confusion Matrix for Your Multi-Class Machine Learning Model”. https://towardsdatascience.com/confusion-matrix-for-your-multi-class-machine-learning-model-ff9aa3bf7826 (29.08.2021).
  • Audio emotional state analysis software tool (DepSemo). https://github.com/CanakkaleDevelopers/audio-sentiment-analysis-deep-learning-tool (29.08.2021).

There are 35 references in total.

Details

Primary Language: Turkish
Subjects: Engineering
Section: Articles
Authors

Emir Ali Kıvrak

Bahadir Karasulu 0000-0001-8524-874X

Can Sözbir

Atakan Türkay

Publication Date: December 30, 2021
Published Issue: Year 2021, Volume 4, Issue 3

Cite

APA: Kıvrak, E. A., Karasulu, B., Sözbir, C., & Türkay, A. (2021). Ses Özniteliklerini Kullanan Ses Duygu Durum Sınıflandırma İçin Derin Öğrenme Tabanlı Bir Yazılımsal Araç. Veri Bilimi, 4(3), 14-27.



Indexes Covering Our Journal

Academic Resource Index (journalseeker.researchbib.com)

Google Scholar

ASOS Index

Rooting Index (www.rootindexing.com)

The JournalTOCs Index (www.journaltocs.ac.uk)

General Impact Factor (GIF) Index (generalif.com)

Directory of Research Journals Indexing (olddrji.lbp.world/indexedJournals.aspx)

I2OR Index (http://www.i2or.com/8.html)