Research Article

Music Emotion Recognition with Machine Learning Based on Audio Features

Year 2021, 133-144, 01.12.2021
https://doi.org/10.53070/bbd.945894

Abstract

Understanding the emotional impact of music on its audience is a common field of study across many disciplines, such as science, psychology, musicology, and art. In this study, a method based on acoustic features is proposed to predict the emotion of different samples of Turkish music. The proposed method consists of three steps applied to the selected music pieces: preprocessing, feature extraction, and classification. In the first step, preprocessing removes noise from the signals and resamples all signals in the data set to a common sampling frequency. In the second step, a 1×34 feature vector reflecting the emotional content of the music is extracted from each signal. The features are normalized before the classifiers are trained. In the last step, the data are classified using Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), and an Artificial Neural Network (ANN). Accuracy, sensitivity, specificity, and F-score are used as classification metrics. The model is tested on a new 4-class data set consisting of Turkish music. The proposed model achieves 79.30% accuracy, 78.77% sensitivity, 78.94% specificity, and 79.03% F-score.
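The three-step pipeline described in the abstract (preprocess → 1×34 feature vector → normalize → SVM/K-NN/ANN with standard metrics) can be sketched as follows. This is a minimal illustration only: the synthetic tones, the toy spectral features, and the class structure are stand-ins for the paper's actual Turkish-music data and 34-feature set, using scikit-learn's `SVC`, `KNeighborsClassifier`, and `MLPClassifier` as generic implementations of the three classifiers.

```python
# Illustrative sketch of the abstract's pipeline with synthetic data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)

def extract_features(signal, sr=22050, n_features=34):
    """Toy 1x34 feature vector: spectral centroid, RMS, zero-crossing
    rate, plus log band energies (not the paper's exact feature list)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-9)
    rms = np.sqrt(np.mean(signal ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2
    bands = np.array_split(spectrum, n_features - 3)
    band_energy = [np.log1p(np.sum(b ** 2)) for b in bands]
    return np.array([centroid, rms, zcr] + band_energy)

# Synthetic 4-class data set: tones at class-dependent pitches plus noise.
X, y = [], []
for label in range(4):
    for _ in range(50):
        t = np.arange(22050) / 22050.0
        f0 = 110 * (2 ** label)  # one octave apart per class
        sig = np.sin(2 * np.pi * f0 * t) + 0.3 * rng.standard_normal(t.size)
        X.append(extract_features(sig))
        y.append(label)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_tr)  # normalize features before training
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

results = {}
for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("K-NN", KNeighborsClassifier(n_neighbors=5)),
                  ("ANN", MLPClassifier(hidden_layer_sizes=(32,),
                                        max_iter=2000, random_state=0))]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    results[name] = accuracy_score(y_te, pred)
    print(f"{name}: acc={results[name]:.2f} "
          f"macro-F1={f1_score(y_te, pred, average='macro'):.2f}")
```

Because the synthetic classes are cleanly separated by pitch, all three classifiers should score well above chance here; on real music, feature quality and normalization dominate the outcome, as the paper's comparison of the three models suggests.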

References

  • Chauhan, P. M., & Desai, N. (2014). Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using Wiener filter. 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) (pp. 1-5). Coimbatore.
  • Chien, S., Ding, Y., & Wei, C. (2002). Dynamic bus arrival time prediction with artificial neural networks. Journal of Transportation Engineering, 128(5), 429-438.
  • Cristianini, N., & Ricci, E. (2008). Support Vector Machines. Encyclopedia of Algorithms.
  • Davies, M. E., & Plumbley, M. (2007). Context-dependent beat tracking of musical audio. IEEE Transactions on Audio, Speech, and Language Processing, 15(3), 1009-1020.
  • Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J., & Moussallam, M. (2018). Music mood detection based on audio and lyrics with deep neural net. ISMIR 2018.
  • Hevner, K. (1936). Experimental studies of the elements of expression in music. The American Journal of Psychology, 48(2), 246-268.
  • Huq, A., Bello, J., & Rowe, R. (2010). Automated music emotion recognition: A systematic evaluation. Journal of New Music Research, 39(3), 227-244.
  • Jégou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117-128.
  • Lartillot, O. (2018). MIRtoolbox 1.7.1 User's Manual. Jyväskylä, Finland.
  • Lidy, T., & Rauber, A. (2006). Computing statistical spectrum descriptors for audio music similarity and retrieval. MIREX 2006 - Music Information Retrieval Evaluation.
  • Lin, C., Liu, M., Hsiung, W., & Jhang, J. (2016). Music emotion recognition based on two-level support vector classification. 2016 International Conference on Machine Learning and Cybernetics (ICMLC) (pp. 375-386).
  • Lu, L., Liu, D., & Zhang, H.-J. (2006). Automatic mood detection and tracking of music audio signals. IEEE Transactions on Audio, Speech, and Language Processing, 14(1), 5-18.
  • On, C. K., Pandiyan, P., Yaacob, S., & Saudi, A. (2006). Mel-frequency cepstral coefficient analysis in speech recognition. 2006 International Conference on Computing & Informatics (pp. 1-5). Kuala Lumpur.
  • Panda, R., Rocha, B., & Paiva, R. (2015). Music emotion recognition with standard and melodic audio features. Applied Artificial Intelligence, 29(4), 313-334.
  • Ren, J.-M., Wu, M.-J., & Jang, J.-S. (2015). Automatic music mood classification based on timbre and modulation features. IEEE Transactions on Affective Computing, 6(3), 236-246.
  • Schmidt, E. M., & Kim, Y. (2011). Learning emotion-based acoustic features with deep belief networks. 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 65-68). New Paltz.
  • Schölkopf, B., & Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
  • Song, Y., Dixon, S., & Pearce, M. (2012). Evaluation of musical features for emotion classification. Proc. ISMIR (pp. 523-528).
  • Toh, A. M., Togneri, R., & Nordholm, S. (2005). Spectral entropy as speech features for speech recognition. In Proceedings of PEECS (pp. 22-25).
  • Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293-302.
  • Widodo, A., & Yang, B.-S. (2007). Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 21(6), 2560-2574.
  • Yang, Q., Blond, S., Aggarwal, R., Wang, Y., & Li, J. (2017). New ANN method for multi-terminal HVDC protection relaying. Electric Power Systems Research, 148, 191-201.
  • Yang, Y.-H., & Chen, H.-H. (2012). Machine recognition of music emotion: A review. ACM Trans. Intell. Syst. Technol., 3(4).


Details

Primary Language English
Subjects Computer Software
Journal Section PAPERS
Authors

Mehmet Bilal Er 0000-0002-2074-1776

Emin Murat Esin 0000-0001-7697-3579

Publication Date December 1, 2021
Submission Date June 3, 2021
Acceptance Date July 7, 2021
Published in Issue Year 2021

Cite

APA Er, M. B., & Esin, E. M. (2021). Music Emotion Recognition with Machine Learning Based on Audio Features. Computer Science, 6(3), 133-144. https://doi.org/10.53070/bbd.945894

The Creative Commons Attribution 4.0 International License is applied to all research papers published by JCS, and a Digital Object Identifier (DOI) is assigned to each published paper.