Research Article

Music Emotion Recognition with Machine Learning Based on Audio Features

Year 2021, 133-144, 01.12.2021
https://doi.org/10.53070/bbd.945894

Abstract

Understanding the emotional impact of music on its audience is a common field of study across many disciplines, including science, psychology, musicology, and art. In this study, a method based on acoustic features is proposed to predict the emotion of different samples of Turkish music. The proposed method consists of three steps applied to the selected music pieces: preprocessing, feature extraction, and classification. In the first step, noise is removed from the signals and all signals in the data set are resampled to a common sampling frequency. In the second step, a 1x34 feature vector reflecting the emotional content of the music is extracted from each signal. The features are normalized before the classifiers are trained. In the last step, the data are classified using Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), and an Artificial Neural Network (ANN). Accuracy, precision, sensitivity, and F-score are used as classification metrics. The model was tested on a new 4-class data set consisting of Turkish music recordings, and it achieved 79.30% accuracy, 78.77% sensitivity, 78.94% specificity, and a 79.03% F-score.
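The pipeline described in the abstract (preprocess, extract acoustic features, normalize, classify, score) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 34 acoustic features are stood in by three toy descriptors (RMS energy, zero-crossing rate, spectral centroid), synthetic tones replace the Turkish music data set, and the function name `extract_features` is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

def extract_features(signal, sr=22050):
    """Toy stand-ins for acoustic features: RMS energy, zero-crossing rate, spectral centroid."""
    rms = np.sqrt(np.mean(signal ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(signal))) > 0)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([rms, zcr, centroid])

# Synthetic stand-in data: 4 "emotion" classes as tones of differing pitch and loudness.
rng = np.random.default_rng(0)
sr, n_per_class = 22050, 40
X, y = [], []
for label, (f0, amp) in enumerate([(220, 0.3), (440, 0.5), (880, 0.7), (1760, 0.9)]):
    for _ in range(n_per_class):
        t = np.arange(sr) / sr  # one second of "audio" at a common sampling rate
        sig = amp * np.sin(2 * np.pi * f0 * t) + 0.05 * rng.standard_normal(sr)
        X.append(extract_features(sig, sr))
        y.append(label)
X, y = np.array(X), np.array(y)

# Normalize features before training, then classify with an SVM and report metrics.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_tr)
clf = SVC(kernel="rbf").fit(scaler.transform(X_tr), y_tr)
pred = clf.predict(scaler.transform(X_te))
acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred, average="macro")
print(f"accuracy={acc:.2f}  macro-F1={f1:.2f}")
```

Swapping `SVC` for `KNeighborsClassifier` or `MLPClassifier` reproduces the K-NN and ANN variants of the same pipeline; on real audio, a library such as librosa would supply richer descriptors (e.g., MFCCs) in place of the toy features above.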

References

  • Chauhan, P. M., & Desai, N. (2014). Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using Wiener filter. 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) (pp. 1-5). Coimbatore.
  • Chien, S., Ding, Y., & Wei, C. (2002). Dynamic bus arrival time prediction with artificial neural networks. Journal of Transportation Engineering, 128(5), 429-438.
  • Cristianini, N., & Ricci, E. (2008). Support Vector Machines. In Encyclopedia of Algorithms. Springer.
  • Davies, M. E. P., & Plumbley, M. D. (2007). Context-dependent beat tracking of musical audio. IEEE Transactions on Audio, Speech, and Language Processing, 15(3), 1009-1020.
  • Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J., & Moussallam, M. (2018). Music mood detection based on audio and lyrics with deep neural net. Proceedings of ISMIR 2018.
  • Hevner, K. (1936). Experimental studies of the elements of expression in music. The American Journal of Psychology, 48(2), 246-268.
  • Huq, A., Bello, J., & Rowe, R. (2010). Automated music emotion recognition: A systematic evaluation. Journal of New Music Research, 39(3), 227-244.
  • Jégou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117-128.
  • Lartillot, O. (2018). MIRtoolbox 1.7.1 User's Manual. Jyväskylä, Finland.
  • Lidy, T., & Rauber, A. (2006). Computing statistical spectrum descriptors for audio music similarity and retrieval. MIREX 2006 - Music Information Retrieval Evaluation.
  • Lin, C., Liu, M., Hsiung, W., & Jhang, J. (2016). Music emotion recognition based on two-level support vector classification. 2016 International Conference on Machine Learning and Cybernetics (ICMLC) (pp. 375-386).
  • Lu, L., Liu, D., & Zhang, H.-J. (2006). Automatic mood detection and tracking of music audio signals. IEEE Transactions on Audio, Speech, and Language Processing, 14(1), 5-18.
  • On, C. K., Pandiyan, P., Yaacob, S., & Saudi, A. (2006). Mel-frequency cepstral coefficient analysis in speech recognition. 2006 International Conference on Computing & Informatics (pp. 1-5). Kuala Lumpur.
  • Panda, R., Rocha, B., & Paiva, R. (2015). Music emotion recognition with standard and melodic audio features. Applied Artificial Intelligence, 29(4), 313-334.
  • Ren, J.-M., Wu, M.-J., & Jang, J.-S. (2015). Automatic music mood classification based on timbre and modulation features. IEEE Transactions on Affective Computing, 6(3), 236-246.
  • Schmidt, E. M., & Kim, Y. (2011). Learning emotion-based acoustic features with deep belief networks. 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 65-68). New Paltz, NY.
  • Schölkopf, B., & Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
  • Song, Y., Dixon, S., & Pearce, M. (2012). Evaluation of musical features for emotion classification. Proceedings of ISMIR 2012 (pp. 523-528).
  • Toh, A. M., Togneri, R., & Nordholm, S. (2005). Spectral entropy as speech features for speech recognition. Proceedings of PEECS (pp. 22-25).
  • Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293-302.
  • Widodo, A., & Yang, B.-S. (2007). Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 21(6), 2560-2574.
  • Yang, Q., Blond, S., Aggarwal, R., Wang, Y., & Li, J. (2017). New ANN method for multi-terminal HVDC protection relaying. Electric Power Systems Research, 148, 191-201.
  • Yang, Y.-H., & Chen, H.-H. (2012). Machine recognition of music emotion: A review. ACM Transactions on Intelligent Systems and Technology, 3(4).

There are 23 references in total.

Details

Primary Language English
Subjects Computer Software
Section PAPERS
Authors

Mehmet Bilal Er 0000-0002-2074-1776

Emin Murat Esin 0000-0001-7697-3579

Publication Date December 1, 2021
Submission Date June 3, 2021
Acceptance Date July 7, 2021
Published in Issue Year 2021

How to Cite

APA Er, M. B., & Esin, E. M. (2021). Music Emotion Recognition with Machine Learning Based on Audio Features. Computer Science, 6(3), 133-144. https://doi.org/10.53070/bbd.945894

The Creative Commons Attribution 4.0 International License is applied to all research papers published by JCS, and a Digital Object Identifier (DOI) is assigned to each published paper.