Research Article

Music Emotion Recognition with Machine Learning Based on Audio Features

Year 2021, 133-144, 01.12.2021
https://doi.org/10.53070/bbd.945894

Abstract

Understanding the emotional impact of music on its audience is a common field of study across many disciplines, such as science, psychology, musicology, and art. In this study, a method based on acoustic features is proposed to predict the emotion of different samples of Turkish music. The proposed method consists of three steps applied to the selected music pieces: preprocessing, feature extraction, and classification. In the first step, preprocessing removes noise from the signals and resamples all signals in the data set to a common sampling frequency. In the second step, a 1×34 feature vector reflecting the emotional content of the music is extracted from each signal. The features are normalized before the classifiers are trained. In the last step, the data are classified using Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), and an Artificial Neural Network (ANN). Accuracy, sensitivity, specificity, and F-score are used as classification metrics. The model is tested on a new 4-class data set consisting of Turkish music. The proposed model achieves 79.30% accuracy, 78.77% sensitivity, 78.94% specificity, and 79.03% F-score.
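The three-step pipeline described in the abstract (preprocess → 1×34 feature vector → normalize → SVM/K-NN/ANN with standard metrics) can be sketched as follows. This is a minimal illustration only: the synthetic tones, the toy spectral features, and the class structure are stand-ins for the paper's actual Turkish-music data and 34-feature set, using scikit-learn's `SVC`, `KNeighborsClassifier`, and `MLPClassifier` as generic implementations of the three classifiers.

```python
# Illustrative sketch of the abstract's pipeline with synthetic data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)

def extract_features(signal, sr=22050, n_features=34):
    """Toy 1x34 feature vector: spectral centroid, RMS, zero-crossing
    rate, plus log band energies (not the paper's exact feature list)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-9)
    rms = np.sqrt(np.mean(signal ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2
    bands = np.array_split(spectrum, n_features - 3)
    band_energy = [np.log1p(np.sum(b ** 2)) for b in bands]
    return np.array([centroid, rms, zcr] + band_energy)

# Synthetic 4-class data set: tones at class-dependent pitches plus noise.
X, y = [], []
for label in range(4):
    for _ in range(50):
        t = np.arange(22050) / 22050.0
        f0 = 110 * (2 ** label)  # one octave apart per class
        sig = np.sin(2 * np.pi * f0 * t) + 0.3 * rng.standard_normal(t.size)
        X.append(extract_features(sig))
        y.append(label)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_tr)  # normalize features before training
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

results = {}
for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("K-NN", KNeighborsClassifier(n_neighbors=5)),
                  ("ANN", MLPClassifier(hidden_layer_sizes=(32,),
                                        max_iter=2000, random_state=0))]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    results[name] = accuracy_score(y_te, pred)
    print(f"{name}: acc={results[name]:.2f} "
          f"macro-F1={f1_score(y_te, pred, average='macro'):.2f}")
```

Because the synthetic classes are cleanly separated by pitch, all three classifiers should score well above chance here; on real music, feature quality and normalization dominate the outcome, as the paper's comparison of the three models suggests.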

References

  • Chauhan, P. M., & Desai, N. (2014). Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using Wiener filter. 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) (pp. 1-5). Coimbatore.
  • Chien, S., Ding, Y., & Wei, C. (2002). Dynamic bus arrival time prediction with artificial neural networks. Journal of Transportation Engineering, 128(5), 429-438.
  • Cristianini, N., & Ricci, E. (2008). Support Vector Machines. Encyclopedia of Algorithms.
  • Davies, M. E., & Plumbley, M. (2007). Context-dependent beat tracking of musical audio. IEEE Transactions on Audio, Speech, and Language Processing, 15(3), 1009-1020.
  • Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J., & Moussallam, M. (2018). Music mood detection based on audio and lyrics with deep neural net. ISMIR 2018.
  • Hevner, K. (1936). Experimental studies of the elements of expression in music. The American Journal of Psychology, 48(2), 246-268.
  • Huq, A., Bello, J., & Rowe, R. (2010). Automated music emotion recognition: A systematic evaluation. Journal of New Music Research, 39(3), 227-244.
  • Jégou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117-128.
  • Lartillot, O. (2018). MIRtoolbox 1.7.1 User's Manual. Jyväskylä, Finland.
  • Lidy, T., & Rauber, A. (2006). Computing statistical spectrum descriptors for audio music similarity and retrieval. MIREX 2006 - Music Information Retrieval Evaluation.
  • Lin, C., Liu, M., Hsiung, W., & Jhang, J. (2016). Music emotion recognition based on two-level support vector classification. 2016 International Conference on Machine Learning and Cybernetics (ICMLC) (pp. 375-386).
  • Lu, L., Liu, D., & Zhang, H.-J. (2006). Automatic mood detection and tracking of music audio signals. IEEE Transactions on Audio, Speech, and Language Processing, 14(1), 5-18.
  • On, C. K., Pandiyan, P., Yaacob, S., & Saudi, A. (2006). Mel-frequency cepstral coefficient analysis in speech recognition. 2006 International Conference on Computing & Informatics (pp. 1-5). Kuala Lumpur.
  • Panda, R., Rocha, B., & Paiva, R. (2015). Music emotion recognition with standard and melodic audio features. Applied Artificial Intelligence, 29(4), 313-334.
  • Ren, J.-M., Wu, M.-J., & Jang, J.-S. (2015). Automatic music mood classification based on timbre and modulation features. IEEE Transactions on Affective Computing, 6(3), 236-246.
  • Schmidt, E. M., & Kim, Y. (2011). Learning emotion-based acoustic features with deep belief networks. 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 65-68). New Paltz.
  • Schölkopf, B., & Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
  • Song, Y., Dixon, S., & Pearce, M. (2012). Evaluation of musical features for emotion classification. Proc. ISMIR (pp. 523-528).
  • Toh, A. M., Togneri, R., & Nordholm, S. (2005). Spectral entropy as speech features for speech recognition. In Proceedings of PEECS (pp. 22-25).
  • Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293-302.
  • Widodo, A., & Yang, B.-S. (2007). Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 21(6), 2560-2574.
  • Yang, Q., Blond, S., Aggarwal, R., Wang, Y., & Li, J. (2017). New ANN method for multi-terminal HVDC protection relaying. Electric Power Systems Research, 148, 191-201.
  • Yang, Y.-H., & Chen, H.-H. (2012). Machine recognition of music emotion: A review. ACM Trans. Intell. Syst. Technol., 3(4).


Details

Primary Language English
Subjects Computer Software
Journal Section PAPERS
Authors

Mehmet Bilal Er 0000-0002-2074-1776

Emin Murat Esin 0000-0001-7697-3579

Publication Date December 1, 2021
Submission Date June 3, 2021
Acceptance Date July 7, 2021
Published in Issue Year 2021

Cite

APA Er, M. B., & Esin, E. M. (2021). Music Emotion Recognition with Machine Learning Based on Audio Features. Computer Science, 6(3), 133-144. https://doi.org/10.53070/bbd.945894

The Creative Commons Attribution 4.0 International License is applied to all research papers published by JCS, and a Digital Object Identifier (DOI) is assigned to each published paper.