Research Article

Music Emotion Recognition with Machine Learning Based on Audio Features

Year 2021, 133-144, 01.12.2021
https://doi.org/10.53070/bbd.945894

Abstract

Understanding the emotional impact of music on its audience is a common field of study across many disciplines, including science, psychology, musicology, and art. In this study, a method based on acoustic features is proposed to predict the emotion of different samples of Turkish music. The proposed method consists of three steps applied to the selected music pieces: preprocessing, feature extraction, and classification. In the first step, noise is removed from the signals and all signals in the data set are resampled to a common sampling frequency. In the second step, a 1x34 feature vector reflecting the emotional content of the music is extracted from each signal. The features are normalized before the classifiers are trained. In the last step, the data are classified using Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), and an Artificial Neural Network (ANN). Accuracy, precision, sensitivity, and F-score are used as classification metrics. The model was tested on a new 4-class data set consisting of Turkish music recordings, and it achieved 79.30% accuracy, 78.77% sensitivity, 78.94% specificity, and a 79.03% F-score.
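The pipeline described in the abstract (preprocess, extract acoustic features, normalize, classify, score) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 34 acoustic features are stood in by three toy descriptors (RMS energy, zero-crossing rate, spectral centroid), synthetic tones replace the Turkish music data set, and the function name `extract_features` is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

def extract_features(signal, sr=22050):
    """Toy stand-ins for acoustic features: RMS energy, zero-crossing rate, spectral centroid."""
    rms = np.sqrt(np.mean(signal ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(signal))) > 0)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([rms, zcr, centroid])

# Synthetic stand-in data: 4 "emotion" classes as tones of differing pitch and loudness.
rng = np.random.default_rng(0)
sr, n_per_class = 22050, 40
X, y = [], []
for label, (f0, amp) in enumerate([(220, 0.3), (440, 0.5), (880, 0.7), (1760, 0.9)]):
    for _ in range(n_per_class):
        t = np.arange(sr) / sr  # one second of "audio" at a common sampling rate
        sig = amp * np.sin(2 * np.pi * f0 * t) + 0.05 * rng.standard_normal(sr)
        X.append(extract_features(sig, sr))
        y.append(label)
X, y = np.array(X), np.array(y)

# Normalize features before training, then classify with an SVM and report metrics.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_tr)
clf = SVC(kernel="rbf").fit(scaler.transform(X_tr), y_tr)
pred = clf.predict(scaler.transform(X_te))
acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred, average="macro")
print(f"accuracy={acc:.2f}  macro-F1={f1:.2f}")
```

Swapping `SVC` for `KNeighborsClassifier` or `MLPClassifier` reproduces the K-NN and ANN variants of the same pipeline; on real audio, a library such as librosa would supply richer descriptors (e.g., MFCCs) in place of the toy features above.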

References

  • Chauhan, P. M., & Desai, N. (2014). Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using Wiener filter. 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) (pp. 1-5). Coimbatore.
  • Chien, S., Ding, Y., & Wei, C. (2002). Dynamic bus arrival time prediction with artificial neural networks. Journal of Transportation Engineering, 128(5), 429-438.
  • Cristianini, N., & Ricci, E. (2008). Support Vector Machines. In Encyclopedia of Algorithms. Springer.
  • Davies, M. E. P., & Plumbley, M. D. (2007). Context-dependent beat tracking of musical audio. IEEE Transactions on Audio, Speech, and Language Processing, 15(3), 1009-1020.
  • Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J., & Moussallam, M. (2018). Music mood detection based on audio and lyrics with deep neural net. Proceedings of ISMIR 2018.
  • Hevner, K. (1936). Experimental studies of the elements of expression in music. The American Journal of Psychology, 48(2), 246-268.
  • Huq, A., Bello, J., & Rowe, R. (2010). Automated music emotion recognition: A systematic evaluation. Journal of New Music Research, 39(3), 227-244.
  • Jégou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117-128.
  • Lartillot, O. (2018). MIRtoolbox 1.7.1 User's Manual. Jyväskylä, Finland.
  • Lidy, T., & Rauber, A. (2006). Computing statistical spectrum descriptors for audio music similarity and retrieval. MIREX 2006 - Music Information Retrieval Evaluation.
  • Lin, C., Liu, M., Hsiung, W., & Jhang, J. (2016). Music emotion recognition based on two-level support vector classification. 2016 International Conference on Machine Learning and Cybernetics (ICMLC) (pp. 375-386).
  • Lu, L., Liu, D., & Zhang, H.-J. (2006). Automatic mood detection and tracking of music audio signals. IEEE Transactions on Audio, Speech, and Language Processing, 14(1), 5-18.
  • On, C. K., Pandiyan, P., Yaacob, S., & Saudi, A. (2006). Mel-frequency cepstral coefficient analysis in speech recognition. 2006 International Conference on Computing & Informatics (pp. 1-5). Kuala Lumpur.
  • Panda, R., Rocha, B., & Paiva, R. (2015). Music emotion recognition with standard and melodic audio features. Applied Artificial Intelligence, 29(4), 313-334.
  • Ren, J.-M., Wu, M.-J., & Jang, J.-S. (2015). Automatic music mood classification based on timbre and modulation features. IEEE Transactions on Affective Computing, 6(3), 236-246.
  • Schmidt, E. M., & Kim, Y. (2011). Learning emotion-based acoustic features with deep belief networks. 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 65-68). New Paltz, NY.
  • Schölkopf, B., & Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
  • Song, Y., Dixon, S., & Pearce, M. (2012). Evaluation of musical features for emotion classification. Proceedings of ISMIR 2012 (pp. 523-528).
  • Toh, A. M., Togneri, R., & Nordholm, S. (2005). Spectral entropy as speech features for speech recognition. Proceedings of PEECS (pp. 22-25).
  • Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293-302.
  • Widodo, A., & Yang, B.-S. (2007). Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 21(6), 2560-2574.
  • Yang, Q., Blond, S., Aggarwal, R., Wang, Y., & Li, J. (2017). New ANN method for multi-terminal HVDC protection relaying. Electric Power Systems Research, 148, 191-201.
  • Yang, Y.-H., & Chen, H.-H. (2012). Machine recognition of music emotion: A review. ACM Transactions on Intelligent Systems and Technology, 3(4).

There are 23 references in total.

Details

Primary Language English
Subjects Computer Software
Section PAPERS
Authors

Mehmet Bilal Er 0000-0002-2074-1776

Emin Murat Esin 0000-0001-7697-3579

Publication Date December 1, 2021
Submission Date June 3, 2021
Acceptance Date July 7, 2021
Published in Issue Year 2021

How to Cite

APA Er, M. B., & Esin, E. M. (2021). Music Emotion Recognition with Machine Learning Based on Audio Features. Computer Science, 6(3), 133-144. https://doi.org/10.53070/bbd.945894

The Creative Commons Attribution 4.0 International License is applied to all research papers published by JCS, and a Digital Object Identifier (DOI) is assigned to each published paper.