Research Article

Classification Of Music Genres From Turkish Music With Deep Learning

Year 2021, Issue: 24, 176 - 183, 15.04.2021
https://doi.org/10.31590/ejosat.898588

Abstract

In this study, a deep learning architecture called the Convolutional Long Short-Term Memory Deep Neural Network (CLDNN), which has not previously been applied in this field, is used for music genre classification. A new Turkish Music Database consisting of 200 pieces of music from various genres has also been created, and the classification performance of the proposed architecture and of commonly used machine learning methods is evaluated on this database. In addition, new features are obtained with the Convolutional Neural Network (CNN) that forms the first part of the architecture; both Mel Frequency Cepstrum Coefficients (MFCC) and log mel filterbank energies are used as its input. Beyond these learned features, many standard features are extracted with various toolboxes. For all methods, the most successful classification results are achieved when the standard features are combined with the new features. Among the compared classifiers, the best result, 99.5%, is obtained with the remaining part of the proposed architecture: the Long Short-Term Memory (LSTM) network followed by a Deep Neural Network (DNN) of fully connected layers.
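
The pipeline described above (MFCC and log mel filterbank features fed to a CNN front-end, followed by an LSTM and fully connected layers) can be outlined in code. The following is a minimal illustrative sketch in Python, assuming the librosa and Keras libraries; the input shapes, layer sizes, optimizer, and number of genre classes are assumptions for illustration only and are not the exact configuration reported in the paper.

# Minimal CLDNN-style sketch: CNN front-end -> LSTM -> fully connected DNN.
# All layer sizes, input shapes and the class count are illustrative assumptions.
import librosa
from tensorflow.keras import layers, models

N_MFCC = 40        # assumed number of MFCC coefficients / mel bands
N_FRAMES = 1292    # assumed frame count (about 30 s at the default hop length)
N_CLASSES = 10     # assumed number of genres in the database

def extract_features(path, sr=22050, duration=30.0):
    """Compute MFCCs and log mel filterbank energies for one audio clip."""
    y, sr = librosa.load(path, sr=sr, duration=duration)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)           # (n_mfcc, frames)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MFCC)
    log_mel = librosa.power_to_db(mel)                                # log mel energies
    return mfcc, log_mel

def build_cldnn(input_shape=(N_MFCC, N_FRAMES, 1), n_classes=N_CLASSES):
    """CNN layers learn new features, an LSTM models their temporal order,
    and fully connected (DNN) layers output the genre prediction."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Put time first, then merge the frequency and channel axes so the
    # LSTM receives one feature vector per time step.
    x = layers.Permute((2, 1, 3))(x)
    x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)
    x = layers.LSTM(128)(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cldnn()
model.summary()

In this sketch the CNN output could also be taken as a learned feature vector and concatenated with standard toolbox features before the LSTM and DNN stages, along the lines described in the abstract; that combination step is omitted here for brevity.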

References

  • Abidin, D., Öztürk, Ö., & Öztürk, T.Ö., (2017). Using data mining for makam recognition in Turkish traditional art music, J. Fac. Eng. Archit. Gazi Univ. 32 1221–1232. https://doi.org/10.17341/gazimmfd.369557.
  • Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., & Lamere, P., (2011). The million song dataset, in: Proc. 12th Int. Soc. Music Inf. Retr. Conf. ISMIR 2011.
  • Breiman, L., (2001). Random forests, Mach. Learn.
  • Cover, T. M., & Hart, P. E., (1967). Nearest Neighbor Pattern Classification, IEEE Trans. Inf. Theory.
  • Çoban, Ö., (2017). Turkish Music Genre Classification using Audio and Lyrics Features, Süleyman Demirel Üniversitesi Fen Bilim. Enstitüsü Derg. 21 322. https://doi.org/10.19113/sdufbed.88303.
  • Er, M. B., & Çiğ, H., (2020). Türk Müziği Uyaranları Kullanılarak İnsan Duygularının Makine Öğrenmesi Yöntemi İle Tanınması, Gazi Üniversitesi Fen Bilim. Derg. Part C Tasarım ve Teknol. 8 458–474. https://doi.org/10.29109/gujsc.687199.
  • Eyben, F., & Schuller, B., (2015). OpenSMILE – The Munich Versatile and Fast Open-Source Audio Feature Extractor, ACM SIGMultimedia Rec.
  • Feng, T., (2014). Deep learning for music genre classification, Tech. Rep. Univ. Illinois.
  • Friedman, N., Geiger, D., & Goldszmidt, M., (1997). Bayesian Network Classifiers, Mach. Learn. https://doi.org/10.1023/a:1007465528199.
  • Hall, M., & Smith, L., (1998). Feature subset selection: a correlation based filter approach, in: Proc. Int. Conf. Neural Inf. Process. Intell. Inf. Syst.
  • Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H., (2009). The WEKA data mining software: An update, ACM SIGKDD Explor. Newsl.
  • Hizlisoy, S., Yildirim, S., & Tufekci, Z., (2020). Music emotion recognition using convolutional long short term memory deep neural networks, Eng. Sci. Technol. an Int. J. https://doi.org/10.1016/j.jestch.2020.10.009.
  • Hochreiter, S., & Schmidhuber, J., (1997). Long Short-Term Memory, Neural Comput. https://doi.org/10.1162/neco.1997.9.8.1735.
  • McKay, C., (2005). JAudio: Towards a standardized extensible audio music feature extraction system, Course Pap. McGill Univ. Canada.
  • Lartillot, O., & Toiviainen, P., (2007). MIR in Matlab (II): A toolbox for musical feature extraction from audio, in: Proc. 8th Int. Conf. Music Inf. Retrieval, ISMIR.
  • LeCun, Y., Bengio, Y., & Hinton, G., (2015). Deep learning, Nature 521(7553), 436–444. https://doi.org/10.1038/nature14539.
  • Liu, X., Chen, Q., Wu, X., Liu, Y., & Liu, Y., (2017). CNN based music emotion classification, ArXiv.
  • Karatana, A., & Yildiz, O., (2017). Music Genre Classification using Machine Learning Techniques, in: 2017 25th Signal Process. Commun. Appl. Conf. (SIU), 1–4. https://doi.org/10.1109/siu.2017.7960694.
  • Kingma, D. P., & Ba, J. L., (2015). Adam: A method for stochastic optimization, in: 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc.
  • Platt, J., (1999). Fast Training of Support Vector Machines using Sequential Minimal Optimization, in: Adv. Kernel Methods: Support Vector Learn.
  • Sainath, T. N., Weiss, R. J., Senior, A., Wilson, K.W., & Vinyals, O., (2015). Learning the speech front-end with raw waveform CLDNNs, in: Proc. Annu. Conf. Int. Speech Commun. Assoc. INTERSPEECH.
  • Sarkar, R., Choudhury, S., Dutta, S., Roy, A., & Saha, S. K., (2020). Recognition of emotion in music based on deep convolutional neural network, Multimed. Tools Appl. https://doi.org/10.1007/s11042-019-08192-x.
  • Sturm, B. L., (2012). An analysis of the GTZAN music genre dataset, in: MIRUM 2012 - Proc. 2nd Int. ACM Work. Co-Located with ACM Multimed. https://doi.org/10.1145/2390848.2390851.
  • Tzanetakis, G., & Cook, P., (2000). MARSYAS: A framework for audio analysis, Organised Sound. https://doi.org/10.1017/S1355771800003071.
  • Thiruvengatanadhan, R., (2020). Musical Genre Classification using Convolutional Neural Networks, Int. Journal of Innovative Technology and Exploring Engineering, https://doi.org/10.35940/ijitee.a8172.1110120.
  • Wong, K.H., Tang, C. P., Chui, K. L., Yu, Y. K., & Zeng, Z., (2018). Music genre classification using a hierarchical long short term memory (LSTM) model, 7. https://doi.org/10.1117/12.2501763.

Details

Primary Language Turkish
Subjects Engineering
Journal Section Articles
Authors

Serhat Hızlısoy 0000-0001-8440-5539

Zekeriya Tüfekci 0000-0001-7835-2741

Publication Date April 15, 2021
Published in Issue Year 2021 Issue: 24

Cite

APA Hızlısoy, S., & Tüfekci, Z. (2021). Derin Öğrenme İle Türkçe Müziklerden Müzik Türü Sınıflandırması. Avrupa Bilim Ve Teknoloji Dergisi(24), 176-183. https://doi.org/10.31590/ejosat.898588