Recognizing Musical Notation Using Convolutional Neural Networks

Ahmad Othman; Cem Direkoğlu

doi:10.31590/ejosat.823266

Research Article

Evrişimli Sinir Ağlarını Kullanarak Müzik Notasyonunu Tanıma (Recognizing Musical Notation Using Convolutional Neural Networks)

Year 2020, Ejosat Special Issue 2020 (ISMSIT), 283 - 290, 30.11.2020

Abstract

Musical scores play a critical role in the development of music. For centuries, music has been preserved in image form, whether as the composer's manuscript or as any other written version. However, archiving musical scores as images has created many difficulties for music information retrieval. Music notation recognition is an optical character recognition (OCR) application that allows musical scores to be recognized in a form that can be edited or played, such as MIDI (for playback) and MusicXML (for page layout). In this paper, we propose a Convolutional Neural Network (CNN) based framework for notation recognition in images. We use a popular pre-trained CNN, namely ResNet-101, to extract global features from note and rest images. A Support Vector Machine (SVM) is then used for training and classification. ResNet-101 is one of the state-of-the-art pre-trained networks for image recognition and was trained on more than a million images. A multiclass SVM classifier using a fast linear solver is also a very powerful classifier. To test our work, the dataset in our experiment was derived from RecordLabel (Attwenger, 2015) and the OMR-dataset and then labeled manually according to music theory. As a result, we can separate notes and rests from each other with 99.02% accuracy. We can also classify five different note types. In this work, ResNet-101 and an SVM are combined for the first time for music notation recognition, and the results are very promising.

Keywords

Optical music recognition, convolutional neural networks, support vector machine, notation recognition

Abstract

Musical scores are essential to music theory and its development. Musical notation was developed by the Greeks around 521 BCE. Because music was written down so long ago, there is a gap between modern music technology and the old scripts of music theory, which survive only in written form. Having music scores only in written form has raised various problems for music information retrieval (MIR). Music notation recognition is a type of optical character recognition (OCR) application that recognizes musical scores and converts them to a format that can be edited or played on a computer, such as MIDI (for playback) or MusicXML (for page layout). In this paper, we introduce a Convolutional Neural Network (CNN) based framework for musical notation recognition in images. We use a popular pre-trained CNN, namely ResNet-101, to extract global features from note and rest images. A Support Vector Machine (SVM) is then employed for training and classification. ResNet-101 is one of the state-of-the-art pre-trained networks for image recognition and was trained on more than a million images. A multiclass SVM classifier using a fast linear solver is also a very powerful classifier. We evaluated the proposed approach on a dataset derived from RecordLabel (Attwenger, 2015) and the OMR-dataset, and then labeled manually according to music theory. As a result, we can separate notes and rests from each other with an average accuracy of 99.02%. We can also classify five different note types. This is the first time that ResNet-101 and an SVM have been combined to perform musical notation recognition, and the results are very promising.
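
The classification stage described above (fixed-length feature vectors fed to a multiclass linear SVM) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the feature vectors are two-dimensional toy values standing in for ResNet-101 features, the solver is plain subgradient descent on the regularised hinge loss, the multiclass scheme is one-vs-rest, and all function and variable names are hypothetical.

```python
import random

# Toy stand-ins for CNN feature vectors (assumption: real features would
# come from a pre-trained network and be much higher-dimensional).
TOY_FEATURES = [
    [2.0, 2.0], [2.5, 1.8], [1.8, 2.4],        # "note" examples
    [-2.0, -2.0], [-2.2, -1.7], [-1.9, -2.5],  # "rest" examples
]
TOY_LABELS = ["note", "note", "note", "rest", "rest", "rest"]


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_binary_svm(features, targets, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit (w, b) of a linear SVM by subgradient descent on the
    L2-regularised hinge loss. Targets must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], targets[i]
            margin = y * (dot(w, x) + b)
            # The regulariser shrinks the weights on every step.
            w = [wj * (1.0 - lr * reg) for wj in w]
            # The hinge loss only contributes when the margin is violated.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def train_one_vs_rest(features, labels, **kwargs):
    """Train one binary SVM per class (that class versus all others)."""
    models = {}
    for cls in sorted(set(labels)):
        targets = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_binary_svm(features, targets, **kwargs)
    return models


def predict(models, x):
    """Return the class whose binary SVM gives the highest score."""
    def score(cls):
        w, b = models[cls]
        return dot(w, x) + b
    return max(models, key=score)


models = train_one_vs_rest(TOY_FEATURES, TOY_LABELS)
print(predict(models, [3.0, 3.0]))    # prints "note"
print(predict(models, [-3.0, -3.0]))  # prints "rest"
```

In a full pipeline, the toy vectors would be replaced by features extracted from each symbol image, and a library solver would normally replace the hand-written training loop; the one-vs-rest structure stays the same for the five note types.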

Keywords

Optical music recognition, convolutional neural networks, support vector machine, notation recognition

References

Attwenger, P. (2015). RecordLabel, http://homepage.univie.ac.at/a1200595/recordlabel/
Bainbridge, D., & Bell, T. (2001). The challenge of optical music recognition. Comput. Humanit, 35, 95–121, doi:10.1023/A:1002485918032.
Calvo-Zaragoza, J., & Rizo, D. (2018). End-to-End Neural Optical Music Recognition of Monophonic Scores, Appl. Sci, 8, 606, doi:10.3390/app8040606.
Casey, M., & Veltkamp, R., & Goto, M., & Leman, M., & Rhodes, C., & Slaney, M. (2008). Content-Based Music Information Retrieval: Current Directions and Future Challenges. In Proc. of IEEE, 668–696, doi:10.1109/JPROC.2008.916370.
Cho, K., & van Merrienboer, B., & Gulcehre, C., & Bahdanau, D., & Bougares, F., & Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, arXiv 2014, arXiv:1406.1078.
Dai, J., & Li, Y., & He, K., & Sun, J. (2016). R-FCN: Object Detection via Region-based Fully Convolutional Networks, arXiv 2016, arXiv:1605.06409.
Girshick, R. (2015). Fast R-CNN. arXiv 2015, arXiv:1504.08083.
Good, M., & Actor, G. (2003). Using MusicXML for file interchange. International Conference on WEB Delivering of Music, 15–17, doi:10.1109/WDM.2003.1233890.
Hajič, J., & Pecina, P. (2017). The MUSCIMA++ Dataset for Handwritten Optical Music Recognition. IAPR International Conference on Document Analysis and Recognition (ICDAR), 39–46, doi:10.1109/ICDAR.2017.16.
Hajič, J., & Dorfer, M., & Widmer, G., & Pecina, P. (2018). Towards Full-Pipeline Handwritten OMR with Musical Symbol Detection by U-Nets. International Society for Music Information Retrieval Conference, 23–27.
He, K., & Zhang, X., & Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition, arXiv 2015, arXiv:1512.03385.
LeCun, Y., & Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444, doi:10.1038/nature14539.
Lin, T.Y., & Goyal, P., & Girshick, R., & He, K., & Dollár, P. (2017). Focal Loss for Dense Object Detection, arXiv 2017, arXiv:1708.02002.
Liu, W., & Anguelov, D., & Erhan, D., & Szegedy, C., & Reed, S., & Fu, C.Y., & Berg, A.C. (2016). SSD: Single Shot MultiBox Detector. European Conference on Computer Vision; Springer: Cham, Switzerland, 21–37, doi:10.1007/978-3-319-46448-0_2.
Pacha, A., & Hajič, J., & Calvo-Zaragoza, J. (2018). A Baseline for General Music Object Detection with Deep Learning, Appl. Sci., 8, 1488, doi:10.3390/app8091488.
Rebelo, A., & Fujinaga, I., & Paszkiewicz, F., & Marcal, A.R.S., & Guedes, C., & Cardoso, J.S. (2012). Optical music recognition: State-of-the-art and open issues. Int. J. Multimed. Inf. Retr, 1, 173–190, doi:10.1007/s13735-012-0004-6.
Redmon, J., & Divvala, S., & Girshick, R., & Farhadi, A. (2015). You Only Look Once: Unified, Real-Time Object Detection, arXiv 2015, arXiv:1506.02640.
Ren, S., & He, K., & Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2015, arXiv:1506.01497.
ResNet, (2015). https://towardsdatascience.com/review-resnet-winner-of-ilsvrc-2015-image-classification-localization-detection-e39402bfa5d8
Ronneberger, O., & Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation, arXiv 2015, arXiv:1505.04597.
Sutskever, I., & Vinyals, O., & Le, Q.V. (2014). Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems 27; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q., Eds., 3104–3112.
Tuggener, L., & Elezi, I., & Schmidhuber, J., & Pelillo, M., & Stadelmann, T. (2018). DeepScores—A Dataset for Segmentation, Detection and Classification of Tiny Objects. arXiv 2018, arXiv:1804.00525.
Tuggener, L., & Elezi, I., & Schmidhuber, J., & Stadelmann, T. (2018b). Deep Watershed Detector for Music Object Recognition, arXiv 2018, arXiv:1805.10548.
Van der Wel, E., & Ullrich, K. (2017). Optical Music Recognition with Convolutional Sequence-to-Sequence Models, arXiv 2017, arXiv:1707.04877.

There are 24 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Ahmad Othman 0000-0001-8156-8965 Cem Direkoğlu 0000-0001-7709-4082
Publication Date	November 30, 2020
Published in Issue	Year 2020 Ejosat Special Issue 2020 (ISMSIT)

Cite

APA	Othman, A., & Direkoğlu, C. (2020). Recognizing Musical Notation Using Convolutional Neural Networks. Avrupa Bilim ve Teknoloji Dergisi, 283-290. https://doi.org/10.31590/ejosat.823266
