Research Article

END-TO-END AUTOMATIC MUSIC TRANSCRIPTION OF POLYPHONIC QANUN AND OUD MUSIC USING DEEP NEURAL NETWORK

Year 2024, Volume 25, Issue 3, 442-455, 30.09.2024
https://doi.org/10.18038/estubtda.1467350

Abstract

This paper introduces an automatic music transcription model based on Deep Neural Networks (DNNs), designed to simulate the "trained ear" in music. It advances signal processing and music technology, particularly multi-instrument transcription involving the traditional Turkish instruments Qanun and Oud, which have distinctive timbral characteristics and notes that decay quickly after the attack. The study generates basic combinations of multi-pitch datasets, trains the DNN model on this data, and demonstrates its effectiveness in transcribing two-part compositions with high accuracy and F1 scores. Training exposes the model to the fundamental characteristics of each instrument in isolation, enabling it to identify and separate complex patterns in mixed compositions. The primary goal is to equip the model to distinguish and analyze individual musical components, thereby strengthening applications in music production, audio engineering, and education.
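The approach the abstract describes follows the standard frame-level formulation of polyphonic transcription: spectral frames go in, and a multi-label vector of simultaneous pitch activations comes out. The sketch below illustrates that formulation only and is not the authors' implementation; the layer sizes, the 252-bin input, the 88-pitch output range, and the synthetic training data are all assumptions made for illustration.

```python
# Minimal sketch of frame-level multi-pitch transcription with a DNN.
# NOT the paper's model: layer sizes, bin/pitch counts, and the random
# training data are illustrative assumptions only.
import torch
import torch.nn as nn

N_BINS = 252     # assumed spectral bins per frame (e.g. a CQT frame)
N_PITCHES = 88   # assumed pitch vocabulary

# One sigmoid per pitch makes the task multi-label, so several
# simultaneous notes (polyphony) can be active in the same frame.
model = nn.Sequential(
    nn.Linear(N_BINS, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, N_PITCHES),            # logits; sigmoid lives in the loss
)
loss_fn = nn.BCEWithLogitsLoss()          # independent on/off decision per pitch
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic stand-in for the generated multi-pitch training combinations.
X = torch.rand(1024, N_BINS)                      # spectral frames
Y = (torch.rand(1024, N_PITCHES) < 0.05).float()  # sparse active pitches

for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    opt.step()

# Threshold sigmoid outputs into a binary piano roll, then score it with
# the frame-level precision/recall/F1 used in transcription evaluation.
with torch.no_grad():
    roll = (torch.sigmoid(model(X)) > 0.5).float()
tp = (roll * Y).sum()
precision = tp / roll.sum().clamp(min=1)
recall = tp / Y.sum().clamp(min=1)
f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-8)
print(f"frame-level F1: {f1.item():.3f}")
```

The per-pitch sigmoid rather than a softmax is what lets such a model report chords: a softmax would force exactly one active pitch per frame, while independent sigmoids allow any subset of pitches to sound at once.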

References

  • [1] Benetos E, Dixon S, Duan Z, Ewert S. Automatic Music Transcription: An Overview. IEEE Signal Process Mag. 2019;36(1):20-30. doi:10.1109/MSP.2018.2869928
  • [2] Bertin N, Badeau R, Richard G. Blind Signal Decompositions for Automatic Transcription of Polyphonic Music: NMF and K-SVD on the Benchmark. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP ’07. IEEE; 2007:I-65-I-68. doi:10.1109/ICASSP.2007.366617
  • [3] Ansari S, Alatrany AS, Alnajjar KA, et al. A survey of artificial intelligence approaches in blind source separation. Neurocomputing. 2023;561:126895. doi:10.1016/j.neucom.2023.126895
  • [4] Munoz-Montoro AJ, Carabias-Orti JJ, Cabanas-Molero P, Canadas-Quesada FJ, Ruiz-Reyes N. Multichannel Blind Music Source Separation Using Directivity-Aware MNMF With Harmonicity Constraints. IEEE Access. 2022;10:17781-17795. doi:10.1109/ACCESS.2022.3150248
  • [5] Uhlich S, Giron F, Mitsufuji Y. Deep neural network based instrument extraction from music. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; 2015. doi:10.1109/ICASSP.2015.7178348
  • [6] Nishikimi R, Nakamura E, Goto M, Yoshii K. Audio-to-score singing transcription based on a CRNN-HSMM hybrid model. APSIPA Trans Signal Inf Process. 2021;10(1):e7. doi:10.1017/ATSIP.2021.4
  • [7] Sigtia S, Benetos E, Boulanger-Lewandowski N, Weyde T, d’Avila Garcez AS, Dixon S. A hybrid recurrent neural network for music transcription. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2015:2061-2065. doi:10.1109/ICASSP.2015.7178333
  • [8] Sigtia S, Benetos E, Dixon S. An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Trans Audio Speech Lang Process. 2016;24(5):927-939. doi:10.1109/TASLP.2016.2533858
  • [9] Seetharaman P, Pishdadian F, Pardo B. Music/Voice separation using the 2D Fourier transform. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; 2017. doi:10.1109/WASPAA.2017.8169990
  • [10] Huang PS, Chen SD, Smaragdis P, Hasegawa-Johnson M. Singing-voice separation from monaural recordings using robust principal component analysis. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; 2012. doi:10.1109/ICASSP.2012.6287816
  • [11] Tervaniemi M. Musicians - same or different? In: Annals of the New York Academy of Sciences. Vol 1169; 2009. doi:10.1111/j.1749-6632.2009.04591.x
  • [12] Andrianopoulou M. Aural Education: Reconceptualising Ear Training in Higher Music Learning. Taylor & Francis; 2019. https://books.google.com.tr/books?id=p_S2DwAAQBAJ
  • [13] Corey J. Technical ear training: Tools and practical methods. In: Proceedings of Meetings on Acoustics. Vol 19. AIP Publishing; 2013:025016. doi:10.1121/1.4795853
  • [14] Chabriel G, Kleinsteuber M, Moreau E, Shen H, Tichavsky P, Yeredor A. Joint matrices decompositions and blind source separation: A survey of methods, identification and applications. IEEE Signal Process Mag. 2014;31:34-43. doi:10.1109/MSP.2014.2298045
  • [15] Luo Z, Li C, Zhu L. A comprehensive survey on blind source separation for wireless adaptive processing: Principles, perspectives, challenges and new research directions. IEEE Access. Published online 2018. doi:10.1109/ACCESS.2018.2879380
  • [16] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444. doi:10.1038/nature14539
  • [17] Sigtia S, Benetos E, Boulanger-Lewandowski N, Weyde T, D’Avila Garcez AS, Dixon S. A hybrid recurrent neural network for music transcription. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol 2015-August. Institute of Electrical and Electronics Engineers Inc.; 2015:2061-2065. doi:10.1109/ICASSP.2015.7178333
  • [18] Brown JC. Calculation of a constant Q spectral transform. J Acoust Soc Am. 1991;89(1):425-434. doi:10.1121/1.400476
  • [19] Giannakopoulos T, Pikrakis A. Introduction to Audio Analysis: A MATLAB Approach. Published online 2014:1-266. doi:10.1016/C2012-0-03524-7
  • [20] AnthemScore 5 Music AI 2024. https://www.lunaverus.com/
  • [21] Benetos E, Cherla S, Weyde T. An Efficient Shift-Invariant Model for Polyphonic Music Transcription. https://code.soundsoftware.ac.uk/projects/amt_mssiplca_fast

Details

Primary Language English
Subjects Information Systems (Other), Electrical Engineering (Other)
Section Articles
Authors

Emin Germen 0000-0003-1301-3786

Can Karadoğan 0000-0003-3611-6980

Publication Date September 30, 2024
Submission Date April 11, 2024
Acceptance Date September 18, 2024
Published Issue Year 2024, Volume 25, Issue 3

Cite

AMA Germen E, Karadoğan C. END-TO-END AUTOMATIC MUSIC TRANSCRIPTION OF POLYPHONIC QANUN AND OUD MUSIC USING DEEP NEURAL NETWORK. Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering. September 2024;25(3):442-455. doi:10.18038/estubtda.1467350