Research Article

END-TO-END AUTOMATIC MUSIC TRANSCRIPTION OF POLYPHONIC QANUN AND OUD MUSIC USING DEEP NEURAL NETWORK

Year 2024, 442-455, 30.09.2024
https://doi.org/10.18038/estubtda.1467350

Abstract

This paper introduces an automatic music transcription model based on Deep Neural Networks (DNNs) that aims to simulate the "trained ear" of a musician. It contributes to signal processing and music technology, particularly to multi-instrument transcription involving two traditional Turkish instruments, the Qanun and the Oud. These instruments have distinctive timbral characteristics, and their notes decay early. The study generates multi-pitch datasets from basic note combinations, trains the DNN model on these data, and shows that the model transcribes two-part compositions with high accuracy and F1 scores. During training, the model learns the fundamental characteristics of each instrument, which enables it to identify and isolate complex patterns in mixed compositions. The primary goal is to enable the model to distinguish and analyze individual musical components, thereby supporting applications in music production, audio engineering, and education.
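The abstract reports accuracy and F1 scores but does not state how they are computed. A common convention in multi-pitch transcription is frame-level evaluation over binary piano rolls (frames × pitches), where the network's per-pitch sigmoid outputs are thresholded and compared cell by cell with the reference. The sketch below assumes exactly that: the function name `frame_level_prf`, the 0.5 threshold, and the toy two-part data are hypothetical illustrations, not the authors' evaluation protocol.

```python
import numpy as np


def frame_level_prf(reference: np.ndarray, estimate: np.ndarray) -> tuple[float, float, float]:
    """Frame-level precision, recall and F1 for multi-pitch transcription.

    Assumption: both inputs are piano rolls of shape (n_frames, n_pitches),
    where a nonzero cell means that pitch is active in that frame.
    """
    ref = reference.astype(bool)
    est = estimate.astype(bool)
    if ref.shape != est.shape:
        raise ValueError("piano rolls must have the same shape")

    tp = np.logical_and(ref, est).sum()   # active in both
    fp = np.logical_and(~ref, est).sum()  # predicted but not in the reference
    fn = np.logical_and(ref, ~est).sum()  # in the reference but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)


# Hypothetical toy example: 4 frames, 5 pitch bins, two simultaneous voices.
reference = np.array([
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
])
# Hypothetical sigmoid outputs of a DNN, thresholded at an assumed 0.5.
probs = np.array([
    [0.9, 0.1, 0.0, 0.8, 0.2],
    [0.7, 0.0, 0.1, 0.3, 0.0],
    [0.1, 0.6, 0.0, 0.0, 0.9],
    [0.0, 0.8, 0.6, 0.0, 0.7],
])
print(frame_level_prf(reference, probs >= 0.5))  # (0.875, 0.875, 0.875)
```

Here one missed note (frame 2) and one spurious note (frame 4) give 7 true positives, 1 false positive and 1 false negative. Frame-level scores ignore onsets and note durations; note-level metrics such as those in the `mir_eval.transcription` module are an alternative when onset accuracy matters.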

There are 21 citations in total.

Details

Primary Language: English
Subjects: Information Systems (Other), Electrical Engineering (Other)
Journal Section: Articles
Authors:

Emin Germen (ORCID 0000-0003-1301-3786)

Can Karadoğan (ORCID 0000-0003-3611-6980)

Publication Date: September 30, 2024
Submission Date: April 11, 2024
Acceptance Date: September 18, 2024
Published in Issue: Year 2024

Cite

AMA: Germen E, Karadoğan C. END-TO-END AUTOMATIC MUSIC TRANSCRIPTION OF POLYPHONIC QANUN AND OUD MUSIC USING DEEP NEURAL NETWORK. Estuscience - Se. September 2024;25(3):442-455. doi:10.18038/estubtda.1467350