Research Article

Enhancing Speaker Diarization Accuracy of PyAnnote by Fine-tuning
(Original title: İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması)

Year 2025, Volume: 8, Issue: 2, 105-114, 29.09.2025
https://doi.org/10.38016/jista.1563512

Abstract

Speech processing plays a crucial role in many applications such as speaker diarization, speech recognition, and sound event detection. This study uses PyAnnote, a powerful open-source speaker diarization tool, to conduct a comparative analysis on company call center recordings. The PyAnnote models were fine-tuned on a dataset obtained by annotating these call center recordings and compared against the performance of the baseline models. Model performance was assessed using the diarization error rate (DER) metric, and the fine-tuned PyAnnote 3.1 model performed best: after fine-tuning, its DER decreased from 21.9% to 15.6%.
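For context, the sketch below illustrates the kind of evaluation setup the abstract describes: running the pretrained pyannote/speaker-diarization-3.1 pipeline on a recording and scoring it against a manual annotation with DER, i.e., the sum of false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. This is a minimal illustration under stated assumptions, not the authors' actual pipeline or fine-tuning recipe; the audio file name, the Hugging Face token placeholder, and the reference segments are illustrative assumptions.

from pyannote.audio import Pipeline
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# Load the pretrained diarization pipeline (a Hugging Face access token is required).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # placeholder token, not a real credential
)

# Hypothesis: automatic diarization of a call center recording (file name is illustrative).
hypothesis = pipeline("call.wav")

# Reference: manual annotation of the same recording (segment boundaries are illustrative).
reference = Annotation()
reference[Segment(0.0, 4.1)] = "agent"
reference[Segment(4.1, 9.7)] = "customer"
reference[Segment(9.7, 12.0)] = "agent"

# DER = (false alarm + missed detection + confusion) / total reference speech duration.
metric = DiarizationErrorRate()
print(f"DER = {metric(reference, hypothesis):.1%}")

In the study itself, the same DER metric is computed over the annotated call center test set to compare the baseline and fine-tuned models.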

Supporting Institution

TÜBİTAK TEYDEB

Project Number

3221247

Acknowledgments

This study was carried out within the scope of project no. 3221247, "Ses Kayıtlarındaki Konuşmacı Sayısının Belirlenmesini ve Konuşmacı Seslerinin Ayrıştırılmasını Sağlayan Derin Yapay Sinir Ağı Mimarisi Tabanlı Sistemin Geliştirilmesi" (Development of a Deep Artificial Neural Network Architecture-Based System for Determining the Number of Speakers in Audio Recordings and Separating Speaker Voices), supported under the TÜBİTAK TEYDEB 1501 program. We thank TÜBİTAK TEYDEB for supporting our work.

References

  • Ahmad, R., Zubair, S., 2019. Unsupervised deep feature embeddings for speaker diarization. Turkish Journal of Electrical Engineering and Computer Sciences, 27(4), 3138-3149. DOI:10.3906/elk-1901-125
  • Anguera, X., Bozonnet, S., Evans, N., Fredouille, C., Friedland, G., Vinyals, O., 2012. Speaker diarization: A review of recent research. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 356-370. HAL Id: hal-00733397
  • Bonastre, J.-F., Wils, F., Meignier, S., 2005. ALIZE, a free toolkit for speaker recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 1, I/737-I/740. DOI:10.1109/ICASSP.2005.1415219
  • Bredin, H., 2023. pyannote.audio 2.1 speaker diarization pipeline: Principle, benchmark, and recipe. 24th Interspeech Conference (INTERSPEECH 2023), 1983-1987. HAL Id: hal-04247212
  • Bredin, H., Laurent, A., 2021. End-to-end speaker segmentation for overlap-aware resegmentation. arXiv:2104.04045
  • Bredin, H., Yin, R., Coria, J.M., Gelly, G., Korshunov, P., Lavechin, M., Gill, M.P., 2020. Pyannote.audio: Neural building blocks for speaker diarization. ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7124-7128. arXiv:1911.01255v1
  • Broux, P.-A., Desnous, F., Larcher, A., Petitrenaud, S., Carrive, J., Meignier, S., 2018. S4D: Speaker Diarization Toolkit in Python. Interspeech 2018, 1368-1372. HAL Id: hal-02280162
  • Carletta, J., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., Wellner, P., 2005. The AMI meeting corpus: A pre-announcement. International Workshop on Machine Learning for Multimodal Interaction, 28-39. DOI:10.1007/11677482_3
  • Fu, Y., Cheng, L., Lv, S., Jv, Y., Kong, Y., Chen, Z., Chen, J., 2021. Aishell-4: An open source dataset for speech enhancement, separation, recognition, and speaker diarization in a conference scenario. arXiv:2104.03603
  • Garcia-Romero, D., Snyder, D., Sell, G., Povey, D., McCree, A., 2017. Speaker diarization using deep neural network embeddings. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4930-4934. DOI:10.1109/ICASSP.2017.7953094
  • Giannakopoulos, T., 2015. pyAudioAnalysis: An open-source Python library for audio signal analysis. PLoS ONE, 10(12), e0144610. DOI:10.1371/journal.pone.0144610
  • Graf, S., Herbig, T., Buck, M., Schmidt, G., 2015. Features for voice activity detection: A comparative analysis. EURASIP Journal on Advances in Signal Processing, 2015(1), 1-15. DOI:10.1186/s13634-015-0277-z
  • Horiguchi, S., Garcia, P., Fujita, Y., Watanabe, S., Nagamatsu, K., 2021. End-to-end speaker diarization as post-processing. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7188-7192. arXiv:2012.10055v2
  • Huang, Z., Watanabe, S., Fujita, Y., García, P., Shao, Y., Povey, D., Khudanpur, S., 2020. Speaker diarization with region proposal network. ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6514-6518. arXiv:2002.06220v1
  • Khoma, V., Khoma, Y., Brydinskyi, V., Konovalov, A., 2023. Development of supervised speaker diarization system based on the PyAnnote Audio Processing Library. Sensors, 23(4), 2082. DOI:10.3390/s23042082
  • Landini, F., Glembek, O., Matějka, P., Rohdin, J., Burget, L., Diez, M., Silnova, A., 2021. Analysis of the BUT diarization system for VoxConverse challenge. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5819-5823. arXiv:2010.11718v2
  • Madikeri, S., Bourlard, H., 2015. KL-HMM based speaker diarization system for meetings. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4435-4439. DOI:10.1109/ICASSP.2015.7178809
  • Moattar, M.H., Homayounpour, M.M., 2012. A review on speaker diarization systems and approaches. Speech Communication, 54(10), 1065-1103. DOI:10.1016/j.specom.2012.05.002
  • Mul, A., 2023. Enhancing Dutch audio transcription through integration of speaker diarization into the automatic speech recognition model Whisper. Master’s thesis, Utrecht University, Applied Data Science.
  • Park, T.J., Kanda, N., Dimitriadis, D., Han, K.J., Watanabe, S., Narayanan, S., 2022. A review of speaker diarization: Recent advances with deep learning. Computer Speech & Language, 72(1), 101317. arXiv:2101.09624v4
  • Ravanelli, M., Bengio, Y., 2018. Speaker recognition from raw waveform with SincNet. IEEE Spoken Language Technology Workshop (SLT), 1021-1028. DOI:10.1109/SLT.2018.8639585
  • Ravanelli, M., Parcollet, T., Bengio, Y., 2019. The pytorch-kaldi speech recognition toolkit. ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6465-6469. arXiv:1811.07453v2
  • Ryant, N., Singh, P., Krishnamohan, V., Varma, R., Church, K., Cieri, C., Liberman, M., 2020. The third DIHARD diarization challenge. arXiv:2012.01477
  • Sharma, V., Zhang, Z., Neubert, Z., Dyreson, C., 2020. Speaker diarization: Using recurrent neural networks. arXiv:2006.05596
  • Tranter, S.E., Reynolds, D.A., 2006. An overview of automatic speaker diarization systems. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1557-1565. DOI:10.1109/TASL.2006.878256
  • Xiao, X., Kanda, N., Chen, Z., Zhou, T., Yoshioka, T., Chen, S., Gong, Y., 2021. Microsoft speaker diarization system for the VoxCeleb speaker recognition challenge 2020. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5824-5828. arXiv:2010.11458v2
  • Yin, R., Bredin, H., Barras, C., 2018. Neural speech turn segmentation and affinity propagation for speaker diarization. Interspeech 2018, 1393-1397. DOI:10.21437/Interspeech.2018-1750
  • Yu, F., Zhang, S., Fu, Y., Xie, L., Zheng, S., Du, Z., Bu, H., 2022. M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge. ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6167-6171. arXiv:2110.07393v3
  • Zhang, A., Wang, Q., Zhu, Z., Paisley, J., Wang, C., 2019. Fully supervised speaker diarization. ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6301-6305. arXiv:1810.04719


Details

Primary Language: Turkish
Subjects: Natural Language Processing
Section: Research Article
Authors

Amine Gonca Toprak (ORCID: 0000-0003-2425-5342)

Sercan Çepni (ORCID: 0000-0002-3405-6059)

Şükrü Ozan (ORCID: 0000-0002-3227-348X)

Project Number: 3221247
Publication Date: September 29, 2025
Submission Date: October 8, 2024
Acceptance Date: April 21, 2025
Published in Issue: Year 2025, Volume: 8, Issue: 2

How to Cite

APA Toprak, A. G., Çepni, S., & Ozan, Ş. (2025). İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. Journal of Intelligent Systems: Theory and Applications, 8(2), 105-114. https://doi.org/10.38016/jista.1563512
AMA Toprak AG, Çepni S, Ozan Ş. İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. jista. September 2025;8(2):105-114. doi:10.38016/jista.1563512
Chicago Toprak, Amine Gonca, Sercan Çepni, and Şükrü Ozan. “İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması”. Journal of Intelligent Systems: Theory and Applications 8, no. 2 (September 2025): 105-14. https://doi.org/10.38016/jista.1563512.
EndNote Toprak AG, Çepni S, Ozan Ş (01 September 2025) İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. Journal of Intelligent Systems: Theory and Applications 8 2 105–114.
IEEE A. G. Toprak, S. Çepni, and Ş. Ozan, “İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması”, jista, vol. 8, no. 2, pp. 105–114, 2025, doi: 10.38016/jista.1563512.
ISNAD Toprak, Amine Gonca et al. “İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması”. Journal of Intelligent Systems: Theory and Applications 8/2 (September 2025), 105-114. https://doi.org/10.38016/jista.1563512.
JAMA Toprak AG, Çepni S, Ozan Ş. İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. jista. 2025;8:105–114.
MLA Toprak, Amine Gonca et al. “İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması”. Journal of Intelligent Systems: Theory and Applications, vol. 8, no. 2, 2025, pp. 105-14, doi:10.38016/jista.1563512.
Vancouver Toprak AG, Çepni S, Ozan Ş. İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. jista. 2025;8(2):105-14.

Zeki Sistemler Teori ve Uygulamaları Dergisi (Journal of Intelligent Systems: Theory and Applications)