Research Article

Enhancing Speaker Diarization Accuracy of PyAnnote by Fine-tuning
(Original title: İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması)

Year 2025, Volume: 8, Issue: 2, 105-114, 29.09.2025
https://doi.org/10.38016/jista.1563512

Abstract

Speech processing plays a crucial role in many applications such as speaker diarization, speech recognition, and sound event detection. This study uses PyAnnote, a powerful open-source speaker diarization tool, to conduct a comparative analysis on company call center recordings. The PyAnnote models were fine-tuned on a dataset obtained by annotating these call center recordings and compared against the performance of the baseline models. Model performance was assessed using the diarization error rate (DER) metric, and the fine-tuned PyAnnote 3.1 model performed best: after fine-tuning, its DER decreased from 21.9% to 15.6%.
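For context, the sketch below illustrates the kind of evaluation setup the abstract describes: running the pretrained pyannote/speaker-diarization-3.1 pipeline on a recording and scoring it against a manual annotation with DER, i.e., the sum of false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. This is a minimal illustration under stated assumptions, not the authors' actual pipeline or fine-tuning recipe; the audio file name, the Hugging Face token placeholder, and the reference segments are illustrative assumptions.

from pyannote.audio import Pipeline
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# Load the pretrained diarization pipeline (a Hugging Face access token is required).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # placeholder token, not a real credential
)

# Hypothesis: automatic diarization of a call center recording (file name is illustrative).
hypothesis = pipeline("call.wav")

# Reference: manual annotation of the same recording (segment boundaries are illustrative).
reference = Annotation()
reference[Segment(0.0, 4.1)] = "agent"
reference[Segment(4.1, 9.7)] = "customer"
reference[Segment(9.7, 12.0)] = "agent"

# DER = (false alarm + missed detection + confusion) / total reference speech duration.
metric = DiarizationErrorRate()
print(f"DER = {metric(reference, hypothesis):.1%}")

In the study itself, the same DER metric is computed over the annotated call center test set to compare the baseline and fine-tuned models.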

Supporting Institution

TÜBİTAK TEYDEB

Project Number

3221247

Acknowledgments

This study was carried out within the scope of project no. 3221247, "Ses Kayıtlarındaki Konuşmacı Sayısının Belirlenmesini ve Konuşmacı Seslerinin Ayrıştırılmasını Sağlayan Derin Yapay Sinir Ağı Mimarisi Tabanlı Sistemin Geliştirilmesi" (Development of a Deep Artificial Neural Network Architecture-Based System for Determining the Number of Speakers in Audio Recordings and Separating Speaker Voices), supported under the TÜBİTAK TEYDEB 1501 program. We thank TÜBİTAK TEYDEB for supporting our work.

References

  • Ahmad, R., Zubair, S., 2019. Unsupervised deep feature embeddings for speaker diarization. Turkish Journal of Electrical Engineering and Computer Sciences, 27(4), 3138-3149. DOI:10.3906/elk-1901-125
  • Anguera, X., Bozonnet, S., Evans, N., Fredouille, C., Friedland, G., Vinyals, O., 2012. Speaker diarization: A review of recent research. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 356-370. HAL Id: hal-00733397
  • Bonastre, J.-F., Wils, F., Meignier, S., 2005. ALIZE, a free toolkit for speaker recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 1, I/737-I/740. DOI:10.1109/ICASSP.2005.1415219
  • Bredin, H., 2023. pyannote.audio 2.1 speaker diarization pipeline: Principle, benchmark, and recipe. 24th Interspeech Conference (INTERSPEECH 2023), 1983-1987. HAL Id: hal-04247212
  • Bredin, H., Laurent, A., 2021. End-to-end speaker segmentation for overlap-aware resegmentation. arXiv:2104.04045
  • Bredin, H., Yin, R., Coria, J.M., Gelly, G., Korshunov, P., Lavechin, M., Gill, M.P., 2020. Pyannote.audio: Neural building blocks for speaker diarization. ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7124-7128. arXiv:1911.01255v1
  • Broux, P.-A., Desnous, F., Larcher, A., Petitrenaud, S., Carrive, J., Meignier, S., 2018. S4D: Speaker Diarization Toolkit in Python. Interspeech 2018, 1368-1372. HAL Id: hal-02280162
  • Carletta, J., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., Wellner, P., 2005. The AMI meeting corpus: A pre-announcement. International Workshop on Machine Learning for Multimodal Interaction, 28-39. DOI:10.1007/11677482_3
  • Fu, Y., Cheng, L., Lv, S., Jv, Y., Kong, Y., Chen, Z., Chen, J., 2021. Aishell-4: An open source dataset for speech enhancement, separation, recognition, and speaker diarization in a conference scenario. arXiv:2104.03603
  • Garcia-Romero, D., Snyder, D., Sell, G., Povey, D., McCree, A., 2017. Speaker diarization using deep neural network embeddings. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4930-4934. DOI:10.1109/ICASSP.2017.7953094
  • Giannakopoulos, T., 2015. pyAudioAnalysis: An open-source Python library for audio signal analysis. PLoS ONE, 10(12), e0144610. DOI:10.1371/journal.pone.0144610
  • Graf, S., Herbig, T., Buck, M., Schmidt, G., 2015. Features for voice activity detection: A comparative analysis. EURASIP Journal on Advances in Signal Processing, 2015(1), 1-15. DOI:10.1186/s13634-015-0277-z
  • Horiguchi, S., Garcia, P., Fujita, Y., Watanabe, S., Nagamatsu, K., 2021. End-to-end speaker diarization as post-processing. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7188-7192. arXiv:2012.10055v2
  • Huang, Z., Watanabe, S., Fujita, Y., García, P., Shao, Y., Povey, D., Khudanpur, S., 2020. Speaker diarization with region proposal network. ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6514-6518. arXiv:2002.06220v1
  • Khoma, V., Khoma, Y., Brydinskyi, V., Konovalov, A., 2023. Development of supervised speaker diarization system based on the PyAnnote Audio Processing Library. Sensors, 23(4), 2082. DOI:10.3390/s23042082
  • Landini, F., Glembek, O., Matějka, P., Rohdin, J., Burget, L., Diez, M., Silnova, A., 2021. Analysis of the BUT diarization system for VoxConverse challenge. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5819-5823. arXiv:2010.11718v2
  • Madikeri, S., Bourlard, H., 2015. KL-HMM based speaker diarization system for meetings. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4435-4439. DOI:10.1109/ICASSP.2015.7178809
  • Moattar, M.H., Homayounpour, M.M., 2012. A review on speaker diarization systems and approaches. Speech Communication, 54(10), 1065-1103. DOI:10.1016/j.specom.2012.05.002
  • Mul, A., 2023. Enhancing Dutch audio transcription through integration of speaker diarization into the automatic speech recognition model Whisper. Master’s thesis, Utrecht University, Applied Data Science.
  • Park, T.J., Kanda, N., Dimitriadis, D., Han, K.J., Watanabe, S., Narayanan, S., 2022. A review of speaker diarization: Recent advances with deep learning. Computer Speech & Language, 72(1), 101317. arXiv:2101.09624v4
  • Ravanelli, M., Bengio, Y., 2018. Speaker recognition from raw waveform with SincNet. IEEE Spoken Language Technology Workshop (SLT), 1021-1028. DOI:10.1109/SLT.2018.8639585
  • Ravanelli, M., Parcollet, T., Bengio, Y., 2019. The pytorch-kaldi speech recognition toolkit. ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6465-6469. arXiv:1811.07453v2
  • Ryant, N., Singh, P., Krishnamohan, V., Varma, R., Church, K., Cieri, C., Liberman, M., 2020. The third DIHARD diarization challenge. arXiv:2012.01477
  • Sharma, V., Zhang, Z., Neubert, Z., Dyreson, C., 2020. Speaker diarization: Using recurrent neural networks. arXiv:2006.05596
  • Tranter, S.E., Reynolds, D.A., 2006. An overview of automatic speaker diarization systems. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1557-1565. DOI:10.1109/TASL.2006.878256
  • Xiao, X., Kanda, N., Chen, Z., Zhou, T., Yoshioka, T., Chen, S., Gong, Y., 2021. Microsoft speaker diarization system for the VoxCeleb speaker recognition challenge 2020. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5824-5828. arXiv:2010.11458v2
  • Yin, R., Bredin, H., Barras, C., 2018. Neural speech turn segmentation and affinity propagation for speaker diarization. Interspeech 2018, 1393-1397. DOI:10.21437/Interspeech.2018-1750
  • Yu, F., Zhang, S., Fu, Y., Xie, L., Zheng, S., Du, Z., Bu, H., 2022. M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge. ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6167-6171. arXiv:2110.07393v3
  • Zhang, A., Wang, Q., Zhu, Z., Paisley, J., Wang, C., 2019. Fully supervised speaker diarization. ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6301-6305. arXiv:1810.04719


Details

Primary Language: Turkish
Subjects: Natural Language Processing
Section: Research Article
Authors

Amine Gonca Toprak (ORCID: 0000-0003-2425-5342)

Sercan Çepni (ORCID: 0000-0002-3405-6059)

Şükrü Ozan (ORCID: 0000-0002-3227-348X)

Project Number: 3221247
Publication Date: September 29, 2025
Submission Date: October 8, 2024
Acceptance Date: April 21, 2025
Published in Issue: Year 2025, Volume: 8, Issue: 2

How to Cite

APA Toprak, A. G., Çepni, S., & Ozan, Ş. (2025). İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. Journal of Intelligent Systems: Theory and Applications, 8(2), 105-114. https://doi.org/10.38016/jista.1563512
AMA Toprak AG, Çepni S, Ozan Ş. İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. jista. September 2025;8(2):105-114. doi:10.38016/jista.1563512
Chicago Toprak, Amine Gonca, Sercan Çepni, and Şükrü Ozan. “İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması”. Journal of Intelligent Systems: Theory and Applications 8, no. 2 (September 2025): 105-14. https://doi.org/10.38016/jista.1563512.
EndNote Toprak AG, Çepni S, Ozan Ş (01 September 2025) İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. Journal of Intelligent Systems: Theory and Applications 8 2 105–114.
IEEE A. G. Toprak, S. Çepni, and Ş. Ozan, “İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması”, jista, vol. 8, no. 2, pp. 105–114, 2025, doi: 10.38016/jista.1563512.
ISNAD Toprak, Amine Gonca et al. “İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması”. Journal of Intelligent Systems: Theory and Applications 8/2 (September 2025), 105-114. https://doi.org/10.38016/jista.1563512.
JAMA Toprak AG, Çepni S, Ozan Ş. İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. jista. 2025;8:105–114.
MLA Toprak, Amine Gonca et al. “İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması”. Journal of Intelligent Systems: Theory and Applications, vol. 8, no. 2, 2025, pp. 105-14, doi:10.38016/jista.1563512.
Vancouver Toprak AG, Çepni S, Ozan Ş. İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması. jista. 2025;8(2):105-14.

Zeki Sistemler Teori ve Uygulamaları Dergisi (Journal of Intelligent Systems: Theory and Applications)