TY  - JOUR
T1  - İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması
TT  - Enhancing Speaker Diarization Accuracy of PyAnnote by Fine-tuning
AU  - Toprak, Amine Gonca
AU  - Çepni, Sercan
AU  - Ozan, Şükrü
PY  - 2025
DA  - September
Y2  - 2025
DO  - 10.38016/jista.1563512
JF  - Journal of Intelligent Systems: Theory and Applications
JO  - JISTA
PB  - Özer UYGUN
WT  - DergiPark
SN  - 2651-3927
SP  - 105
EP  - 114
VL  - 8
IS  - 2
LA  - tr
AB  - Speech processing plays a crucial role in various applications such as speaker diarization, speech recognition, and sound event detection. This study uses PyAnnote, a powerful open-source speaker diarization tool, to conduct a comparative analysis on company call center recordings. PyAnnote models were fine-tuned on a dataset obtained by annotating company call center recordings and compared against the baseline models. Model performance was assessed using the diarization error rate (DER) metric, and the fine-tuned PyAnnote 3.1 model demonstrated superior performance: after fine-tuning, its DER decreased from 21.9% to 15.6%.
KW  - speaker diarization
KW  - pyannote
KW  - fine-tuning
CR  - Ahmad, R., Zubair, S., 2019. Unsupervised deep feature embeddings for speaker diarization. Turkish Journal of Electrical Engineering and Computer Sciences, 27(4), 3138-3149. DOI:10.3906/elk-1901-125
CR  - Anguera, X., Bozonnet, S., Evans, N., Fredouille, C., Friedland, G., Vinyals, O., 2012. Speaker diarization: A review of recent research. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 356-370. HAL Id: hal-00733397
CR  - Bonastre, J.-F., Wils, F., Meignier, S., 2005. ALIZE, a free toolkit for speaker recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 1, I/737-I/740. DOI:10.1109/ICASSP.2005.1415219
CR  - Bredin, H., 2023. pyannote.audio 2.1 speaker diarization pipeline: Principle, benchmark, and recipe. 24th Interspeech Conference (INTERSPEECH 2023), 1983-1987. HAL Id: hal-04247212
CR  - Bredin, H., Laurent, A., 2021. End-to-end speaker segmentation for overlap-aware resegmentation. arXiv:2104.04045
CR  - Bredin, H., Yin, R., Coria, J.M., Gelly, G., Korshunov, P., Lavechin, M., Gill, M.P., 2020. Pyannote.audio: Neural building blocks for speaker diarization. ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7124-7128. arXiv:1911.01255v1
CR  - Broux, P.-A., Desnous, F., Larcher, A., Petitrenaud, S., Carrive, J., Meignier, S., 2018. S4D: Speaker Diarization Toolkit in Python. Interspeech 2018, 1368-1372. HAL Id: hal-02280162
CR  - Carletta, J., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., Wellner, P., 2005. The AMI meeting corpus: A pre-announcement. International Workshop on Machine Learning for Multimodal Interaction, 28-39. DOI:10.1007/11677482_3
CR  - Fu, Y., Cheng, L., Lv, S., Jv, Y., Kong, Y., Chen, Z., Chen, J., 2021. Aishell-4: An open source dataset for speech enhancement, separation, recognition, and speaker diarization in a conference scenario. arXiv:2104.03603
CR  - Garcia-Romero, D., Snyder, D., Sell, G., Povey, D., McCree, A., 2017. Speaker diarization using deep neural network embeddings. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4930-4934. DOI:10.1109/ICASSP.2017.7953094
CR  - Giannakopoulos, T., 2015. pyaudioanalysis: An open-source python library for audio signal analysis. PloS One, 10(12), e0144610. DOI:10.1371/journal.pone.0144610
CR  - Graf, S., Herbig, T., Buck, M., Schmidt, G., 2015. Features for voice activity detection: A comparative analysis. EURASIP Journal on Advances in Signal Processing, 2015(1), 1-15. DOI:10.1186/s13634-015-0277-z
CR  - Horiguchi, S., Garcia, P., Fujita, Y., Watanabe, S., Nagamatsu, K., 2021. End-to-end speaker diarization as post-processing. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7188-7192. arXiv:2012.10055v2
CR  - Huang, Z., Watanabe, S., Fujita, Y., García, P., Shao, Y., Povey, D., Khudanpur, S., 2020. Speaker diarization with region proposal network. ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6514-6518. arXiv:2002.06220v1
CR  - Khoma, V., Khoma, Y., Brydinskyi, V., Konovalov, A., 2023. Development of supervised speaker diarization system based on the PyAnnote Audio Processing Library. Sensors, 23(4), 2082. DOI:10.3390/s23042082
CR  - Landini, F., Glembek, O., Matějka, P., Rohdin, J., Burget, L., Diez, M., Silnova, A., 2021. Analysis of the BUT diarization system for VoxConverse challenge. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5819-5823. arXiv:2010.11718v2
CR  - Madikeri, S., Bourlard, H., 2015. KL-HMM based speaker diarization system for meetings. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4435-4439. DOI:10.1109/ICASSP.2015.7178809
CR  - Moattar, M.H., Homayounpour, M.M., 2012. A review on speaker diarization systems and approaches. Speech Communication, 54(10), 1065-1103. DOI:10.1016/j.specom.2012.05.002
CR  - Mul, A., 2023. Enhancing Dutch audio transcription through integration of speaker diarization into the automatic speech recognition model Whisper. Master’s thesis, Utrecht University, Applied Data Science.
CR  - Park, T.J., Kanda, N., Dimitriadis, D., Han, K.J., Watanabe, S., Narayanan, S., 2022. A review of speaker diarization: Recent advances with deep learning. Computer Speech & Language, 72(1), 101317. arXiv:2101.09624v4
CR  - Ravanelli, M., Bengio, Y., 2018. Speaker recognition from raw waveform with SincNet. IEEE Spoken Language Technology Workshop (SLT), 1021-1028. DOI:10.1109/SLT.2018.8639585
CR  - Ravanelli, M., Parcollet, T., Bengio, Y., 2019. The pytorch-kaldi speech recognition toolkit. ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6465-6469. arXiv:1811.07453v2
CR  - Ryant, N., Singh, P., Krishnamohan, V., Varma, R., Church, K., Cieri, C., Liberman, M., 2020. The third DIHARD diarization challenge. arXiv:2012.01477
CR  - Sharma, V., Zhang, Z., Neubert, Z., Dyreson, C., 2020. Speaker diarization: Using recurrent neural networks. arXiv:2006.05596
CR  - Tranter, S.E., Reynolds, D.A., 2006. An overview of automatic speaker diarization systems. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1557-1565. DOI:10.1109/TASL.2006.878256
CR  - Xiao, X., Kanda, N., Chen, Z., Zhou, T., Yoshioka, T., Chen, S., Gong, Y., 2021. Microsoft speaker diarization system for the VoxCeleb speaker recognition challenge 2020. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5824-5828. arXiv:2010.11458v2
CR  - Yin, R., Bredin, H., Barras, C., 2018. Neural speech turn segmentation and affinity propagation for speaker diarization. Interspeech 2018, 1393-1397. DOI:10.21437/Interspeech.2018-1750
CR  - Yu, F., Zhang, S., Fu, Y., Xie, L., Zheng, S., Du, Z., Bu, H., 2022. M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge. ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6167-6171. arXiv:2110.07393v3
CR  - Zhang, A., Wang, Q., Zhu, Z., Paisley, J., Wang, C., 2019. Fully supervised speaker diarization. ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6301-6305. arXiv:1810.04719
UR  - https://doi.org/10.38016/jista.1563512
L1  - https://dergipark.org.tr/tr/download/article-file/4272622
ER  - 