TY  - JOUR
T1  - İnce Ayar ile PyAnnote Kullanılarak Konuşmacı Güncesi Çıkarımı Doğruluğunun Artırılması
TT  - Enhancing Speaker Diarization Accuracy of PyAnnote by Fine-tuning
AU  - Toprak, Amine Gonca
AU  - Çepni, Sercan
AU  - Ozan, Şükrü
PY  - 2025
DA  - September
Y2  - 2025
DO  - 10.38016/jista.1563512
JF  - Journal of Intelligent Systems: Theory and Applications
JO  - JISTA
PB  - Özer UYGUN
WT  - DergiPark
SN  - 2651-3927
SP  - 105
EP  - 114
VL  - 8
IS  - 2
LA  - tr
AB  - Konuşma işleme, konuşmacı güncesi çıkarımı, konuşma tanıma ve ses olayı tespiti gibi birçok uygulamada önemli bir rol oynamaktadır. Bu çalışma, firma çağrı merkezi kayıtları üzerinde karşılaştırmalı bir analiz sunmak amacıyla PyAnnote adlı güçlü, açık kaynaklı konuşmacı güncesi çıkarım aracını kullanmaktadır. PyAnnote modelleri, firma çağrı merkezi kayıtlarının etiketlenmesiyle elde edilen veri seti ile ince ayar yapılarak değerlendirilmiş ve baz model performansları ile karşılaştırılmıştır. Model performansları, DER metriği kullanılarak değerlendirilmiş ve sonuç olarak, ince ayar yapılan PyAnnote 3.1 modeli üstün performans göstermiştir. İnce ayar sonrası PyAnnote 3.1 versiyonunun DER puanı %21.9’dan %15.6’ya düşmüştür.
KW  - speaker diarization
KW  - pyannote
KW  - fine-tuning
N2  - Speech processing plays a crucial role in applications such as speaker diarization, speech recognition, and sound event detection. This study uses PyAnnote, a powerful open-source speaker diarization toolkit, to conduct a comparative analysis on company call center recordings. PyAnnote models were fine-tuned on a dataset obtained by annotating these recordings and compared against the corresponding baseline models. Performance was measured with the diarization error rate (DER), and the fine-tuned PyAnnote 3.1 model performed best: after fine-tuning, its DER fell from 21.9% to 15.6%.
CR  - Ahmad, R., Zubair, S., 2019. Unsupervised deep feature embeddings for speaker diarization. Turkish Journal of Electrical Engineering and Computer Sciences, 27(4), 3138-3149. DOI:10.3906/elk-1901-125
CR  - Anguera, X., Bozonnet, S., Evans, N., Fredouille, C., Friedland, G., Vinyals, O., 2012. Speaker diarization: A review of recent research. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 356-370. HAL Id: hal-00733397
CR  - Bonastre, J.-F., Wils, F., Meignier, S., 2005. ALIZE, a free toolkit for speaker recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 1, I/737-I/740. DOI:10.1109/ICASSP.2005.1415219
CR  - Bredin, H., 2023. pyannote.audio 2.1 speaker diarization pipeline: Principle, benchmark, and recipe. 24th Interspeech Conference (INTERSPEECH 2023), 1983-1987. HAL Id: hal-04247212
CR  - Bredin, H., Laurent, A., 2021. End-to-end speaker segmentation for overlap-aware resegmentation. arXiv:2104.04045
CR  - Bredin, H., Yin, R., Coria, J.M., Gelly, G., Korshunov, P., Lavechin, M., Gill, M.P., 2020. pyannote.audio: Neural building blocks for speaker diarization. ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7124-7128. arXiv:1911.01255v1
CR  - Broux, P.-A., Desnous, F., Larcher, A., Petitrenaud, S., Carrive, J., Meignier, S., 2018. S4D: Speaker Diarization Toolkit in Python. Interspeech 2018, 1368-1372. HAL Id: hal-02280162
CR  - Carletta, J., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., Wellner, P., 2005. The AMI meeting corpus: A pre-announcement. International Workshop on Machine Learning for Multimodal Interaction, 28-39. DOI:10.1007/11677482_3
CR  - Fu, Y., Cheng, L., Lv, S., Jv, Y., Kong, Y., Chen, Z., Chen, J., 2021. Aishell-4: An open source dataset for speech enhancement, separation, recognition, and speaker diarization in a conference scenario. arXiv:2104.03603
CR  - Garcia-Romero, D., Snyder, D., Sell, G., Povey, D., McCree, A., 2017. Speaker diarization using deep neural network embeddings. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4930-4934. DOI:10.1109/ICASSP.2017.7953094
CR  - Giannakopoulos, T., 2015. pyAudioAnalysis: An open-source Python library for audio signal analysis. PLoS ONE, 10(12), e0144610. DOI:10.1371/journal.pone.0144610
CR  - Graf, S., Herbig, T., Buck, M., Schmidt, G., 2015. Features for voice activity detection: A comparative analysis. EURASIP Journal on Advances in Signal Processing, 2015(1), 1-15. DOI:10.1186/s13634-015-0277-z
CR  - Horiguchi, S., Garcia, P., Fujita, Y., Watanabe, S., Nagamatsu, K., 2021. End-to-end speaker diarization as post-processing. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7188-7192. arXiv:2012.10055v2
CR  - Huang, Z., Watanabe, S., Fujita, Y., García, P., Shao, Y., Povey, D., Khudanpur, S., 2020. Speaker diarization with region proposal network. ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6514-6518. arXiv:2002.06220v1
CR  - Khoma, V., Khoma, Y., Brydinskyi, V., Konovalov, A., 2023. Development of supervised speaker diarization system based on the PyAnnote Audio Processing Library. Sensors, 23(4), 2082. DOI:10.3390/s23042082
CR  - Landini, F., Glembek, O., Matějka, P., Rohdin, J., Burget, L., Diez, M., Silnova, A., 2021. Analysis of the BUT diarization system for VoxConverse challenge. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5819-5823. arXiv:2010.11718v2
CR  - Madikeri, S., Bourlard, H., 2015. KL-HMM based speaker diarization system for meetings. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4435-4439. DOI:10.1109/ICASSP.2015.7178809
CR  - Moattar, M.H., Homayounpour, M.M., 2012. A review on speaker diarization systems and approaches. Speech Communication, 54(10), 1065-1103. DOI:10.1016/j.specom.2012.05.002
CR  - Mul, A., 2023. Enhancing Dutch audio transcription through integration of speaker diarization into the automatic speech recognition model Whisper. Master’s thesis, Utrecht University, Applied Data Science.
CR  - Park, T.J., Kanda, N., Dimitriadis, D., Han, K.J., Watanabe, S., Narayanan, S., 2022. A review of speaker diarization: Recent advances with deep learning. Computer Speech & Language, 72(1), 101317. arXiv:2101.09624v4
CR  - Ravanelli, M., Bengio, Y., 2018. Speaker recognition from raw waveform with SincNet. IEEE Spoken Language Technology Workshop (SLT), 1021-1028. DOI:10.1109/SLT.2018.8639585
CR  - Ravanelli, M., Parcollet, T., Bengio, Y., 2019. The PyTorch-Kaldi speech recognition toolkit. ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6465-6469. arXiv:1811.07453v2
CR  - Ryant, N., Singh, P., Krishnamohan, V., Varma, R., Church, K., Cieri, C., Liberman, M., 2020. The third DIHARD diarization challenge. arXiv:2012.01477
CR  - Sharma, V., Zhang, Z., Neubert, Z., Dyreson, C., 2020. Speaker diarization: Using recurrent neural networks. arXiv:2006.05596
CR  - Tranter, S.E., Reynolds, D.A., 2006. An overview of automatic speaker diarization systems. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1557-1565. DOI:10.1109/TASL.2006.878256
CR  - Xiao, X., Kanda, N., Chen, Z., Zhou, T., Yoshioka, T., Chen, S., Gong, Y., 2021. Microsoft speaker diarization system for the VoxCeleb speaker recognition challenge 2020. ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5824-5828. arXiv:2010.11458v2
CR  - Yin, R., Bredin, H., Barras, C., 2018. Neural speech turn segmentation and affinity propagation for speaker diarization. Interspeech 2018, 1393-1397. DOI:10.21437/Interspeech.2018-1750
CR  - Yu, F., Zhang, S., Fu, Y., Xie, L., Zheng, S., Du, Z., Bu, H., 2022. M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge. ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6167-6171. arXiv:2110.07393v3
CR  - Zhang, A., Wang, Q., Zhu, Z., Paisley, J., Wang, C., 2019. Fully supervised speaker diarization. ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6301-6305. arXiv:1810.04719
UR  - https://doi.org/10.38016/jista.1563512
L1  - https://dergipark.org.tr/tr/download/article-file/4272622
ER  -