Robust Spoofed Speech Detection with Denoised I-vectors

Gökay Dişken

doi:10.35378/gujs.1062788

Research Article

Year 2023, Volume: 36 Issue: 4, 1553 - 1561, 01.12.2023


Abstract

Spoofed speech detection has recently attracted researchers' attention, as speaker verification has been shown to be vulnerable to spoofing attacks such as voice conversion, speech synthesis, replay, and impersonation. Although various methods have been proposed to detect spoofed speech, their performance degrades sharply under mismatched conditions caused by additive noise or reverberation. Conventional speech enhancement methods fail to close this performance gap, so more advanced techniques appear necessary to solve the noisy spoofed speech detection problem. In this work, a denoising autoencoder (DAE) is used to obtain clean estimates of i-vectors from their noisy versions. The ASVspoof 2015 database is used in the experiments, with five different noise types added to the original utterances at signal-to-noise ratios (SNRs) of 0, 10, and 20 dB. The experimental results show that the DAE provides more robust spoof detection in conditions where the conventional methods fail.
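
The abstract does not say how the noise was scaled before it was added to the utterances. As a minimal sketch, assuming the common power-ratio definition of SNR, a noise signal can be mixed into a clean signal at a target SNR as shown below. The function names `mix_at_snr` and `measured_snr_db` and the toy sine and Gaussian signals are illustrative assumptions, not the author's implementation or data.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so that the mixture has the requested SNR (dB)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must have non-zero power")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of a mixture in dB, given the clean reference signal."""
    speech_power = sum(s * s for s in speech)
    residual_power = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(speech_power / residual_power)


# Toy stand-ins (assumptions): a 220 Hz sine as "speech", Gaussian samples as "noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# The three SNR levels named in the abstract.
for snr_db in (0, 10, 20):
    noisy = mix_at_snr(speech, noise, snr_db)
    print(snr_db, measured_snr_db(speech, noisy))
```

In an i-vector pipeline of the kind the abstract describes, mixtures like these would be passed through feature and i-vector extraction, and each resulting noisy i-vector would be paired with its clean counterpart for DAE training. Those stages are outside this sketch.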

Keywords

Denoising autoencoder, Speaker verification, Spoof detection

Supporting Institution

TÜBİTAK

Project Number

121E057

Thanks

This study was supported by TUBITAK under project no. 121E057.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Research Article
Authors	Gökay Dişken 0000-0002-8680-0636
Project Number	121E057
Publication Date	December 1, 2023
Published in Issue	Year 2023 Volume: 36 Issue: 4

Cite

APA	Dişken, G. (2023). Robust Spoofed Speech Detection with Denoised I-vectors. Gazi University Journal of Science, 36(4), 1553-1561. https://doi.org/10.35378/gujs.1062788
AMA	Dişken G. Robust Spoofed Speech Detection with Denoised I-vectors. Gazi University Journal of Science. December 2023;36(4):1553-1561. doi:10.35378/gujs.1062788
Chicago	Dişken, Gökay. “Robust Spoofed Speech Detection With Denoised I-Vectors”. Gazi University Journal of Science 36, no. 4 (December 2023): 1553-61. https://doi.org/10.35378/gujs.1062788.
EndNote	Dişken G (December 1, 2023) Robust Spoofed Speech Detection with Denoised I-vectors. Gazi University Journal of Science 36 4 1553–1561.
IEEE	G. Dişken, “Robust Spoofed Speech Detection with Denoised I-vectors”, Gazi University Journal of Science, vol. 36, no. 4, pp. 1553–1561, 2023, doi: 10.35378/gujs.1062788.
ISNAD	Dişken, Gökay. “Robust Spoofed Speech Detection With Denoised I-Vectors”. Gazi University Journal of Science 36/4 (December 2023), 1553-1561. https://doi.org/10.35378/gujs.1062788.
JAMA	Dişken G. Robust Spoofed Speech Detection with Denoised I-vectors. Gazi University Journal of Science. 2023;36:1553–1561.
MLA	Dişken, Gökay. “Robust Spoofed Speech Detection With Denoised I-Vectors”. Gazi University Journal of Science, vol. 36, no. 4, 2023, pp. 1553-61, doi:10.35378/gujs.1062788.
Vancouver	Dişken G. Robust Spoofed Speech Detection with Denoised I-vectors. Gazi University Journal of Science. 2023;36(4):1553-61.
