Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder

Gökay Dişken; Zekeriya Tüfekci

doi:10.18466/cbayarfbe.1132319

Abstract

Audio spoof detection has recently attracted considerable research attention, because automatic speaker recognition systems must be able to detect spoofed speech. Publicly available datasets have also accelerated work in this area. Many features and classifiers have been proposed for spoofed speech detection, and some of them achieve considerably high performance. Under additive noise, however, spoof detection performance drops rapidly, and the number of studies on noise-robust spoofed speech detection is very limited. The problem becomes more interesting because conventional speech enhancement methods have reportedly performed worse than applying no enhancement at all. In this work, i-vectors are used for spoof detection, and a discriminative denoising autoencoder (DAE) network is used to obtain enhanced (clean) i-vectors from their noisy counterparts. Once obtained, the enhanced i-vectors can be treated as ordinary i-vectors and scored or classified without any modification to the classifier. Data from the ASVspoof 2015 challenge are used with five additive noise types, following a configuration similar to that of previous studies. The DAE is trained in a multicondition manner, using both clean and corrupted i-vectors. Three noise types at various signal-to-noise ratios are used to create the corrupted i-vectors, and the remaining two noise types are used only in the test stage to simulate unknown noise conditions. Experimental results show that the proposed DAE approach is more effective than the conventional speech enhancement methods.
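The abstract mentions corrupting speech with additive noise at various signal-to-noise ratios. As a minimal sketch of how such a corruption step is commonly done, the Python example below scales a noise signal so that the mixture reaches a requested SNR in dB. The function names (`mix_at_snr`, `measured_snr_db`), the synthetic sine "speech", the Gaussian noise, and the SNR values are all illustrative assumptions; the abstract does not describe the authors' actual mixing procedure or noise levels.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return (noisy, scaled_noise) so that speech vs. scaled_noise has `snr_db`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(power(speech) / power(noise))


# Synthetic stand-ins (assumptions): a 220 Hz tone as "speech", white noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for target in (0, 10, 20):  # example SNR values, not the paper's
    noisy, scaled = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, scaled), 6))
```

In a multicondition setup like the one described, i-vectors extracted from such noisy mixtures would serve as DAE inputs, with the i-vectors of the corresponding clean utterances as targets.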

Supporting Institution

Tübitak

Project Number

121E057

Acknowledgments

This work was supported by TUBITAK under project number 121E057.

Details

Primary Language

English

Subjects

Engineering

Section

Research Article

Authors

Gökay Dişken *
0000-0002-8680-0636
Türkiye

Zekeriya Tüfekci
0000-0001-7835-2741
Türkiye

Publication Date

June 29, 2023

Submission Date

June 20, 2022

Acceptance Date

June 8, 2023

Published in Issue

Year: 2023, Volume: 19, Issue: 2

DOI

https://doi.org/10.18466/cbayarfbe.1132319

IZ

https://izlik.org/JA23DX59ZP

Cite

APA

Dişken, G., & Tüfekci, Z. (2023). Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar University Journal of Science, 19(2), 167-174. https://doi.org/10.18466/cbayarfbe.1132319

AMA

1. Dişken G, Tüfekci Z. Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar University Journal of Science. 2023;19(2):167-174. doi:10.18466/cbayarfbe.1132319

Chicago

Dişken, Gökay, and Zekeriya Tüfekci. 2023. “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”. Celal Bayar University Journal of Science 19 (2): 167-74. https://doi.org/10.18466/cbayarfbe.1132319.

EndNote

Dişken G, Tüfekci Z (June 1, 2023) Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar University Journal of Science 19 2 167–174.

IEEE

[1] G. Dişken and Z. Tüfekci, “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”, Celal Bayar University Journal of Science, vol. 19, no. 2, pp. 167–174, Jun. 2023, doi: 10.18466/cbayarfbe.1132319.

ISNAD

Dişken, Gökay - Tüfekci, Zekeriya. “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”. Celal Bayar University Journal of Science 19/2 (June 1, 2023): 167-174. https://doi.org/10.18466/cbayarfbe.1132319.

JAMA

1. Dişken G, Tüfekci Z. Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar University Journal of Science. 2023;19:167–174.

MLA

Dişken, Gökay, and Zekeriya Tüfekci. “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”. Celal Bayar University Journal of Science, vol. 19, no. 2, June 2023, pp. 167-74, doi:10.18466/cbayarfbe.1132319.

Vancouver

1. Gökay Dişken, Zekeriya Tüfekci. Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar University Journal of Science. June 1, 2023;19(2):167-74. doi:10.18466/cbayarfbe.1132319

Cited By

Audio Deepfake Detection Using a Hybrid Model of Convolutional and Bidirectional Long Short-term Memory Networks

Advances in Applied Sciences

https://doi.org/10.11648/j.aas.20261101.11