Research Article

A Deep Learning Approach based on Ensemble Classification Pipeline and Interpretable Logical Rules for Bilingual Fake Speech Recognition

Volume 38, Number 1, March 1, 2025

Abstract

The essential steps of our study are to quantify and classify the differences between real and fake speech signals. To this end, we exploit the ability of deep learning to learn salient features. Within an ensemble classification pipeline, interpretable logical rules are combined with class activation maps for generalized reasoning, so that the different speech classes can be discriminated correctly. Fake audio samples were generated with a Deep Convolutional Generative Adversarial Network (DCGAN). Experiments were conducted on three datasets: Turkish, English, and bilingual. The classification pipeline, compiled into a majority voting-based ensemble classifier, achieved accuracies of approximately 90% in training and 80.33% in testing for each individual language, and the majority voting results reached 73% on the corresponding test cases. To extract semantically rich rules, an interpretable logical rules infrastructure was used to infer fake speech correctly from the class activations of the deep learning generative model. The study concludes with a discussion of the findings.
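The majority voting step can be sketched as hard voting over the class labels predicted by several base models. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the number of base models, the label names, or how ties are resolved, so the function names `majority_vote` and `accuracy`, the labels `"real"`/`"fake"`, and the tie-breaking rule below are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Hard majority voting over the label predictions of several models.

    predictions[m][i] is the label that (hypothetical) model m assigns to sample i.
    Assumed tie-break: among equally common labels, the label whose first vote
    comes from the lowest model index wins, because Counter.most_common keeps
    first-encountered order for equal counts.
    """
    if not predictions:
        raise ValueError("at least one model is required")
    n_samples = len(predictions[0])
    if any(len(model_preds) != n_samples for model_preds in predictions):
        raise ValueError("every model must predict the same number of samples")

    combined = []
    for i in range(n_samples):
        # Count the votes cast by all models for sample i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the ground truth."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and equally long")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


if __name__ == "__main__":
    # Toy labels for four utterances from three hypothetical base models.
    model_a = ["real", "fake", "real", "real"]
    model_b = ["real", "real", "fake", "fake"]
    model_c = ["fake", "fake", "fake", "real"]
    truth = ["real", "fake", "fake", "real"]

    ensemble = majority_vote([model_a, model_b, model_c])
    print(ensemble)  # ['real', 'fake', 'fake', 'real']
    print(accuracy(ensemble, truth))  # 1.0
```

The toy data is chosen only to show the mechanics; an ensemble does not automatically outperform its members, and in the reported results the majority voting figure (73%) is lower than the pipeline's test accuracy (80.33%). For fitted scikit-learn estimators, `VotingClassifier(voting="hard")` applies the same idea.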



Details

Primary Language: English
Subjects: Image Processing, Multimodal Analysis and Synthesis, Audio Processing, Deep Learning, Machine Learning (Other)
Journal Section: Research Article
Early Publication Date: December 11, 2024
Publication Date: March 1, 2025
Submission Date: September 8, 2023
Acceptance Date: November 12, 2024
Published in Issue: Year 2025, Volume 38, Number 1

APA
Boztepe, E. B., & Karasulu, B. (2025). A Deep Learning Approach based on Ensemble Classification Pipeline and Interpretable Logical Rules for Bilingual Fake Speech Recognition. Gazi University Journal of Science, 38(1), 75-97. https://doi.org/10.35378/gujs.1357317