Research Article

Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance

Year 2025, Volume: 8, Issue: 1, 222-247, 17.01.2025
https://doi.org/10.47495/okufbed.1457532

Abstract

This study investigates the effect of audio processing and filtering strategies on the performance of speech recognition systems in noisy environments. The focus is on Short-Time Fourier Transform (STFT) operations applied to noisy audio files and on the subsequent noise reduction procedures. The STFT operations form the basis for detecting noise and analyzing the speech signal in the frequency domain, while the noise reduction steps involve threshold-based masking and convolution operations. The results demonstrate that these audio processing and filtering strategies yield a significant improvement in speech recognition accuracy in noisy environments. A detailed analysis of the resulting graphs provides guidance for evaluating the effectiveness of the noise reduction procedures and serves as a roadmap for future research. This study emphasizes the critical importance of audio processing and filtering strategies in improving the performance of speech recognition systems in noisy environments, laying a foundation for future studies.
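
The pipeline the abstract describes corresponds to a standard spectral-gating scheme: compute the STFT of the noisy recording, derive a per-frequency threshold from an estimate of the noise, mask time-frequency bins that fall below that threshold, smooth the mask by convolution to limit musical-noise artifacts, and resynthesize the signal with the inverse STFT. The sketch below illustrates this general idea in Python with NumPy and SciPy; the function name, parameter values, and the 6 dB margin are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of STFT-based, threshold-masked noise reduction.
# Assumption: a noise-only segment (noise_profile) is available to estimate the threshold.
import numpy as np
from scipy.signal import stft, istft, convolve2d

def reduce_noise(noisy, noise_profile, fs=16000, nperseg=512, thresh_db=6.0):
    """Attenuate time-frequency bins whose energy falls below a per-band noise threshold."""
    _, _, S = stft(noisy, fs=fs, nperseg=nperseg)           # complex spectrogram of the noisy speech
    _, _, N = stft(noise_profile, fs=fs, nperseg=nperseg)   # spectrogram of the noise-only estimate

    # Per-frequency threshold: mean noise magnitude (in dB) plus a safety margin.
    noise_db = 20 * np.log10(np.abs(N) + 1e-10)
    threshold = noise_db.mean(axis=1, keepdims=True) + thresh_db

    # Binary mask: keep bins whose energy exceeds the noise threshold.
    signal_db = 20 * np.log10(np.abs(S) + 1e-10)
    mask = (signal_db > threshold).astype(float)

    # Smooth the mask with a small 2-D kernel (convolution step) to soften abrupt transitions.
    kernel = np.outer(np.hanning(5), np.hanning(5))
    kernel /= kernel.sum()
    mask = convolve2d(mask, kernel, mode="same")

    # Apply the mask and resynthesize the time-domain signal with the inverse STFT.
    _, cleaned = istft(S * mask, fs=fs, nperseg=nperseg)
    return cleaned
```

Libraries such as noisereduce package a more elaborate variant of this spectral gating, but the core steps remain the same: STFT analysis, thresholding against a noise estimate, mask smoothing, and inverse-STFT resynthesis.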



Details

Primary Language English
Subjects Deep Learning, Machine Learning (Other)
Journal Section RESEARCH ARTICLES
Authors

Cem Özkurt (ORCID: 0000-0002-1251-7715)

Early Pub Date January 15, 2025
Publication Date January 17, 2025
Submission Date March 22, 2024
Acceptance Date August 29, 2024
Published in Issue Year 2025 Volume: 8 Issue: 1

Cite

APA Özkurt, C. (2025). Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 8(1), 222-247. https://doi.org/10.47495/okufbed.1457532
AMA Özkurt C. Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance. Osmaniye Korkut Ata University Journal of The Institute of Science and Technology. January 2025;8(1):222-247. doi:10.47495/okufbed.1457532
Chicago Özkurt, Cem. “Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance”. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 8, no. 1 (January 2025): 222-47. https://doi.org/10.47495/okufbed.1457532.
EndNote Özkurt C (January 1, 2025) Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 8 1 222–247.
IEEE C. Özkurt, “Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance”, Osmaniye Korkut Ata University Journal of The Institute of Science and Technology, vol. 8, no. 1, pp. 222–247, 2025, doi: 10.47495/okufbed.1457532.
ISNAD Özkurt, Cem. “Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance”. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 8/1 (January 2025), 222-247. https://doi.org/10.47495/okufbed.1457532.
JAMA Özkurt C. Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance. Osmaniye Korkut Ata University Journal of The Institute of Science and Technology. 2025;8:222–247.
MLA Özkurt, Cem. “Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance”. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 8, no. 1, 2025, pp. 222-47, doi:10.47495/okufbed.1457532.
Vancouver Özkurt C. Investigation of the Effectiveness of Audio Processing and Filtering Strategies in Noisy Environments on Speech Recognition Performance. Osmaniye Korkut Ata University Journal of The Institute of Science and Technology. 2025;8(1):222-47.



This work is licensed under a Creative Commons Attribution 4.0 International License.