Research Article

A Supervised Learning Approach With Residual Attention Connections

Year 2024, Volume 5, Issue 1, 78-85, 21.06.2024
https://doi.org/10.53525/jster.1469477

Abstract

Our study aims to improve speech quality in the presence of background noise, which often disrupts clear communication. We focus on developing efficient and effective models that work well on devices with limited resources. Drawing inspiration from computational auditory scene analysis, we train our models to separate speech from background noise while keeping computational demands low. We introduce two models: CRN-WRC (Convolutional Recurrent Network without Residual Connections) and CRN-RCAG (Convolutional Recurrent Network with Residual Connections and Attention Gates). Extensive testing shows that both models significantly enhance speech quality and intelligibility, even in environments with varying background noise levels. Notably, the CRN-RCAG model consistently outperforms the CRN-WRC, particularly on untrained noise types. Integrating residual connections and attention gates delivers these gains while maintaining computational efficiency.
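The paper's architecture details are not reproduced on this page, but the core idea the abstract names, gating a residual/skip path with a learned attention mask, can be sketched in a few lines of NumPy. This is a minimal illustration only: all weights, shapes, and the additive-attention formulation are illustrative assumptions, not the authors' CRN-RCAG model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_block(x, w):
    """Toy feature layer with a residual (identity) connection:
    output = input + F(input). A ReLU matmul stands in for a conv layer."""
    h = np.maximum(0.0, x @ w)
    return x + h

def attention_gate(skip, gating, w_s, w_g, w_psi):
    """Additive attention gate: compute a relevance mask in (0, 1) from the
    skip features and a gating signal, then reweight the skip path."""
    q = np.tanh(skip @ w_s + gating @ w_g)   # joint features, shape (T, d)
    alpha = sigmoid(q @ w_psi)               # per-frame mask, shape (T, 1)
    return skip * alpha                      # attended skip features

rng = np.random.default_rng(0)
T, d = 8, 4                                  # time frames, feature dim (toy sizes)
x = rng.standard_normal((T, d))
w = rng.standard_normal((d, d)) * 0.1
enc = residual_block(x, w)                   # encoder features on the skip path
g = rng.standard_normal((T, d))              # decoder-side gating signal
att = attention_gate(enc, g,
                     rng.standard_normal((d, d)) * 0.1,
                     rng.standard_normal((d, d)) * 0.1,
                     rng.standard_normal((d, 1)) * 0.1)
print(att.shape)                             # same shape as the skip features
```

Because the mask lies in (0, 1), the gate can only attenuate skip features, never amplify them; in a full model this lets the decoder suppress noise-dominated frames passed over the skip connections.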

References

  • [1] Kheddar, Hamza, et al. "Deep transfer learning for automatic speech recognition: Towards better generalization." Knowledge-Based Systems 277 (2023): 110851.
  • [2] Kwak, Chanbeom, and Woojae Han. "Towards size of scene in auditory scene analysis: A systematic review." Journal of Audiology & Otology 24.1 (2020): 1.
  • [3] Wang, DeLiang, and Jitong Chen. "Supervised speech separation based on deep learning: An overview." IEEE/ACM transactions on audio, speech, and language processing 26.10 (2018): 1702-1726.
  • [4] Nossier, Soha A., et al. "Mapping and masking targets comparison using different deep learning based speech enhancement architectures." 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020.
  • [5] Ye, Zhongfu, Nasir Saleem, and Hamza Ali. "Efficient gated convolutional recurrent neural networks for real-time speech enhancement." (2023).
  • [6] Hsieh, Tsun-An, et al. "WaveCRN: An efficient convolutional recurrent neural network for end-to-end speech enhancement." IEEE Signal Processing Letters 27 (2020): 2149-2153.
  • [7] Wang, Kai. Novel Deep Learning Approaches for Single-Channel Speech Enhancement. Diss. Concordia University, 2022.
  • [8] Haar, Lynn Vonder, Timothy Elvira, and Omar Ochoa. "An analysis of explainability methods for convolutional neural networks." Engineering Applications of Artificial Intelligence 117 (2023): 105606.
  • [9] Le, Xiaohuai, et al. "DPCRN: Dual-path convolution recurrent network for single channel speech enhancement." arXiv preprint arXiv:2107.05429 (2021).
  • [10] Marcu, David C., and Cristian Grava. "The impact of activation functions on training and performance of a deep neural network." 2021 16th International Conference on Engineering of Modern Electric Systems (EMES). IEEE, 2021.
  • [11] Ye, Zhongfu, Nasir Saleem, and Hamza Ali. "Efficient gated convolutional recurrent neural networks for real-time speech enhancement." (2023).
  • [12] Ketkar, Nikhil, et al. "Convolutional neural networks." Deep Learning with Python: Learn Best Practices of Deep Learning Models with PyTorch (2021): 197-242.
  • [13] Hewamalage, Hansika, Christoph Bergmeir, and Kasun Bandara. "Recurrent neural networks for time series forecasting: Current status and future directions." International Journal of Forecasting 37.1 (2021): 388-427.
  • [14] Wahab, Fazal E., et al. "Compact deep neural networks for real-time speech enhancement on resource-limited devices." Speech Communication 156 (2024): 103008.
  • [15] Galvez, Daniel, et al. "The people's speech: A large-scale diverse english speech recognition dataset for commercial usage." arXiv preprint arXiv:2111.09344 (2021).

There are 14 citations in total.

Details

Primary Language English
Subjects Communications Engineering (Other)
Journal Section Research Articles
Authors

Ali Hamza 0009-0002-8662-5249

Fazal Muhammad

Talha Ali

Fazal E-wahab

Muhammad Ismail 0009-0002-7268-5486

Early Pub Date June 15, 2024
Publication Date June 21, 2024
Submission Date April 16, 2024
Acceptance Date May 27, 2024
Published in Issue Year 2024

Cite

APA Hamza, A., Muhammad, F., Ali, T., E-wahab, F., & Ismail, M. (2024). A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research, 5(1), 78-85. https://doi.org/10.53525/jster.1469477
AMA Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. JSTER. June 2024;5(1):78-85. doi:10.53525/jster.1469477
Chicago Hamza, Ali, Fazal Muhammad, Talha Ali, Fazal E-wahab, and Muhammad Ismail. “A Supervised Learning Approach With Residual Attention Connections”. Journal of Science, Technology and Engineering Research 5, no. 1 (June 2024): 78-85. https://doi.org/10.53525/jster.1469477.
EndNote Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M (June 1, 2024) A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research 5 1 78–85.
IEEE A. Hamza, F. Muhammad, T. Ali, F. E-wahab, and M. Ismail, “A Supervised Learning Approach With Residual Attention Connections”, JSTER, vol. 5, no. 1, pp. 78–85, 2024, doi: 10.53525/jster.1469477.
ISNAD Hamza, Ali et al. “A Supervised Learning Approach With Residual Attention Connections”. Journal of Science, Technology and Engineering Research 5/1 (June 2024), 78-85. https://doi.org/10.53525/jster.1469477.
JAMA Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. JSTER. 2024;5:78–85.
MLA Hamza, Ali et al. “A Supervised Learning Approach With Residual Attention Connections”. Journal of Science, Technology and Engineering Research, vol. 5, no. 1, 2024, pp. 78-85, doi:10.53525/jster.1469477.
Vancouver Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. JSTER. 2024;5(1):78-85.
Works published in the journal are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0) International License.
