A Supervised Learning Approach With Residual Attention Connections

Ali Hamza; Fazal Muhammad; Talha Ali; Fazal E-wahab; Muhammad Ismail

doi:10.53525/jster.1469477

Research Article

Year 2024, Volume: 5, Issue: 1, pp. 78-85, 21.06.2024

Abstract

This study aims to improve speech quality in the presence of background noise, which often disrupts clear communication. We focus on developing efficient models that run well on devices with limited resources. Drawing on computational auditory scene analysis techniques, we train the models to separate speech from background noise while keeping computational demands low. We introduce two models: CRN-WRC (Convolutional Recurrent Network without Residual Connections) and CRN-RCAG (Convolutional Recurrent Network with Residual Connections and Attention Gates). Our tests show that the models significantly improve speech quality and intelligibility, even in noisy environments with varying background noise levels. Notably, CRN-RCAG consistently outperforms CRN-WRC, particularly on noise types not seen during training. These results indicate that integrating residual connections and attention gates improves enhancement performance while maintaining computational efficiency.
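
As a rough illustration of the two ingredients named in the abstract, the sketch below applies an attention gate to an encoder skip feature and then adds the gated result to the decoder path as a residual connection. It is not the authors' CRN-RCAG implementation: the abstract does not specify the form of the gate, so the additive sigmoid-weighted gate, the function names `attention_gate` and `gated_residual`, the weight shapes, and the random inputs are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(skip, gate, w_skip, w_gate, w_psi):
    """Additive attention gate (assumed form, not taken from the paper).

    skip: encoder features, shape (frames, channels)
    gate: decoder features used as the gating signal, same shape
    Returns the gated skip features and the attention weights.
    """
    # Project both inputs to a shared intermediate space and combine them.
    hidden = np.maximum(skip @ w_skip + gate @ w_gate, 0.0)  # ReLU
    # One scalar weight per frame, squashed by a sigmoid.
    alpha = sigmoid(hidden @ w_psi)  # shape (frames, 1)
    return skip * alpha, alpha


def gated_residual(skip, gate, w_skip, w_gate, w_psi):
    """Residual connection: add the gated skip features to the decoder path."""
    gated, alpha = attention_gate(skip, gate, w_skip, w_gate, w_psi)
    return gate + gated, alpha


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
frames, channels, hidden_dim = 5, 8, 4
skip = rng.standard_normal((frames, channels))
gate = rng.standard_normal((frames, channels))
w_skip = rng.standard_normal((channels, hidden_dim))
w_gate = rng.standard_normal((channels, hidden_dim))
w_psi = rng.standard_normal((hidden_dim, 1))

out, alpha = gated_residual(skip, gate, w_skip, w_gate, w_psi)
print(out.shape, alpha.shape)
```

In this sketch the gate produces one weight per time frame, so frames the gating signal judges uninformative contribute less through the skip path; a trained model would learn the three weight matrices rather than draw them at random.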

Keywords

Speech enhancement, Convolutional Recurrent Network, supervised learning

References

[1] Kheddar, Hamza, et al. "Deep transfer learning for automatic speech recognition: Towards better generalization." Knowledge-Based Systems 277 (2023): 110851.
[2] Kwak, Chanbeom, and Woojae Han. "Towards size of scene in auditory scene analysis: A systematic review." Journal of Audiology & Otology 24.1 (2020): 1.
[3] Wang, DeLiang, and Jitong Chen. "Supervised speech separation based on deep learning: An overview." IEEE/ACM transactions on audio, speech, and language processing 26.10 (2018): 1702-1726.
[4] Nossier, Soha A., et al. "Mapping and masking targets comparison using different deep learning based speech enhancement architectures." 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020.
[5] Ye, Zhongfu, Nasir Saleem, and Hamza Ali. "Efficient gated convolutional recurrent neural networks for real-time speech enhancement." (2023).
[6] Hsieh, Tsun-An, et al. "Wavecrn: An efficient convolutional recurrent neural network for end-to-end speech enhancement." IEEE Signal Processing Letters 27 (2020): 2149-2153.
[7] Wang, Kai. Novel Deep Learning Approaches for Single-Channel Speech Enhancement. Diss. Concordia University, 2022.
[8] Haar, Lynn Vonder, Timothy Elvira, and Omar Ochoa. "An analysis of explainability methods for convolutional neural networks." Engineering Applications of Artificial Intelligence 117 (2023): 105606.
[9] Le, Xiaohuai, et al. "DPCRN: Dual-path convolution recurrent network for single channel speech enhancement." arXiv preprint arXiv:2107.05429 (2021).
[10] Marcu, David C., and Cristian Grava. "The impact of activation functions on training and performance of a deep neural network." 2021 16th International Conference on Engineering of Modern Electric Systems (EMES). IEEE, 2021.
[11] Ye, Zhongfu, Nasir Saleem, and Hamza Ali. "Efficient gated convolutional recurrent neural networks for real-time speech enhancement." (2023).
[12] Ketkar, Nikhil, et al. "Convolutional neural networks." Deep Learning with Python: Learn Best Practices of Deep Learning Models with PyTorch (2021): 197-242.
[13] Hewamalage, Hansika, Christoph Bergmeir, and Kasun Bandara. "Recurrent neural networks for time series forecasting: Current status and future directions." International Journal of Forecasting 37.1 (2021): 388-427.
[14] Wahab, Fazal E., et al. "Compact deep neural networks for real-time speech enhancement on resource-limited devices." Speech Communication 156 (2024): 103008.
[15] Galvez, Daniel, et al. "The people's speech: A large-scale diverse english speech recognition dataset for commercial usage." arXiv preprint arXiv:2111.09344 (2021).

There are 14 citations in total.

Details

Primary Language	English
Subjects	Communications Engineering (Other)
Journal Section	Research Articles
Authors	Ali Hamza (ORCID 0009-0002-8662-5249), Fazal Muhammad, Talha Ali, Fazal E-wahab, Muhammad Ismail (ORCID 0009-0002-7268-5486)
Early Pub Date	June 15, 2024
Publication Date	June 21, 2024
Submission Date	April 16, 2024
Acceptance Date	May 27, 2024
Published in Issue	Year 2024 Volume: 5 Issue: 1

Cite

APA	Hamza, A., Muhammad, F., Ali, T., … E-wahab, F. (2024). A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research, 5(1), 78-85. https://doi.org/10.53525/jster.1469477
AMA	Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research. June 2024;5(1):78-85. doi:10.53525/jster.1469477
Chicago	Hamza, Ali, Fazal Muhammad, Talha Ali, Fazal E-wahab, and Muhammad Ismail. “A Supervised Learning Approach With Residual Attention Connections”. Journal of Science, Technology and Engineering Research 5, no. 1 (June 2024): 78-85. https://doi.org/10.53525/jster.1469477.
EndNote	Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M (June 1, 2024) A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research 5 1 78–85.
IEEE	A. Hamza, F. Muhammad, T. Ali, F. E-wahab, and M. Ismail, “A Supervised Learning Approach With Residual Attention Connections”, Journal of Science, Technology and Engineering Research, vol. 5, no. 1, pp. 78–85, 2024, doi: 10.53525/jster.1469477.
ISNAD	Hamza, Ali et al. “A Supervised Learning Approach With Residual Attention Connections”. Journal of Science, Technology and Engineering Research 5/1 (June 2024), 78-85. https://doi.org/10.53525/jster.1469477.
JAMA	Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research. 2024;5:78–85.
MLA	Hamza, Ali et al. “A Supervised Learning Approach With Residual Attention Connections”. Journal of Science, Technology and Engineering Research, vol. 5, no. 1, 2024, pp. 78-85, doi:10.53525/jster.1469477.
Vancouver	Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research. 2024;5(1):78-85.

Studies published in the journal are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0) International License.