Research Article

A Supervised Learning Approach With Residual Attention Connections

Year 2024, Volume 5, Issue 1, 78-85, 21.06.2024
https://doi.org/10.53525/jster.1469477

Abstract

Our study aims to improve speech quality in the presence of background noise, which often disrupts clear communication. We focus on developing efficient and effective models that run well on devices with limited resources. Drawing inspiration from computational auditory scene analysis, we train our models to separate speech from background noise while keeping computational demands low. We introduce two models: CRN-WRC (Convolutional Recurrent Network without Residual Connections) and CRN-RCAG (Convolutional Recurrent Network with Residual Connections and Attention Gates). Thorough testing shows that both models significantly enhance speech quality and intelligibility, even in noisy environments with varying background noise levels. Notably, CRN-RCAG consistently outperforms CRN-WRC, particularly on noise types not seen during training. These gains come from integrating residual connections and attention gates into the models while maintaining computational efficiency.
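The page does not include the authors' implementation, but the abstract's central idea (a convolutional recurrent network whose encoder-to-decoder skip connections pass through attention gates, predicting a mask over a noisy spectrogram) can be sketched in a few lines of PyTorch. Everything below is an illustrative assumption, not the published CRN-RCAG configuration: the single encoder/decoder stage, the layer sizes, and the names TinyCRN and AttentionGate are all hypothetical.

```python
# A minimal sketch, assuming a mask-based CRN with a gated residual
# connection; sizes and names are illustrative, not the paper's.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Scales an encoder skip connection by a learned [0, 1] mask
    computed from the skip features and the decoder's gating signal."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj_skip = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_gate = nn.Conv2d(channels, channels, kernel_size=1)
        self.attn = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, skip: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        weights = self.attn(self.proj_skip(skip) + self.proj_gate(gate))
        return skip * weights  # down-weight noise-dominated regions

class TinyCRN(nn.Module):
    """Toy CRN: one conv encoder stage, an LSTM bottleneck over time,
    and one decoder stage fed by a gated residual connection."""
    def __init__(self, freq_bins: int = 161, channels: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=(1, 3), padding=(0, 1)),
            nn.BatchNorm2d(channels), nn.ELU(),
        )
        self.rnn = nn.LSTM(channels * freq_bins, channels * freq_bins,
                           batch_first=True)
        self.gate = AttentionGate(channels)
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=(1, 3), padding=(0, 1)),
            nn.Sigmoid(),  # mask in [0, 1], applied to noisy magnitudes
        )

    def forward(self, noisy_mag: torch.Tensor) -> torch.Tensor:
        # noisy_mag: (batch, 1, time, freq); freq must equal freq_bins
        b, _, t, f = noisy_mag.shape
        enc = self.encoder(noisy_mag)                    # (b, c, t, f)
        seq = enc.permute(0, 2, 1, 3).reshape(b, t, -1)  # (b, t, c*f)
        out, _ = self.rnn(seq)
        dec_in = out.reshape(b, t, -1, f).permute(0, 2, 1, 3)
        fused = dec_in + self.gate(enc, dec_in)          # gated residual
        mask = self.decoder(fused)
        return mask * noisy_mag                          # enhanced magnitudes

# Usage: a batch of 2 noisy magnitude spectrograms, 100 frames x 161 bins.
enhanced = TinyCRN()(torch.randn(2, 1, 100, 161).abs())
```

The gated residual path is the point of the sketch: the decoder receives encoder features only after the attention gate has suppressed time-frequency regions it judges to be noise-dominated, which is the mechanism the abstract credits for the improved robustness to untrained noise types.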

References

  • [1] Kheddar, Hamza, et al. "Deep transfer learning for automatic speech recognition: Towards better generalization." Knowledge-Based Systems 277 (2023): 110851.
  • [2] Kwak, Chanbeom, and Woojae Han. "Towards size of scene in auditory scene analysis: A systematic review." Journal of Audiology & Otology 24.1 (2020): 1.
• [3] Wang, DeLiang, and Jitong Chen. "Supervised speech separation based on deep learning: An overview." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26.10 (2018): 1702-1726.
  • [4] Nossier, Soha A., et al. "Mapping and masking targets comparison using different deep learning based speech enhancement architectures." 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020.
  • [5] Ye, Zhongfu, Nasir Saleem, and Hamza Ali. "Efficient gated convolutional recurrent neural networks for real-time speech enhancement." (2023).
  • [6] Hsieh, Tsun-An, et al. "WaveCRN: An efficient convolutional recurrent neural network for end-to-end speech enhancement." IEEE Signal Processing Letters 27 (2020): 2149-2153.
  • [7] Wang, Kai. Novel Deep Learning Approaches for Single-Channel Speech Enhancement. Diss. Concordia University, 2022.
  • [8] Haar, Lynn Vonder, Timothy Elvira, and Omar Ochoa. "An analysis of explainability methods for convolutional neural networks." Engineering Applications of Artificial Intelligence 117 (2023): 105606.
  • [9] Le, Xiaohuai, et al. "DPCRN: Dual-path convolution recurrent network for single channel speech enhancement." arXiv preprint arXiv:2107.05429 (2021).
  • [10] Marcu, David C., and Cristian Grava. "The impact of activation functions on training and performance of a deep neural network." 2021 16th International Conference on Engineering of Modern Electric Systems (EMES). IEEE, 2021.
  • [11] Ye, Zhongfu, Nasir Saleem, and Hamza Ali. "Efficient gated convolutional recurrent neural networks for real-time speech enhancement." (2023).
  • [12] Ketkar, Nikhil, et al. "Convolutional neural networks." Deep Learning with Python: Learn Best Practices of Deep Learning Models with PyTorch (2021): 197-242.
  • [13] Hewamalage, Hansika, Christoph Bergmeir, and Kasun Bandara. "Recurrent neural networks for time series forecasting: Current status and future directions." International Journal of Forecasting 37.1 (2021): 388-427.
  • [14] Wahab, Fazal E., et al. "Compact deep neural networks for real-time speech enhancement on resource-limited devices." Speech Communication 156 (2024): 103008.
• [15] Galvez, Daniel, et al. "The people's speech: A large-scale diverse English speech recognition dataset for commercial usage." arXiv preprint arXiv:2111.09344 (2021).


Details

Primary Language English
Subjects Communication Engineering (Other)
Section Research Articles
Authors

Ali Hamza 0009-0002-8662-5249

Fazal Muhammad

Talha Ali

Fazal E-wahab

Muhammad Ismail 0009-0002-7268-5486

Early View Date 15 June 2024
Publication Date 21 June 2024
Submission Date 16 April 2024
Acceptance Date 27 May 2024
Published in Issue Year 2024, Volume 5, Issue 1

How to Cite

APA Hamza, A., Muhammad, F., Ali, T., E-wahab, F., et al. (2024). A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research, 5(1), 78-85. https://doi.org/10.53525/jster.1469477
AMA Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. JSTER. June 2024;5(1):78-85. doi:10.53525/jster.1469477
Chicago Hamza, Ali, Fazal Muhammad, Talha Ali, Fazal E-wahab, and Muhammad Ismail. "A Supervised Learning Approach With Residual Attention Connections". Journal of Science, Technology and Engineering Research 5, no. 1 (June 2024): 78-85. https://doi.org/10.53525/jster.1469477.
EndNote Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M (01 June 2024) A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research 5 1 78–85.
IEEE A. Hamza, F. Muhammad, T. Ali, F. E-wahab, and M. Ismail, "A Supervised Learning Approach With Residual Attention Connections", JSTER, vol. 5, no. 1, pp. 78–85, 2024, doi: 10.53525/jster.1469477.
ISNAD Hamza, Ali et al. "A Supervised Learning Approach With Residual Attention Connections". Journal of Science, Technology and Engineering Research 5/1 (June 2024), 78-85. https://doi.org/10.53525/jster.1469477.
JAMA Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. JSTER. 2024;5:78–85.
MLA Hamza, Ali, et al. "A Supervised Learning Approach With Residual Attention Connections". Journal of Science, Technology and Engineering Research, vol. 5, no. 1, 2024, pp. 78-85, doi:10.53525/jster.1469477.
Vancouver Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. JSTER. 2024;5(1):78-85.
Works published in this journal are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License.
