Research Article

A Supervised Learning Approach With Residual Attention Connections

Year 2024, Volume 5, Issue 1, 78-85, 21.06.2024
https://doi.org/10.53525/jster.1469477

Abstract

Our study aims to improve speech quality in the presence of background noise, which often disrupts clear communication. We focus on developing efficient and effective models that run well on devices with limited resources. Drawing inspiration from computational auditory scene analysis, we train our models to separate speech from background noise while keeping computational demands low. We introduce two models: CRN-WRC (Convolutional Recurrent Network without Residual Connections) and CRN-RCAG (Convolutional Recurrent Network with Residual Connections and Attention Gates). Thorough testing shows that both models significantly enhance speech quality and intelligibility, even in noisy environments with varying background noise levels. Notably, CRN-RCAG consistently outperforms CRN-WRC, particularly on noise types not seen during training. These gains come from integrating residual connections and attention gates into the models while maintaining computational efficiency.
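The page does not include the authors' implementation, but the abstract's central idea (a convolutional recurrent network whose encoder-to-decoder skip connections pass through attention gates, predicting a mask over a noisy spectrogram) can be sketched in a few lines of PyTorch. Everything below is an illustrative assumption, not the published CRN-RCAG configuration: the single encoder/decoder stage, the layer sizes, and the names TinyCRN and AttentionGate are all hypothetical.

```python
# A minimal sketch, assuming a mask-based CRN with a gated residual
# connection; sizes and names are illustrative, not the paper's.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Scales an encoder skip connection by a learned [0, 1] mask
    computed from the skip features and the decoder's gating signal."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj_skip = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_gate = nn.Conv2d(channels, channels, kernel_size=1)
        self.attn = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, skip: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        weights = self.attn(self.proj_skip(skip) + self.proj_gate(gate))
        return skip * weights  # down-weight noise-dominated regions

class TinyCRN(nn.Module):
    """Toy CRN: one conv encoder stage, an LSTM bottleneck over time,
    and one decoder stage fed by a gated residual connection."""
    def __init__(self, freq_bins: int = 161, channels: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=(1, 3), padding=(0, 1)),
            nn.BatchNorm2d(channels), nn.ELU(),
        )
        self.rnn = nn.LSTM(channels * freq_bins, channels * freq_bins,
                           batch_first=True)
        self.gate = AttentionGate(channels)
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=(1, 3), padding=(0, 1)),
            nn.Sigmoid(),  # mask in [0, 1], applied to noisy magnitudes
        )

    def forward(self, noisy_mag: torch.Tensor) -> torch.Tensor:
        # noisy_mag: (batch, 1, time, freq); freq must equal freq_bins
        b, _, t, f = noisy_mag.shape
        enc = self.encoder(noisy_mag)                    # (b, c, t, f)
        seq = enc.permute(0, 2, 1, 3).reshape(b, t, -1)  # (b, t, c*f)
        out, _ = self.rnn(seq)
        dec_in = out.reshape(b, t, -1, f).permute(0, 2, 1, 3)
        fused = dec_in + self.gate(enc, dec_in)          # gated residual
        mask = self.decoder(fused)
        return mask * noisy_mag                          # enhanced magnitudes

# Usage: a batch of 2 noisy magnitude spectrograms, 100 frames x 161 bins.
enhanced = TinyCRN()(torch.randn(2, 1, 100, 161).abs())
```

The gated residual path is the point of the sketch: the decoder receives encoder features only after the attention gate has suppressed time-frequency regions it judges to be noise-dominated, which is the mechanism the abstract credits for the improved robustness to untrained noise types.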

References

  • [1] Kheddar, Hamza, et al. "Deep transfer learning for automatic speech recognition: Towards better generalization." Knowledge-Based Systems 277 (2023): 110851.
  • [2] Kwak, Chanbeom, and Woojae Han. "Towards size of scene in auditory scene analysis: A systematic review." Journal of Audiology & Otology 24.1 (2020): 1.
• [3] Wang, DeLiang, and Jitong Chen. "Supervised speech separation based on deep learning: An overview." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26.10 (2018): 1702-1726.
  • [4] Nossier, Soha A., et al. "Mapping and masking targets comparison using different deep learning based speech enhancement architectures." 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020.
  • [5] Ye, Zhongfu, Nasir Saleem, and Hamza Ali. "Efficient gated convolutional recurrent neural networks for real-time speech enhancement." (2023).
  • [6] Hsieh, Tsun-An, et al. "WaveCRN: An efficient convolutional recurrent neural network for end-to-end speech enhancement." IEEE Signal Processing Letters 27 (2020): 2149-2153.
  • [7] Wang, Kai. Novel Deep Learning Approaches for Single-Channel Speech Enhancement. Diss. Concordia University, 2022.
  • [8] Haar, Lynn Vonder, Timothy Elvira, and Omar Ochoa. "An analysis of explainability methods for convolutional neural networks." Engineering Applications of Artificial Intelligence 117 (2023): 105606.
  • [9] Le, Xiaohuai, et al. "DPCRN: Dual-path convolution recurrent network for single channel speech enhancement." arXiv preprint arXiv:2107.05429 (2021).
  • [10] Marcu, David C., and Cristian Grava. "The impact of activation functions on training and performance of a deep neural network." 2021 16th International Conference on Engineering of Modern Electric Systems (EMES). IEEE, 2021.
  • [11] Ye, Zhongfu, Nasir Saleem, and Hamza Ali. "Efficient gated convolutional recurrent neural networks for real-time speech enhancement." (2023).
  • [12] Ketkar, Nikhil, et al. "Convolutional neural networks." Deep Learning with Python: Learn Best Practices of Deep Learning Models with PyTorch (2021): 197-242.
  • [13] Hewamalage, Hansika, Christoph Bergmeir, and Kasun Bandara. "Recurrent neural networks for time series forecasting: Current status and future directions." International Journal of Forecasting 37.1 (2021): 388-427.
  • [14] Wahab, Fazal E., et al. "Compact deep neural networks for real-time speech enhancement on resource-limited devices." Speech Communication 156 (2024): 103008.
• [15] Galvez, Daniel, et al. "The people's speech: A large-scale diverse English speech recognition dataset for commercial usage." arXiv preprint arXiv:2111.09344 (2021).


Details

Primary Language English
Subjects Communication Engineering (Other)
Section Research Articles
Authors

Ali Hamza 0009-0002-8662-5249

Fazal Muhammad

Talha Ali

Fazal E-wahab

Muhammad Ismail 0009-0002-7268-5486

Early View Date 15 June 2024
Publication Date 21 June 2024
Submission Date 16 April 2024
Acceptance Date 27 May 2024
Published in Issue Year 2024, Volume 5, Issue 1

How to Cite

APA Hamza, A., Muhammad, F., Ali, T., E-wahab, F., et al. (2024). A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research, 5(1), 78-85. https://doi.org/10.53525/jster.1469477
AMA Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. JSTER. June 2024;5(1):78-85. doi:10.53525/jster.1469477
Chicago Hamza, Ali, Fazal Muhammad, Talha Ali, Fazal E-wahab, and Muhammad Ismail. "A Supervised Learning Approach With Residual Attention Connections". Journal of Science, Technology and Engineering Research 5, no. 1 (June 2024): 78-85. https://doi.org/10.53525/jster.1469477.
EndNote Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M (01 June 2024) A Supervised Learning Approach With Residual Attention Connections. Journal of Science, Technology and Engineering Research 5 1 78–85.
IEEE A. Hamza, F. Muhammad, T. Ali, F. E-wahab, and M. Ismail, "A Supervised Learning Approach With Residual Attention Connections", JSTER, vol. 5, no. 1, pp. 78–85, 2024, doi: 10.53525/jster.1469477.
ISNAD Hamza, Ali et al. "A Supervised Learning Approach With Residual Attention Connections". Journal of Science, Technology and Engineering Research 5/1 (June 2024), 78-85. https://doi.org/10.53525/jster.1469477.
JAMA Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. JSTER. 2024;5:78–85.
MLA Hamza, Ali, et al. "A Supervised Learning Approach With Residual Attention Connections". Journal of Science, Technology and Engineering Research, vol. 5, no. 1, 2024, pp. 78-85, doi:10.53525/jster.1469477.
Vancouver Hamza A, Muhammad F, Ali T, E-wahab F, Ismail M. A Supervised Learning Approach With Residual Attention Connections. JSTER. 2024;5(1):78-85.
Works published in this journal are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License.
