A Review on Deep Learning Architectures for Speech Recognition
Abstract
Deep learning is a branch of machine learning whose algorithms model datasets through deep architectures with many processing layers. As deep learning architectures have grown in popularity and been applied successfully, they have also been adopted in speech recognition. Researchers have used these architectures for speech recognition and its applications, such as speech emotion recognition, voice activity detection, and speaker recognition and verification, to better model the mapping from speech inputs to outputs and to reduce the error rates of speech recognition systems. Many studies in the literature use deep learning architectures for speech recognition systems. These studies show that deep learning architectures benefit many areas of speech recognition by reducing error rates and improving performance. In this study, we first explain the speech recognition problem and the steps of speech recognition. We then analyze the studies on deep learning based speech recognition. In particular, we evaluate Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and hybrid approaches that combine these architectures, and we investigate the literature on their use in speech recognition and its application areas. We observed that, among the architectures reviewed, RNNs are the most widely used and the most powerful in terms of error rates and speech recognition performance. CNNs are also successful, with error rates and recognition performance close to those of RNNs.
We also observed that new deep architectures, either hybrids of DNNs, CNNs, and RNNs or other deep learning architectures, are attracting attention, show increasing performance, and could reduce error rates in speech recognition.
Details
Primary Language
English
Subjects
Engineering
Section
Review
Publication Date
April 1, 2020
Submission Date
March 15, 2020
Acceptance Date
March 28, 2020
Published in Issue
Year 2020
APA
Dokuz, Y., & Tüfekci, Z. (2020). A Review on Deep Learning Architectures for Speech Recognition. Avrupa Bilim ve Teknoloji Dergisi, 169-176. https://doi.org/10.31590/ejosat.araconf22