Research Article

Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition

Issue: 36, 31 May 2022

Abstract

The Listen, Attend and Spell (LAS) network is an end-to-end approach to speech recognition that does not require an explicit language model. It consists of two parts: an encoder, which receives acoustic features as input, and a decoder, which produces one character per time step based on the encoder output and an attention mechanism. Multi-layer recurrent neural networks (RNNs) are used in both the encoder and the decoder, so the LAS architecture can be viewed as one RNN for the encoder and another for the decoder. Their shapes and layer sizes can differ. In this work, we examined the performance of using multiple RNNs in the encoder. Our baseline LAS network uses a single RNN with a hidden size of 256. We compared it against encoders built from 2 and 4 RNNs with hidden sizes of 128 and 64, respectively. The main idea behind the proposed approach is to let each RNN focus on different patterns in the data (phonemes in this case). The outputs of the RNNs are concatenated at the encoder output and fed to the decoder. The TIMIT database is used to compare the networks, with phoneme error rate as the performance metric. The experimental results show that the proposed approach can outperform the baseline network. However, increasing the number of RNNs does not guarantee further improvement.
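The parallel-encoder idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-layer, unidirectional GRU branches, random untrained weights, and a 40-dimensional acoustic feature vector per frame, none of which is stated in the abstract. The names `GRUCell` and `ParallelGRUEncoder` are hypothetical. The sketch only shows how several smaller recurrent branches read the same input and how their hidden states are concatenated frame by frame before being passed to a decoder.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Single GRU cell with random weights (illustrative, untrained)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size

        def mat():
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(hidden_size)]

        # One weight matrix and bias per gate: update (z), reset (r), candidate (n).
        self.Wz, self.Wr, self.Wn = mat(), mat(), mat()
        self.bz = [0.0] * hidden_size
        self.br = [0.0] * hidden_size
        self.bn = [0.0] * hidden_size

    @staticmethod
    def _affine(W, b, v):
        return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

    def step(self, x, h):
        xh = x + h  # list concatenation: [input ; previous hidden state]
        z = [sigmoid(a) for a in self._affine(self.Wz, self.bz, xh)]
        r = [sigmoid(a) for a in self._affine(self.Wr, self.br, xh)]
        x_rh = x + [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in self._affine(self.Wn, self.bn, x_rh)]
        # New state is a gated mix of the candidate and the previous state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def num_params(self):
        return 3 * (len(self.Wz) * len(self.Wz[0]) + self.hidden_size)


class ParallelGRUEncoder:
    """Runs several GRU branches on the same frames and concatenates their states."""

    def __init__(self, input_size, num_branches, hidden_size, seed=0):
        rng = random.Random(seed)
        self.branches = [GRUCell(input_size, hidden_size, rng)
                         for _ in range(num_branches)]

    def encode(self, frames):
        states = [[0.0] * b.hidden_size for b in self.branches]
        outputs = []
        for x in frames:
            states = [b.step(x, h) for b, h in zip(self.branches, states)]
            # Concatenate the branch states into one encoder output per frame.
            outputs.append([v for h in states for v in h])
        return outputs

    def num_params(self):
        return sum(b.num_params() for b in self.branches)


if __name__ == "__main__":
    feature_dim = 40  # assumed feature size, not taken from the paper
    rng = random.Random(123)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(feature_dim)] for _ in range(5)]
    # (branches, hidden size): baseline 1x256, then the 2x128 and 4x64 variants.
    for branches, hidden in [(1, 256), (2, 128), (4, 64)]:
        enc = ParallelGRUEncoder(feature_dim, branches, hidden)
        out = enc.encode(frames)
        print(branches, hidden, len(out[0]), enc.num_params())
```

Under these assumptions, all three configurations produce a 256-dimensional encoder output per frame, so the decoder input size is unchanged, while the total number of recurrent parameters shrinks as the hidden size is split across branches. Whether this matches the parameter budget of the networks in the paper cannot be determined from the abstract alone.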

Details

Primary Language

English

Subjects

Engineering

Section

Research Article

Publication Date

31 May 2022

Submission Date

14 April 2022

Acceptance Date

21 April 2022

Published in Issue

Year 2022, Issue: 36

Cite

APA
Tüfekci, Z., & Dişken, G. (2022). Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition. Avrupa Bilim ve Teknoloji Dergisi, 36, 87-90. https://doi.org/10.31590/ejosat.1103714
AMA
1. Tüfekci Z, Dişken G. Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition. EJOSAT. 2022;(36):87-90. doi:10.31590/ejosat.1103714
Chicago
Tüfekci, Zekeriya, and Gökay Dişken. 2022. “Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition”. Avrupa Bilim ve Teknoloji Dergisi, no. 36: 87-90. https://doi.org/10.31590/ejosat.1103714.
EndNote
Tüfekci Z, Dişken G (01 May 2022) Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition. Avrupa Bilim ve Teknoloji Dergisi 36 87–90.
IEEE
[1] Z. Tüfekci and G. Dişken, “Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition”, EJOSAT, no. 36, pp. 87–90, May 2022, doi: 10.31590/ejosat.1103714.
ISNAD
Tüfekci, Zekeriya - Dişken, Gökay. “Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition”. Avrupa Bilim ve Teknoloji Dergisi. 36 (01 May 2022): 87-90. https://doi.org/10.31590/ejosat.1103714.
JAMA
1. Tüfekci Z, Dişken G. Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition. EJOSAT. 2022;:87–90.
MLA
Tüfekci, Zekeriya, and Gökay Dişken. “Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition”. Avrupa Bilim ve Teknoloji Dergisi, no. 36, May 2022, pp. 87-90, doi:10.31590/ejosat.1103714.
Vancouver
1. Zekeriya Tüfekci, Gökay Dişken. Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition. EJOSAT. 01 May 2022;(36):87-90. doi:10.31590/ejosat.1103714