Research Article

Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU)

Volume: 34 Number: 4 December 1, 2021

Abstract

A typical Automatic Speech Recognition (ASR) system consists of feature extraction, feature classification, acoustic modeling, and language modeling steps. In the classification and modeling steps, deep learning methods have become popular and yield better recognition results than conventional methods. In this study, an ASR application for the Turkish language has been developed. The data sets and previous studies related to Turkish ASR are examined, as are the language models used in ASR for agglutinative languages such as Turkish, Finnish, and Hungarian. A subword-based model is chosen to keep the vocabulary from growing too large without decreasing recognition performance. Recognition performance is increased by using the deep learning methods Long Short-Term Memory (LSTM) neural networks and Gated Recurrent Units (GRU) in the classification and acoustic modeling steps. The recognition performance of the LSTM- and GRU-based systems is compared with that of previous studies using traditional methods and Deep Neural Networks. The results show that LSTM- and GRU-based speech recognizers perform better than recognizers built with the previous methods. The final Word Error Rate (WER) values obtained for LSTM and GRU are 10.65% and 11.25%, respectively. GRU-based systems thus perform similarly to LSTM-based systems, but their training time is shorter: the computation times are 73,518 and 61,020 seconds for LSTM and GRU, respectively. The study provides detailed information about the applicability of the latest methods to Turkish ASR research and applications.
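For readers unfamiliar with the metric reported above, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following Python sketch illustrates that standard definition with a word-level edit distance. It is not the authors' evaluation code; the function name `word_error_rate` and the example sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative (made-up) sentences: one of four reference words is missing.
example_wer = word_error_rate("bugün hava çok güzel", "bugün hava güzel")
```

Under this definition, a WER of 10.65% corresponds to roughly one word error for every nine to ten reference words.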

Keywords

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Publication Date

December 1, 2021

Submission Date

October 27, 2020

Acceptance Date

January 21, 2021

Published in Issue

Year 2021 Volume: 34 Number: 4

APA
Tombaloğlu, B., & Erdem, H. (2021). Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU). Gazi University Journal of Science, 34(4), 1035-1049. https://doi.org/10.35378/gujs.816499
AMA
1. Tombaloğlu B, Erdem H. Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU). Gazi University Journal of Science. 2021;34(4):1035-1049. doi:10.35378/gujs.816499
Chicago
Tombaloğlu, Burak, and Hamit Erdem. 2021. “Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU)”. Gazi University Journal of Science 34 (4): 1035-49. https://doi.org/10.35378/gujs.816499.
EndNote
Tombaloğlu B, Erdem H (December 1, 2021) Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU). Gazi University Journal of Science 34 4 1035–1049.
IEEE
[1] B. Tombaloğlu and H. Erdem, “Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU)”, Gazi University Journal of Science, vol. 34, no. 4, pp. 1035–1049, Dec. 2021, doi: 10.35378/gujs.816499.
ISNAD
Tombaloğlu, Burak - Erdem, Hamit. “Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU)”. Gazi University Journal of Science 34/4 (December 1, 2021): 1035-1049. https://doi.org/10.35378/gujs.816499.
JAMA
1. Tombaloğlu B, Erdem H. Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU). Gazi University Journal of Science. 2021;34:1035–1049.
MLA
Tombaloğlu, Burak, and Hamit Erdem. “Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU)”. Gazi University Journal of Science, vol. 34, no. 4, Dec. 2021, pp. 1035-49, doi:10.35378/gujs.816499.
Vancouver
1. Burak Tombaloğlu, Hamit Erdem. Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU). Gazi University Journal of Science. 2021 Dec. 1;34(4):1035-49. doi:10.35378/gujs.816499
