Research Article

Transformer-Based Turkish Automatic Speech Recognition

Volume: 8 Number: 1 June 28, 2024

Abstract

Businesses increasingly use Automatic Speech Recognition (ASR) technology to improve efficiency and productivity across many business functions. The growth of online meetings in remote working and learning environments after the COVID-19 pandemic has further increased the use of speech recognition systems and underscored their importance. While languages such as English, Spanish, and French have large amounts of labeled speech data, very little labeled data is available for Turkish, which directly reduces the accuracy of Turkish ASR systems. This study therefore exploits unlabeled audio data by learning general speech representations through self-supervised, end-to-end modeling. A transformer-based machine learning model, whose performance is improved through transfer learning, is used to convert speech recordings to text. The model adopted is the Wav2Vec 2.0 architecture, which masks parts of the audio input and is pre-trained to solve a contrastive task over the masked portions. Specifically, the XLSR-Wav2Vec 2.0 model, pre-trained on speech data in 53 languages, was fine-tuned on the Mozilla Common Voice Turkish data set. In the empirical results, the fine-tuned model reached a word error rate (WER) of 0.23 on the test set of the same data set.
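To make the reported metric concrete, the following is a minimal Python sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; the abstract does not show the authors' evaluation code, and a corpus-level score would typically sum errors and reference lengths over all test utterances rather than average per-sentence values.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (not the authors' evaluation script); words are
    obtained by simple whitespace splitting, with no text normalization.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


# Made-up example: one of four reference words is missing from the hypothesis.
example_wer = word_error_rate("bugün hava çok güzel", "bugün hava güzel")
print(example_wer)
```

Under this definition, a WER of 0.23 corresponds to roughly 23 word-level errors per 100 reference words. Because insertions are counted, the value can exceed 1.0 for very poor hypotheses.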

Keywords

Details

Primary Language: English
Subjects: Software Engineering (Other)
Journal Section: Research Article
Publication Date: June 28, 2024
Submission Date: August 6, 2023
Acceptance Date: November 30, 2023
Published in Issue: Year 2024, Volume 8, Number 1

APA
Taşar, D. E., Koruyan, K., & Çılgın, C. (2024). Transformer-Based Turkish Automatic Speech Recognition. Acta Infologica, 8(1), 1-10. https://doi.org/10.26650/acin.1338604
AMA
1. Taşar DE, Koruyan K, Çılgın C. Transformer-Based Turkish Automatic Speech Recognition. ACIN. 2024;8(1):1-10. doi:10.26650/acin.1338604
Chicago
Taşar, Davut Emre, Kutan Koruyan, and Cihan Çılgın. 2024. “Transformer-Based Turkish Automatic Speech Recognition”. Acta Infologica 8 (1): 1-10. https://doi.org/10.26650/acin.1338604.
EndNote
Taşar DE, Koruyan K, Çılgın C (June 1, 2024) Transformer-Based Turkish Automatic Speech Recognition. Acta Infologica 8 1 1–10.
IEEE
[1] D. E. Taşar, K. Koruyan, and C. Çılgın, “Transformer-Based Turkish Automatic Speech Recognition”, ACIN, vol. 8, no. 1, pp. 1–10, June 2024, doi: 10.26650/acin.1338604.
ISNAD
Taşar, Davut Emre - Koruyan, Kutan - Çılgın, Cihan. “Transformer-Based Turkish Automatic Speech Recognition”. Acta Infologica 8/1 (June 1, 2024): 1-10. https://doi.org/10.26650/acin.1338604.
JAMA
1. Taşar DE, Koruyan K, Çılgın C. Transformer-Based Turkish Automatic Speech Recognition. ACIN. 2024;8:1–10.
MLA
Taşar, Davut Emre, et al. “Transformer-Based Turkish Automatic Speech Recognition”. Acta Infologica, vol. 8, no. 1, June 2024, pp. 1-10, doi:10.26650/acin.1338604.
Vancouver
1. Davut Emre Taşar, Kutan Koruyan, Cihan Çılgın. Transformer-Based Turkish Automatic Speech Recognition. ACIN. 2024 Jun. 1;8(1):1-10. doi:10.26650/acin.1338604