Research Article

Transformer-Based Turkish Automatic Speech Recognition

Volume: 8, Issue: 1, 28 June 2024

Abstract

Businesses increasingly use Automatic Speech Recognition (ASR) technology to improve efficiency and productivity across many business functions. Since the COVID-19 pandemic, the prevalence of online meetings in remote working and learning environments has led to more frequent use of speech recognition systems, underlining their significance. While languages such as English, Spanish, and French have large amounts of labeled data, very little labeled data is available for Turkish, which directly reduces the accuracy of ASR systems for the language. This study therefore makes use of unlabeled audio data by learning general data representations through self-supervised, end-to-end modeling. To convert speech recordings to text, it employs a transformer-based machine learning model whose performance is improved through transfer learning. The model adopted is the Wav2Vec 2.0 architecture, which learns by masking parts of the audio input and solving the associated prediction task. The XLSR-Wav2Vec 2.0 model, pre-trained on speech data in 53 languages, was fine-tuned on the Mozilla Common Voice Turkish dataset. According to the empirical results, a word error rate of 0.23 was achieved on the test set of the same dataset.
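
The word error rate reported above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognized text, divided by the number of words in the reference. The following is a minimal sketch of that standard definition in Python, not the authors' evaluation code. The function name `word_error_rate` and the example sentences are illustrative assumptions, and text normalization (casing, punctuation) is left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; it splits on whitespace and
    applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is dropped by the recognizer.
    print(word_error_rate("bugün hava çok güzel", "bugün hava güzel"))
```

In this example, one deleted word out of four reference words gives a word error rate of 0.25. By the same measure, a rate of 0.23 corresponds to roughly 23 word errors per 100 reference words.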

Details

Primary Language

English

Subjects

Software Engineering (Other)

Section

Research Article

Publication Date

28 June 2024

Submission Date

6 August 2023

Acceptance Date

30 November 2023

Published in Issue

Year 2024, Volume: 8, Issue: 1

Cite

APA
Taşar, D. E., Koruyan, K., & Çılgın, C. (2024). Transformer-Based Turkish Automatic Speech Recognition. Acta Infologica, 8(1), 1-10. https://doi.org/10.26650/acin.1338604
AMA
1. Taşar DE, Koruyan K, Çılgın C. Transformer-Based Turkish Automatic Speech Recognition. ACIN. 2024;8(1):1-10. doi:10.26650/acin.1338604
Chicago
Taşar, Davut Emre, Kutan Koruyan, and Cihan Çılgın. 2024. “Transformer-Based Turkish Automatic Speech Recognition”. Acta Infologica 8 (1): 1-10. https://doi.org/10.26650/acin.1338604.
EndNote
Taşar DE, Koruyan K, Çılgın C (1 June 2024) Transformer-Based Turkish Automatic Speech Recognition. Acta Infologica 8 1 1–10.
IEEE
[1] D. E. Taşar, K. Koruyan, and C. Çılgın, “Transformer-Based Turkish Automatic Speech Recognition”, ACIN, vol. 8, no. 1, pp. 1–10, Jun. 2024, doi: 10.26650/acin.1338604.
ISNAD
Taşar, Davut Emre - Koruyan, Kutan - Çılgın, Cihan. “Transformer-Based Turkish Automatic Speech Recognition”. Acta Infologica 8/1 (1 June 2024): 1-10. https://doi.org/10.26650/acin.1338604.
JAMA
1. Taşar DE, Koruyan K, Çılgın C. Transformer-Based Turkish Automatic Speech Recognition. ACIN. 2024;8:1–10.
MLA
Taşar, Davut Emre, et al. “Transformer-Based Turkish Automatic Speech Recognition”. Acta Infologica, vol. 8, no. 1, June 2024, pp. 1-10, doi:10.26650/acin.1338604.
Vancouver
1. Davut Emre Taşar, Kutan Koruyan, Cihan Çılgın. Transformer-Based Turkish Automatic Speech Recognition. ACIN. 1 June 2024;8(1):1-10. doi:10.26650/acin.1338604