Research Article

DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE

Volume: 9 Issue: 16, 14 April 2022

Abstract

The most fundamental challenge in automatic speech recognition (ASR) is not building a domain-specific system but building one with a large vocabulary. Such systems should be tested on a test dataset with a large vocabulary. For this reason, an ASR test corpus was prepared within the scope of this study. The corpus includes conversations from 20 different domains together with the text files of these conversations. The test procedure presented in the study was also applied to large-vocabulary Turkish ASR systems, and the observed word error rates ranged from 14% to 21%. The large-vocabulary test corpus and the test procedure are intended to guide future studies by revealing the performance of ASR systems more clearly.
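The word error rate (WER) reported above is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The abstract does not state the exact scoring or text normalization used in the study, so the following is only a minimal sketch of that standard definition. The function names `edit_distance` and `corpus_wer` and the sample sentences are hypothetical and introduced purely for illustration.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    # prev[j] holds the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev: List[int] = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Sequence[Tuple[str, str]]) -> float:
    """Corpus-level WER: total edit errors divided by total reference words.

    Assumption: whitespace tokenization with no case or punctuation
    normalization; a real evaluation would normally normalize text first.
    """
    total_errors = 0
    total_ref_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_errors += edit_distance(ref_words, hypothesis.split())
        total_ref_words += len(ref_words)
    if total_ref_words == 0:
        raise ValueError("reference transcripts contain no words")
    return total_errors / total_ref_words


if __name__ == "__main__":
    # Hypothetical (reference, hypothesis) pairs, not taken from the corpus.
    sample = [
        ("bugün hava çok güzel", "bugün hava güzel"),
        ("toplantı saat üçte başlayacak", "toplantı saat üçte başlayacak"),
    ]
    print(f"WER: {corpus_wer(sample):.3f}")
```

Pooling errors over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the result. Whether the study pooled or averaged is not stated in the abstract.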

Keywords

Speech recognition, Turkish speech recognition, speech corpus, test corpus, Turkish speech corpus

Cite This Article

APA
Oyucu, S. (2022). DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi, 9(16), 156-164. https://doi.org/10.54365/adyumbd.1038766
AMA
1. Oyucu S. DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi. 2022;9(16):156-164. doi:10.54365/adyumbd.1038766
Chicago
Oyucu, Saadin. 2022. “DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE”. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 9 (16): 156-64. https://doi.org/10.54365/adyumbd.1038766.
EndNote
Oyucu S (1 April 2022) DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 9 16 156–164.
IEEE
[1] S. Oyucu, “DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE”, Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi, vol. 9, no. 16, pp. 156–164, Apr. 2022, doi: 10.54365/adyumbd.1038766.
ISNAD
Oyucu, Saadin. “DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE”. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 9/16 (1 April 2022): 156-164. https://doi.org/10.54365/adyumbd.1038766.
JAMA
1. Oyucu S. DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi. 2022;9:156–164.
MLA
Oyucu, Saadin. “DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE”. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi, vol. 9, no. 16, Apr. 2022, pp. 156-64, doi:10.54365/adyumbd.1038766.
Vancouver
1. Saadin Oyucu. DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi. 1 April 2022;9(16):156-64. doi:10.54365/adyumbd.1038766