Turkish Speech Recognition Based On Deep Neural Networks

Volume: 22, 5 October 2018

Abstract

In this paper, we develop a Turkish speech recognition (SR) system using deep neural networks (DNNs) and compare it with the previous state-of-the-art Gaussian mixture model-hidden Markov model (GMM-HMM) approach, using the same Turkish speech dataset and the same large-vocabulary Turkish corpus. Most SR systems deployed today, worldwide and in Turkey in particular, use hidden Markov models (HMMs) to handle the temporal variability of speech. Gaussian mixture models estimate how well each state of each HMM fits a short frame of coefficients representing the acoustic input. A feed-forward deep neural network is another way to estimate this fit: it takes several frames of coefficients as input and outputs posterior probabilities over HMM states. Deep neural networks have been shown to outperform the traditional GMM-HMM approach in other languages such as English and German. The agglutinative nature of Turkish and the lack of large amounts of speech data complicate the design of a high-performing SR system. We expect deep neural networks to improve performance, but not to match the results reported for English, because less speech data is available for Turkish. We present various architectural and training techniques for the Turkish DNN-based models. The models are tested on a Turkish database collected from mobile devices. In the experiments, the Turkish DNN-HMM system reduces the word error rate by approximately 2.5% compared with the traditional GMM-HMM system.
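
The acoustic-model step described in the abstract (several frames of coefficients in, posterior probabilities over HMM states out) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the function names, the layer sizes, the context width, the ReLU hidden units, and the random weights are hypothetical and do not come from the paper, which would use trained parameters and its own architecture.

```python
import math
import random


def splice_frames(features, context):
    """Concatenate each frame with `context` neighbours on both sides.

    Frames beyond the utterance edges are replaced by the first/last frame.
    """
    n = len(features)
    spliced = []
    for t in range(n):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(features[min(max(k, 0), n - 1)])
        spliced.append(window)
    return spliced


def affine(x, weights, bias):
    """Compute W x + b for one input vector; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax, so the outputs sum to one."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def state_posteriors(x, layers):
    """Feed-forward pass: ReLU hidden layers (assumed), softmax over HMM states."""
    h = x
    for weights, bias in layers[:-1]:
        h = [max(0.0, v) for v in affine(h, weights, bias)]
    weights, bias = layers[-1]
    return softmax(affine(h, weights, bias))


def random_layer(rng, n_in, n_out):
    """Untrained layer with small random weights (placeholder for training)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


# Illustrative sizes only: 13 coefficients per frame, 5 frames of context
# on each side, two hidden layers, 20 HMM states.
FEAT_DIM, CONTEXT, HIDDEN, NUM_STATES = 13, 5, 32, 20

rng = random.Random(0)
features = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(50)]

spliced = splice_frames(features, CONTEXT)
sizes = [FEAT_DIM * (2 * CONTEXT + 1), HIDDEN, HIDDEN, NUM_STATES]
layers = [random_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]

# One posterior distribution over HMM states per input frame.
posteriors = [state_posteriors(x, layers) for x in spliced]
print(len(posteriors), len(posteriors[0]))
```

Each row of `posteriors` is a probability distribution over the assumed HMM states for one frame; in a DNN-HMM system these per-frame scores take the place of the GMM fit estimates during decoding.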

Details

Primary Language

Turkish

Subjects

-

Section

-

Publication Date

5 October 2018

Submission Date

28 December 2017

Acceptance Date

-

Published in Issue

Year 2018 Volume: 22

Cite

APA
Kımanuka, U. A., & Buyuk, O. (2018). Turkish Speech Recognition Based On Deep Neural Networks. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 22, 319-329. https://izlik.org/JA63BD74HJ
AMA
1. Kımanuka UA, Buyuk O. Turkish Speech Recognition Based On Deep Neural Networks. Süleyman Demirel Üniv. Fen Bilim. Enst. Derg. 2018;22:319-329. https://izlik.org/JA63BD74HJ
Chicago
Kımanuka, Ussen Abre, and Osman Buyuk. 2018. “Turkish Speech Recognition Based On Deep Neural Networks”. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi 22 (October): 319-29. https://izlik.org/JA63BD74HJ.
EndNote
Kımanuka UA, Buyuk O (01 October 2018) Turkish Speech Recognition Based On Deep Neural Networks. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi 22 319–329.
IEEE
[1] U. A. Kımanuka and O. Buyuk, “Turkish Speech Recognition Based On Deep Neural Networks”, Süleyman Demirel Üniv. Fen Bilim. Enst. Derg., vol. 22, pp. 319–329, Oct. 2018, [Online]. Available: https://izlik.org/JA63BD74HJ
ISNAD
Kımanuka, Ussen Abre - Buyuk, Osman. “Turkish Speech Recognition Based On Deep Neural Networks”. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi 22 (01 October 2018): 319-329. https://izlik.org/JA63BD74HJ.
JAMA
1. Kımanuka UA, Buyuk O. Turkish Speech Recognition Based On Deep Neural Networks. Süleyman Demirel Üniv. Fen Bilim. Enst. Derg. 2018;22:319–329.
MLA
Kımanuka, Ussen Abre, and Osman Buyuk. “Turkish Speech Recognition Based On Deep Neural Networks”. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 22, October 2018, pp. 319-29, https://izlik.org/JA63BD74HJ.
Vancouver
1. Ussen Abre Kımanuka, Osman Buyuk. Turkish Speech Recognition Based On Deep Neural Networks. Süleyman Demirel Üniv. Fen Bilim. Enst. Derg. [Internet]. 01 October 2018;22:319-29. Available from: https://izlik.org/JA63BD74HJ

e-ISSN: 1308-6529
Linking ISSN (ISSN-L): 1300-7688

All articles published in the journal are freely accessible and are made available as open access under the Creative Commons CC BY-NC (Attribution-NonCommercial) license. All authors and other journal users are deemed to have accepted these terms.