Research Article

A Turkish Broadcast News Speech Database for Investigation the Effect of Deep Neural Network and Long Short Term Memory Hyperparameters on Speech Recognition Based Systems

Year 2021, Issue: 24, 87 - 92, 15.04.2021
https://doi.org/10.31590/ejosat.900422

Abstract

Speech recognition is the transformation of spoken words and sentences into text. Many studies on speech recognition have been carried out in many countries in recent years. However, there are very few studies on speech recognition applications in Turkey, and one reason is the lack of speech datasets. In this study, a Turkish speech database was developed for Turkish speech recognition systems. The recordings were obtained from news broadcast by Turkish news TV channels at different times. The dataset has been shared publicly on the web so that it can serve as a precedent for other studies. Additionally, the effects of two hyperparameters, the number of layers and the number of cells, of Long Short Term Memory (LSTM) and Deep Neural Network (DNN) models were investigated and compared on the Turkish Broadcast News Speech Database.
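To make the two hyperparameters concrete, the sketch below enumerates a small grid of layer and cell counts and computes how many trainable weights each LSTM or DNN stack would have. It is only an illustration: the feature dimension and the grid values are hypothetical and are not taken from the study, the LSTM is assumed to be the standard form without peephole connections or projection layers, and the output layer is left out of the count.

```python
from itertools import product

def lstm_param_count(input_dim, cells, layers):
    """Trainable weights in a stack of standard LSTM layers
    (assumption: no peepholes, no projection, output layer excluded)."""
    total = 0
    in_dim = input_dim
    for _ in range(layers):
        # Four gate blocks, each with input weights, recurrent weights and a bias.
        total += 4 * (in_dim + cells + 1) * cells
        in_dim = cells  # the next layer reads this layer's hidden state
    return total

def dnn_param_count(input_dim, cells, layers):
    """Trainable weights in a stack of fully connected hidden layers
    (output layer excluded)."""
    total = 0
    in_dim = input_dim
    for _ in range(layers):
        # One weight per input plus one bias, for every unit in the layer.
        total += (in_dim + 1) * cells
        in_dim = cells
    return total

# Hypothetical values chosen for illustration only; not from the article.
FEATURE_DIM = 39
LAYER_OPTIONS = (1, 2, 3)
CELL_OPTIONS = (128, 256)

def build_grid(feature_dim, layer_options, cell_options):
    """One record per (layers, cells) combination with both model sizes."""
    return [
        {
            "layers": layers,
            "cells": cells,
            "lstm_params": lstm_param_count(feature_dim, cells, layers),
            "dnn_params": dnn_param_count(feature_dim, cells, layers),
        }
        for layers, cells in product(layer_options, cell_options)
    ]

grid = build_grid(FEATURE_DIM, LAYER_OPTIONS, CELL_OPTIONS)
for row in grid:
    print(row)
```

For the same number of layers and cells, an LSTM stack carries roughly four times the weights of the DNN stack plus the recurrent connections, which is one reason the two hyperparameters can affect the two model families differently.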

References

  • Bengio, Y., 2009. "Learning Deep Architectures for AI". Foundations and Trends in Machine Learning. 1–127.
  • Gaikwad, S., Gawali, B. W., & Yannawar, P. 2010. A review on Speech Recognition Technique, pp. 16-24.
  • Graves, A., Mohamed, A. R., & Hinton, G. (2013, May). Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing (pp. 6645-6649). IEEE.
  • Graves, A., Jaitly, N., & Mohamed, A. R. (2013b, December). Hybrid speech recognition with deep bidirectional LSTM. In 2013 IEEE workshop on automatic speech recognition and understanding (pp. 273-278). IEEE.
  • Hizlisoy, S., 2020. Music Emotion Recognition Using Convolutional Long Short Memory Deep Neural Networks.
  • Patlar, F., 2009. A Continuous Speech Recognition System For Turkish Language Based On Triphone Model.
  • Hochreiter, S., & Schmidhuber, J. (1997). "LSTM can Solve Hard Long Time Lag Problems". Advances in Neural Information Processing Systems 9.
  • Tüfekci, Z., and Dokuz, Y., 2020. Investigation of the Effect of LSTM Hyperparameters on Speech Recognition Performance, European Journal of Science and Technology: p. 165.
  • Yu, D., & Deng, L. (2016). Automatic Speech Recognition: A Deep Learning Approach. Springer
  • Zuo, Z., Shuai, B., Wang, G., Liu, X., Wang, X., Wang, B. (2016). Learning Contextual Dependence with Convolutional Hierarchical Recurrent Neural Networks. IEEE Transactions on Image Processing, 25, 2983-2996.

Derin Sinir Ağları ve Uzun Kısa Süreli Bellek Hiperparametrelerinin Konuşma Tanıma Tabanlı Sistemler Üzerindeki Etkisinin İncelenmesi için Türkçe Yayın Haberleri Konuşma Veri Tabanı

Abstract

Konuşma tanıma, söylenen kelime ve cümlelerin metne dönüştürülmesidir. Son zamanlarda birçok ülkede konuşma tanıma ile ilgili birçok çalışma yapılmıştır, fakat ülkemizde konuşma tanıma uygulamaları ile ilgili yapılan çalışmalar çok azdır, bunun nedenlerinden biri ses veri seti eksikliğidir. Bu çalışmada, Türkçe konuşma tanıma tabanlı sistemler için bir Türkçe konuşma veri tabanı geliştirilmiştir. Ses kayıtları Türkçe haber tv kanallarının farklı zamanlarda yayınladıkları haberlerden elde edilmiştir. Oluşturulan veri seti diğer çalışmalara da emsal teşkil etmesi açısından herkesin erişebileceği şekilde web ortamında paylaşılmıştır. Ek olarak, katman sayısı ve hücre sayısı hiper parametrelerinin Uzun Kısa Süreli Hafıza (LSTM) ve Derin Sinir Ağı (DNN) modelleri üzerindeki etkisi oluşturduğumuz Türkçe Yayın Haberleri Konuşma veri seti üzerinde incelendi ve karşılaştırıldı.

There are 10 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Serhat Ok 0000-0002-9764-2952

Zekeriya Tüfekci 0000-0001-7835-2741

Publication Date April 15, 2021
Published in Issue Year 2021 Issue: 24

Cite

APA Ok, S., & Tüfekci, Z. (2021). A Turkish Broadcast News Speech Database for Investigation the Effect of Deep Neural Network and Long Short Term Memory Hyperparameters on Speech Recognition Based Systems. Avrupa Bilim Ve Teknoloji Dergisi (24), 87-92. https://doi.org/10.31590/ejosat.900422