Research Article

Turkish Speech Recognition Using Mel-Frequency Cepstral Coefficients and Hidden Markov Model

Year 2019, Volume: 2 Issue: 2, 39 - 44, 30.12.2019

Abstract

This paper proposes a new system for recognizing spoken Turkish numbers. The Mel-frequency cepstral coefficients (MFCC) algorithm is used for feature extraction, and a Gaussian hidden Markov model is used for phonetic modeling. The training dataset, collected from 20 subjects (7 women and 13 men), contains audio files in which the speakers say the Turkish numbers from 0 to 10. Each file holds one second of audio sampled at 8000 Hz and recorded in an isolated environment. The system was tested with random recordings taken from different speakers. The training set contains 220 audio files and the test set contains 18. In testing, the system reached 83.3% accuracy, 86% precision, and 83% recall.
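The reported figures can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `evaluate`, the macro-averaging choice, and the toy label lists are assumptions, since the abstract does not say how precision and recall were averaged and the per-file predictions are not given.

```python
from collections import Counter

# Dataset size: 20 subjects, each saying the 11 numbers "zero" to "ten".
TRAIN_FILES = 20 * 11  # 220, matching the reported training set size

# With 18 test files, 83.3% accuracy would correspond to 15 correct files.
IMPLIED_ACCURACY = round(15 / 18 * 100, 1)


def evaluate(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists.

    Macro averaging is an assumption; the abstract does not specify it.
    """
    classes = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(hits.values()) / len(y_true)
    # A class that is never predicted (or never occurs) contributes 0.
    precision = sum(hits[c] / pred_count[c] for c in classes if pred_count[c]) / len(classes)
    recall = sum(hits[c] / true_count[c] for c in classes if true_count[c]) / len(classes)
    return accuracy, precision, recall


# Hypothetical toy labels (not the paper's data): one "1" is misread as "2".
toy_true = [1, 1, 2, 2, 3, 3]
toy_pred = [1, 2, 2, 2, 3, 3]
toy_accuracy, toy_precision, toy_recall = evaluate(toy_true, toy_pred)
print(TRAIN_FILES, IMPLIED_ACCURACY, toy_accuracy, toy_precision, toy_recall)
```

On a small test set a single misclassified file moves accuracy by more than five percentage points, so the reported rates are best read as coarse estimates.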


Turkish Speech Recognition Using Mel-Frequency Cepstral Coefficients (MFCC) and Hidden Markov Model (HMM)


Abstract

This paper proposes a new system for recognizing spoken Turkish numbers. The Mel-frequency cepstral coefficients (MFCC) algorithm is used for feature extraction, and Gaussian hidden Markov models are used to model the phonemes of the numbers, with one Markov model per number. The system was trained on a dataset collected from 20 subjects (7 females and 13 males), each of whom says the Turkish numbers from “zero” to “ten”. The audio files are sampled at 8000 Hz, are one second long, and were recorded in an isolated environment. We tested the system on random recordings from different speakers. The training set contains 220 audio recordings and the test set contains 18. The system achieves 83.3% accuracy, 86% precision, and 83% recall.
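The one-model-per-number design implies a simple decision rule: score the feature frames of an utterance under every number's HMM and pick the best-scoring one. The sketch below illustrates that rule with a log-domain forward algorithm over diagonal-covariance Gaussian states. It is not the authors' implementation. The two-state left-to-right topology, the one-dimensional toy "features" standing in for MFCC vectors, the hand-set parameters, and all function names are assumptions, and training (for example with Baum-Welch) is omitted.

```python
import math

NEG_INF = -math.inf


def ln(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_likelihood(frames, model):
    """Forward algorithm in the log domain: log P(frames | model)."""
    pi, trans, means, variances = model
    n = len(pi)
    alpha = [ln(pi[j]) + log_gauss(frames[0], means[j], variances[j])
             for j in range(n)]
    for x in frames[1:]:
        alpha = [logsumexp([alpha[i] + ln(trans[i][j]) for i in range(n)])
                 + log_gauss(x, means[j], variances[j])
                 for j in range(n)]
    return logsumexp(alpha)


def recognize(frames, models):
    """Return the word whose HMM gives the frames the highest likelihood."""
    return max(models, key=lambda word: log_likelihood(frames, models[word]))


def toy_model(means):
    """Hypothetical left-to-right HMM with unit variances (illustration only)."""
    pi = [1.0, 0.0]                    # always start in the first state
    trans = [[0.5, 0.5], [0.0, 1.0]]   # stay or move forward, never back
    variances = [[1.0] for _ in means]
    return pi, trans, means, variances


# Toy models for two Turkish numbers ("bir" = one, "iki" = two).
models = {
    "bir": toy_model([[0.0], [1.0]]),
    "iki": toy_model([[3.0], [4.0]]),
}
print(recognize([[0.1], [0.2], [0.9], [1.1]], models))
print(recognize([[2.9], [3.2], [4.1]], models))
```

In a full system each frame would be an MFCC vector computed from a short window of the 8000 Hz signal, and there would be eleven trained models, one for each number from "zero" to "ten".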


Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Hasan Erdinc Kocer 0000-0002-0799-2140

Mustafa Cumaah Ahmed 0000-0002-6014-6007

Publication Date December 30, 2019
Published in Issue Year 2019 Volume: 2 Issue: 2

Cite

APA Kocer, H. E., & Ahmed, M. C. (2019). Turkish Speech recognition using Mel-frequency cepstral coefficients(MFCC) and Hidden Markov Model (HMM). Veri Bilimi, 2(2), 39-44.


