Research Article

VOICE RECOGNITION WITH MACHINE LEARNING TECHNIQUES: LITERATURE REVIEW

Year 2025, Volume: 9 Issue: 2, 251 - 270, 31.12.2025
https://izlik.org/JA36DG22JW

Abstract

This study reviews the literature on machine learning methods used in voice recognition in order to compare these methods and identify the most effective one. Thirty studies published between January 2023 and March 2025 were analyzed in detail. The review concludes that deep learning-based approaches, especially Transformer architectures and end-to-end learning models, achieve higher accuracy and robustness than traditional methods. However, rather than a single “best” approach for an ideal voice recognition system, a combination of approaches may be more appropriate depending on the application scenario, available resources, and target user group. Despite significant progress in the development of voice recognition systems, several problems and limitations remain, such as the need for large amounts of labeled data and performance degradation in noisy environments. For future research, the study proposes semi-supervised and self-supervised learning approaches for low-resource languages, hybrid model architectures, multi-task and multi-modal learning approaches, neuromorphic computing approaches, and standardized evaluation metrics.



There are 41 references in total.

Details

Primary Language: Turkish
Subjects: Information Systems Organisation and Management
Section: Research Article
Authors

Mutlu Merih Aktuz 0009-0003-9793-1572

Submission Date: 22 May 2025
Acceptance Date: 1 July 2025
Publication Date: 31 December 2025
DOI https://doi.org/10.53600/ajesa.1704161
IZ https://izlik.org/JA36DG22JW
Published in Issue: Year 2025, Volume: 9, Issue: 2

Cite

APA Aktuz, M. M. (2025). makine öğrenme teknikleri ile ses tanıma : literatür taraması. AURUM Journal of Engineering Systems and Architecture, 9(2), 251-270. https://doi.org/10.53600/ajesa.1704161
