Makine Öğrenme Teknikleri ile Ses Tanıma: Literatür Taraması

Mutlu Merih Aktuz

doi:10.53600/ajesa.1704161

Research Article

VOICE RECOGNITION WITH MACHINE LEARNING TECHNIQUES: LITERATURE REVIEW

Year 2025, Volume: 9, Issue: 2, pp. 251-270, 31.12.2025

https://izlik.org/JA36DG22JW

Abstract

This study reviews the literature on machine learning methods used in voice recognition in order to compare them and identify the most effective one. Thirty studies published between January 2023 and March 2025 were analyzed in detail. The review concludes that deep learning-based approaches, especially Transformer architectures and end-to-end learning models, achieve higher accuracy and robustness than traditional methods. However, rather than a single “best” approach for an ideal voice recognition system, a combination of approaches may be more appropriate, depending on the application scenario, the available resources, and the target user group. Despite significant progress in voice recognition systems, several problems and limitations remain, such as the need for large amounts of labeled data and performance degradation in noisy environments. For future research, the study proposes semi-supervised and self-supervised learning for low-resource languages, hybrid model architectures, multi-task and multi-modal learning, neuromorphic computing approaches, and standardized evaluation metrics.

Keywords

Voice recognition, machine learning, deep learning, automatic speech recognition, feature extraction

There are 41 references in total.

Details

Primary Language	Turkish
Subjects	Information Systems Organisation and Management
Section	Research Article
Authors	Mutlu Merih Aktuz (ORCID: 0009-0003-9793-1572)
Submission Date	22 May 2025
Acceptance Date	1 July 2025
Publication Date	31 December 2025
DOI	https://doi.org/10.53600/ajesa.1704161
IZ	https://izlik.org/JA36DG22JW
Published in Issue	Year 2025, Volume: 9, Issue: 2

Cite

APA	Aktuz, M. M. (2025). Makine öğrenme teknikleri ile ses tanıma: Literatür taraması. AURUM Journal of Engineering Systems and Architecture, 9(2), 251-270. https://doi.org/10.53600/ajesa.1704161
