TY - JOUR
T1 - Türkçe Sesli Komut Verilerinin Evrişimsel Sinir Ağı ile Sınıflandırılması
TT - Classification of Turkish Voice Command Data with Convolutional Neural Network
AU - Karakaş, Betül
PY - 2025
DA - April
Y2 - 2025
DO - 10.46387/bjesr.1581590
JF - Mühendislik Bilimleri ve Araştırmaları Dergisi
JO - BJESR
PB - Bandırma Onyedi Eylül Üniversitesi
WT - DergiPark
SN - 2687-4415
SP - 51
EP - 59
VL - 7
IS - 1
LA - tr
AB - Bu çalışma kapsamında derin öğrenme literatüründe yaygın olarak kullanılan yöntemlerden biri olan evrişimsel sinir ağları kullanılarak Türkçe Sesli Komut Veri Kümesi üzerinde sınıflandırma işlemi gerçekleştirilmiştir. Toplamda 849.57 MB dosya boyutu bulunan 14 farklı komut barındıran ve 1 saniyelik 26.485 ses dosyasından oluşan bu veri kümesinin asıl amacı küresel olarak kullanılan İngilizce bir sesli komut veri kümesi bulunurken belirli bir görev için literatürde Türkçe sesli komut veri kümesinin bulunmaması ve yaygınlaştırılmasıdır. Performans parametresi olarak "Doğruluk" kullanılmıştır. Bir evrişimsel sinir ağı mimarisi olan YamNet ağı, toplamda 2 koşum olarak minimum 128 küme büyüklüğü için her bir koşumda 165 iterasyon için eğitilmiştir. Mimarinin test işlemi her bir komut için incelenmiş ve YamNet mimarisi yüksek performans göstermiştir. Eğitim sonunda %98,04 validasyon doğruluğu elde edilmiş ve eğitim süreci boyunca ağın hiç görmediği veriler ile ortalama %97,44 test doğruluğuna ulaşılmıştır.
KW - Derin Öğrenme
KW - Türkçe Sesli Komut
KW - Evrişimsel Sinir Ağları
KW - YamNet
KW - Matlab
N2 - In this study, classification was performed on the Turkish Voice Command Dataset using convolutional neural networks (CNN), one of the most widely used methods in the deep-learning literature.
N2 - This dataset consists of 26,485 one-second audio files covering 14 different commands, with a total size of 849.57 MB. It was created because, while a globally used English voice command dataset exists, no Turkish voice command dataset was available in the literature for this task; a further aim is to popularize its use. "Accuracy" was used as the performance metric. YamNet, a CNN architecture, was trained for a total of 2 epochs, with 165 iterations per epoch and a mini-batch size of 128. The architecture's test results were examined for each command, and YamNet classified with high performance. At the end of training, 98.04% validation accuracy was obtained, and an average test accuracy of 97.44% was achieved on data the network had never seen during its training phase.
CR - O. Aydogmus, M.C. Bingol, G. Boztas, and T. Tuncer, "An automated voice command classification model based on an attention-deep convolutional neural network for industrial automation system," Engineering Applications of Artificial Intelligence, vol. 126, p. 107120, 2023
CR - H. Wagner, "Austrian Dialect Classification Using Machine Learning," 2019
CR - I. Rodomagoulakis, A. Katsamanis, G. Potamianos, P. Giannoulis, A. Tsiami, and P. Maragos, "Room-localized spoken command recognition in multi-room, multi-microphone environments," Computer Speech & Language, vol. 46, pp. 419-443, 2017
CR - H. Bahuleyan, "Music genre classification using machine learning techniques," arXiv preprint arXiv:1804.01149, 2018
CR - E. Acar, "Learning representations for affective video understanding," in Proceedings of the 21st ACM International Conference on Multimedia, pp. 1055-1058, 2013
CR - S. Aggarwal, S. Selvakanmani, B. Pant, K. Kaur, A. Verma, and G.N. Binegde, "Audio segmentation techniques and applications based on deep learning," Scientific Programming, vol. 2022, no. 1, p. 7994191, 2022
CR - Y. Cui and F. Wang, "Research on audio recognition based on the deep neural network in music teaching," Computational Intelligence and Neuroscience, vol. 2022, no. 1, p. 7055624, 2022
CR - A. Greco, N. Petkov, A. Saggese, and M. Vento, "AReN: a deep learning approach for sound event recognition using a brain inspired representation," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3610-3624, 2020
CR - K. PVSMS, "A deep learning based system to predict the noise (disturbance) in audio files," in Intelligent Systems and Computer Technology, IOS Press, 2020
CR - T. Giannakopoulos, E. Spyrou, and S. J. Perantonis, "Recognition of urban sound events using deep context-aware feature extractors and handcrafted features," in Artificial Intelligence Applications and Innovations: AIAI 2019 IFIP WG 12.5 International Workshops: MHDW and 5G-PINE 2019, Hersonissos, Crete, Greece, May 24-26, 2019, Proceedings 15, pp. 184-195, 2019
CR - M. Huzaifah, "Comparison of time-frequency representations for environmental sound classification using convolutional neural networks," arXiv preprint arXiv:1706.07156, 2017
CR - J. Salamon and J.P. Bello, "Deep convolutional neural networks and data augmentation for environmental sound classification," IEEE Signal Processing Letters, vol. 24, no. 3, pp. 279-283, 2017
CR - H. Purwins, B. Li, T. Virtanen, J. Schlüter, S.-Y. Chang, and T. Sainath, "Deep learning for audio signal processing," IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 2, pp. 206-219, 2019
CR - S. Hershey et al., "CNN architectures for large-scale audio classification," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 131-135, 2017
CR - L. Wyse, "Audio spectrogram representations for processing with convolutional neural networks," arXiv preprint arXiv:1706.09559, 2017
CR - P. Chandna, M. Miron, J. Janer, and E. Gómez, "Monoaural audio source separation using deep convolutional neural networks," in Latent Variable Analysis and Signal Separation: 13th International Conference, LVA/ICA 2017, Grenoble, France, February 21-23, 2017, Proceedings 13, pp. 258-266, 2017
CR - T. Grill and J. Schlüter, "Two convolutional neural networks for bird detection in audio signals," in 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1764-1768, 2017
CR - J. Pons Puig, O. Slizovskaia, E. Gómez Gutiérrez, and X. Serra, "Timbre analysis of music audio signals with convolutional neural networks," in 2017 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, p. 2813, 2017
CR - A.A. Hidayat, T.W. Cenggoro, and B. Pardamean, "Convolutional neural networks for scops owl sound classification," Procedia Computer Science, vol. 179, pp. 81-87, 2021
CR - J. Lee and H.-J. Choi, "Deep learning approaches for pathological voice detection using heterogeneous parameters," IEICE Transactions on Information and Systems, vol. 103, no. 8, pp. 1920-1923, 2020
CR - M. A. Mohammed et al., "Voice pathology detection and classification using convolutional neural network model," Applied Sciences, vol. 10, no. 11, p. 3723, 2020
CR - Y. Kutlu and G. Karaca, "Recognition of Turkish command to play chess game using CNN," Akıllı Sistemler ve Uygulamaları Dergisi, vol. 5, no. 1, pp. 71-73, 2022
CR - E. Akın and M. E. Şahin, "Derin Öğrenme ve Yapay Sinir Ağı Modelleri Üzerine Bir İnceleme," EMO Bilimsel Dergi, vol. 14, no. 1, pp. 27-38
CR - O. Çetin, "Yapay sinir ağlarının uyarlanabilir donanımsal yapılarda gerçeklenmesi," Sakarya Üniversitesi (Turkey)
CR - Ö.Ü.A. Yılmaz, "Derin Öğrenme," Kodlab Yayın Dağıtım Yazılım Ltd. Şti., 2021
CR - K. Fırıldak and M.F. Talu, "Evrişimsel sinir ağlarında kullanılan transfer öğrenme yaklaşımlarının incelenmesi," Computer Science, vol. 4, no. 2, pp. 88-95, 2019
CR - E. Alpaydın, "Yapay öğrenme," Boğaziçi Üniversitesi Yayınları, 2011
CR - E. Aminanto and K. Kim, "Deep learning in intrusion detection system: An overview," in 2016 International Research Conference on Engineering and Technology (2016 IRCET), 2016
CR - L. Deng, "A tutorial survey of architectures, algorithms, and applications for deep learning," APSIPA Transactions on Signal and Information Processing, vol. 3, p. e2, 2014
CR - N. Ketkar and E. Santana, "Deep learning with Python," Springer, 2017
CR - MathWorks, "What Is a Convolutional Neural Network?" https://www.mathworks.com/discovery/convolutional-neural-network.html (accessed 10 Oct. 2024)
CR - Ö. İnik and E. Ülker, "Derin öğrenme ve görüntü analizinde kullanılan derin öğrenme modelleri," Gaziosmanpaşa Bilimsel Araştırma Dergisi, vol. 6, no. 3, pp. 85-104, 2017
CR - F. Bayram, "Derin öğrenme tabanlı otomatik plaka tanıma," Politeknik Dergisi, vol. 23, no. 4, pp. 955-960, 2020
CR - J. W. Kim, J. Salamon, P. Li, and J. P. Bello, "Crepe: A convolutional representation for pitch estimation," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 161-165, 2018
CR - A. L. Cramer, H.-H. Wu, J. Salamon, and J. P. Bello, "Look, listen, and learn more: Design choices for deep audio embeddings," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3852-3856, 2019
CR - J. F. Gemmeke et al., "Audio Set: An ontology and human-labeled dataset for audio events," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 776-780, 2017
CR - M. Kurtkaya, "Turkish Speech Command Dataset," online dataset, 2020, https://www.kaggle.com/datasets/muratkurtkaya/turkish-speech-command-dataset (accessed Oct. 2024)
CR - M. M. Tayiz and M. Canayaz, "Evrişimsel Sinir Ağları ile Türkçe Videolarda Geçen Küfür Seslerinin Sansürlenmesi," Muş Alparslan Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, vol. 2, no. 2, pp. 101-110, 2021
UR - https://doi.org/10.46387/bjesr.1581590
L1 - https://dergipark.org.tr/tr/download/article-file/4349783
ER -