TY - JOUR
T1 - Türkçe Sesli Komut Verilerinin Evrişimsel Sinir Ağı ile Sınıflandırılması
TT - Classification of Turkish Voice Command Data with Convolutional Neural Network
AU - Karakaş, Betül
PY - 2025
DA - April
Y2 - 2025
DO - 10.46387/bjesr.1581590
JF - Mühendislik Bilimleri ve Araştırmaları Dergisi
JO - BJESR
PB - Bandırma Onyedi Eylül Üniversitesi
WT - DergiPark
SN - 2687-4415
SP - 51
EP - 59
VL - 7
IS - 1
LA - tr
AB - Bu çalışma kapsamında derin öğrenme literatüründe yaygın olarak kullanılan yöntemlerden biri olan evrişimsel sinir ağları kullanılarak Türkçe Sesli Komut Veri Kümesi üzerinde sınıflandırma işlemi gerçekleştirilmiştir. Toplamda 849.57 MB dosya boyutu bulunan 14 farklı komut barındıran ve 1 saniyelik 26.485 ses dosyasından oluşan bu veri kümesinin asıl amacı küresel olarak kullanılan İngilizce bir sesli komut veri kümesi bulunurken belirli bir görev için literatürde Türkçe sesli komut veri kümesinin bulunmaması ve yaygınlaştırılmasıdır. Performans parametresi olarak "Doğruluk" kullanılmıştır. Bir evrişimsel sinir ağı mimarisi olan YamNet ağı, toplamda 2 koşum olarak minimum 128 küme büyüklüğü için her bir koşumda 165 iterasyon için eğitilmiştir. Mimarinin test işlemi her bir komut için incelenmiş ve YamNet mimarisi yüksek performans göstermiştir. Eğitim sonunda %98,04 validasyon doğruluğu elde edilmiş ve eğitim süreci boyunca ağın hiç görmediği veriler ile ortalama %97,44 test doğruluğuna ulaşılmıştır.
KW - Derin Öğrenme
KW - Türkçe Sesli Komut
KW - Evrişimsel Sinir Ağları
KW - YamNet
KW - Matlab
N2 - In this study, classification was performed on the Turkish Voice Command Dataset using convolutional neural networks (CNN), one of the most widely used methods in the deep-learning literature.
N2 - This dataset consists of 26,485 one-second audio files covering 14 different commands, with a total size of 849.57 MB. It was created because, while a globally used English voice command dataset exists, no Turkish voice command dataset was available in the literature for this task; a further aim is to popularize its use. "Accuracy" was used as the performance metric. YamNet, a CNN architecture, was trained for a total of 2 epochs, with 165 iterations per epoch and a mini-batch size of 128. The architecture's test results were examined for each command, and YamNet classified with high performance. At the end of training, 98.04% validation accuracy was obtained, and an average test accuracy of 97.44% was achieved on data the network had never seen during its training phase.
CR - O. Aydogmus, M.C. Bingol, G. Boztas, and T. Tuncer, "An automated voice command classification model based on an attention-deep convolutional neural network for industrial automation system," Engineering Applications of Artificial Intelligence, vol. 126, p. 107120, 2023
CR - H. Wagner, "Austrian Dialect Classification Using Machine Learning," 2019
CR - I. Rodomagoulakis, A. Katsamanis, G. Potamianos, P. Giannoulis, A. Tsiami, and P. Maragos, "Room-localized spoken command recognition in multi-room, multi-microphone environments," Computer Speech & Language, vol. 46, pp. 419-443, 2017
CR - H. Bahuleyan, "Music genre classification using machine learning techniques," arXiv preprint arXiv:1804.01149, 2018
CR - E. Acar, "Learning representations for affective video understanding," in Proceedings of the 21st ACM International Conference on Multimedia, pp. 1055-1058, 2013
CR - S. Aggarwal, S. Selvakanmani, B. Pant, K. Kaur, A. Verma, and G.N. Binegde, "Audio segmentation techniques and applications based on deep learning," Scientific Programming, vol. 2022, no. 1, p. 7994191, 2022
CR - Y. Cui and F. Wang, "Research on audio recognition based on the deep neural network in music teaching," Computational Intelligence and Neuroscience, vol. 2022, no. 1, p. 7055624, 2022
CR - A. Greco, N. Petkov, A. Saggese, and M. Vento, "AReN: a deep learning approach for sound event recognition using a brain inspired representation," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3610-3624, 2020
CR - K. PVSMS, "A deep learning based system to predict the noise (disturbance) in audio files," in Intelligent Systems and Computer Technology, IOS Press, 2020
CR - T. Giannakopoulos, E. Spyrou, and S. J. Perantonis, "Recognition of urban sound events using deep context-aware feature extractors and handcrafted features," in Artificial Intelligence Applications and Innovations: AIAI 2019 IFIP WG 12.5 International Workshops: MHDW and 5G-PINE 2019, Hersonissos, Crete, Greece, May 24-26, 2019, Proceedings 15, pp. 184-195, 2019
CR - M. Huzaifah, "Comparison of time-frequency representations for environmental sound classification using convolutional neural networks," arXiv preprint arXiv:1706.07156, 2017
CR - J. Salamon and J.P. Bello, "Deep convolutional neural networks and data augmentation for environmental sound classification," IEEE Signal Processing Letters, vol. 24, no. 3, pp. 279-283, 2017
CR - H. Purwins, B. Li, T. Virtanen, J. Schlüter, S.-Y. Chang, and T. Sainath, "Deep learning for audio signal processing," IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 2, pp. 206-219, 2019
CR - S. Hershey et al., "CNN architectures for large-scale audio classification," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 131-135, 2017
CR - L. Wyse, "Audio spectrogram representations for processing with convolutional neural networks," arXiv preprint arXiv:1706.09559, 2017
CR - P. Chandna, M. Miron, J. Janer, and E. Gómez, "Monoaural audio source separation using deep convolutional neural networks," in Latent Variable Analysis and Signal Separation: 13th International Conference, LVA/ICA 2017, Grenoble, France, February 21-23, 2017, Proceedings 13, pp. 258-266, 2017
CR - T. Grill and J. Schlüter, "Two convolutional neural networks for bird detection in audio signals," in 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1764-1768, 2017
CR - J. Pons Puig, O. Slizovskaia, E. Gómez Gutiérrez, and X. Serra, "Timbre analysis of music audio signals with convolutional neural networks," in 2017 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, p. 2813, 2017
CR - A.A. Hidayat, T.W. Cenggoro, and B. Pardamean, "Convolutional neural networks for scops owl sound classification," Procedia Computer Science, vol. 179, pp. 81-87, 2021
CR - J. Lee and H.-J. Choi, "Deep learning approaches for pathological voice detection using heterogeneous parameters," IEICE Transactions on Information and Systems, vol. 103, no. 8, pp. 1920-1923, 2020
CR - M. A. Mohammed et al., "Voice pathology detection and classification using convolutional neural network model," Applied Sciences, vol. 10, no. 11, p. 3723, 2020
CR - Y. Kutlu and G. Karaca, "Recognition of Turkish command to play chess game using CNN," Akıllı Sistemler ve Uygulamaları Dergisi, vol. 5, no. 1, pp. 71-73, 2022
CR - E. Akın and M. E. Şahin, "Derin Öğrenme ve Yapay Sinir Ağı Modelleri Üzerine Bir İnceleme," EMO Bilimsel Dergi, vol. 14, no. 1, pp. 27-38
CR - O. Çetin, "Yapay sinir ağlarının uyarlanabilir donanımsal yapılarda gerçeklenmesi," Sakarya Üniversitesi (Turkey)
CR - Ö.Ü.A. Yılmaz, "Derin Öğrenme," Kodlab Yayın Dağıtım Yazılım Ltd. Şti., 2021
CR - K. Fırıldak and M.F. Talu, "Evrişimsel sinir ağlarında kullanılan transfer öğrenme yaklaşımlarının incelenmesi," Computer Science, vol. 4, no. 2, pp. 88-95, 2019
CR - E. Alpaydın, "Yapay öğrenme," Boğaziçi Üniversitesi Yayınları, 2011
CR - E. Aminanto and K. Kim, "Deep learning in intrusion detection system: An overview," in 2016 International Research Conference on Engineering and Technology (2016 IRCET), 2016
CR - L. Deng, "A tutorial survey of architectures, algorithms, and applications for deep learning," APSIPA Transactions on Signal and Information Processing, vol. 3, p. e2, 2014
CR - N. Ketkar and E. Santana, "Deep learning with Python," Springer, 2017
CR - MathWorks, "What Is a Convolutional Neural Network?" https://www.mathworks.com/discovery/convolutional-neural-network.html (accessed 10 Oct. 2024)
CR - Ö. İnik and E. Ülker, "Derin öğrenme ve görüntü analizinde kullanılan derin öğrenme modelleri," Gaziosmanpaşa Bilimsel Araştırma Dergisi, vol. 6, no. 3, pp. 85-104, 2017
CR - F. Bayram, "Derin öğrenme tabanlı otomatik plaka tanıma," Politeknik Dergisi, vol. 23, no. 4, pp. 955-960, 2020
CR - J. W. Kim, J. Salamon, P. Li, and J. P. Bello, "Crepe: A convolutional representation for pitch estimation," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 161-165, 2018
CR - A. L. Cramer, H.-H. Wu, J. Salamon, and J. P. Bello, "Look, listen, and learn more: Design choices for deep audio embeddings," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3852-3856, 2019
CR - J. F. Gemmeke et al., "Audio Set: An ontology and human-labeled dataset for audio events," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 776-780, 2017
CR - M. Kurtkaya, "Turkish Speech Command Dataset," online dataset, 2020, https://www.kaggle.com/datasets/muratkurtkaya/turkish-speech-command-dataset (accessed Oct. 2024)
CR - M. M. Tayiz and M. Canayaz, "Evrişimsel Sinir Ağları ile Türkçe Videolarda Geçen Küfür Seslerinin Sansürlenmesi," Muş Alparslan Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, vol. 2, no. 2, pp. 101-110, 2021
UR - https://doi.org/10.46387/bjesr.1581590
L1 - https://dergipark.org.tr/tr/download/article-file/4349783
ER -