Research Article

TinyML based audio visual keyword detection

Year 2024, Volume: 13 Issue: 4, 1207-1215, 15.10.2024
https://doi.org/10.28948/ngumuh.1482481

Abstract

Keyword detection (KWD) is one of the areas in which machine learning is applied. Its purpose is the automatic detection of specific words or objects in audio or image data. As portable artificial intelligence applications become more prevalent, the number of applications in this field is also growing. In particular, hybrid systems (using audio and video together) are being studied to increase the effectiveness of KWD applications; such a system combines audio and visual commands detected through two separate channels. Extensive work on audiovisual keyword detection has been carried out in desktop computing environments, with good results. At the same time, work within the scope of TinyML (machine learning on low-power devices) aims to run deep learning applications on low-capacity processors. In these applications, reducing the parameters of the deep learning model (quantization, pruning) makes it possible to deploy the model on ordinary microcontrollers. In this study, a TinyML keyword detection application using both audio and visual data is proposed. To train the proposed hybrid model, the audio and visual models were first trained separately in the Edge Impulse software environment. The resulting MobileNetV2- and CNN-based models were loaded onto ESP32-CAM and Arduino Nano BLE development kits and tested. The models were then combined using a linear weighted fusion method and tested again. The system's performance was evaluated against standard metrics. In the experimental results, by the accuracy criterion, the audio-only KWD achieved 85% and the image-only KWD achieved 85%, while the audiovisual hybrid application reached a classification accuracy of around 90%.
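The linear weighted fusion step described in the abstract can be sketched as follows. This is a minimal Python illustration only: the fusion weight, the number of keyword classes, and the example probability vectors are assumptions for demonstration, not values taken from the paper.

```python
def fuse_predictions(audio_probs, visual_probs, w_audio=0.5):
    """Linear weighted (late) fusion of per-class probabilities
    from an audio model and a visual model."""
    assert len(audio_probs) == len(visual_probs)
    fused = [w_audio * a + (1.0 - w_audio) * v
             for a, v in zip(audio_probs, visual_probs)]
    # The predicted class is the one with the highest fused score.
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label

# Hypothetical example with 3 keyword classes: the two models disagree,
# and the fusion resolves the conflict in favour of the visual model.
audio_probs = [0.6, 0.3, 0.1]   # softmax output of the audio KWD model
visual_probs = [0.2, 0.7, 0.1]  # softmax output of the visual KWD model
fused, label = fuse_predictions(audio_probs, visual_probs, w_audio=0.4)
```

Because both inputs are probability distributions and the weights sum to one, the fused vector is also a valid distribution; on a microcontroller the same weighted sum can be computed over the two models' raw output buffers.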

References

  • J. Tian, The human resources development applications of machine learning in the view of artificial intelligence. IEEE 3rd International Conference, 39-43, 2020. https://doi.org/10.1109/CCET50901.2020.9213113.
  • M. Rusci and T. Tuytelaars, On-device customization of tiny deep learning models for keyword spotting with few examples. IEEE Micro, 43(6), 50-57, 2023. https://doi.org/10.1109/MM.2023.3311826.
  • Y. Abadade, A. Temouden, H. Bamoumen, N. Benamar, Y. Chtouki and A. S. Hafid, A comprehensive survey on TinyML. IEEE Access, 11, 96892-96922, 2023. https://doi.org/10.1109/ACCESS.2023.3294111.
  • P. Warden and D. Situnayake, TinyML: machine learning with TensorFlow Lite on Arduino and ultra-low-power microcontrollers. O’Reilly Media, 2019.
  • M. Altayeb, M. Zennaro and E. Pietrosemoli, TinyML gamma radiation classifier. Nuclear Engineering and Technology, 55(2), 443-451, 2023. https://doi.org/10.1016/j.net.2022.09.032.
  • M. Lord, TinyML anomaly detection. Master's Thesis, California State University, Computer Science, Northridge, USA, 2021.
  • M. Monfort Grau, TinyML from basic to advanced applications. Bachelor Thesis, Universitat Politècnica de Catalunya, Facultat d'Informàtica de Barcelona, Spain, 2021.
  • S. Sadhu and P. K. Ghosh, Low resource point process models for keyword spotting using unsupervised online learning. 25th European Signal Processing Conference, 538-542, 2017. https://doi.org/10.23919/EUSIPCO.2017.8081265.
  • Z. Tang, L. Chen, B. Wu, D. Yu and D. Manocha, Improving reverberant speech training using diffuse acoustic simulation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6969-6973, 2020. https://doi.org/10.48550/arXiv.1907.03988.
  • J. M. Phillips and J. M. Conrad, Robotic system control using embedded machine learning and speech recognition. 19th International Conference on Smart Communities, Improving Quality of Life Using ICT, IoT and AI (HONET), 214-218, 2022. https://doi.org/10.1109/HONET56683.2022.10019106.
  • H. Han and J. Siebert, TinyML: A systematic review and synthesis of existing research. International Conference on Artificial Intelligence in Information and Communication (ICAIIC), 269-274, 2022. https://doi.org/10.1109/ICAIIC54071.2022.9722636.
  • N. S. Huynh, S. De La Cruz and A. Perez-Pons, Denial-of-service (DoS) attack detection using edge machine learning. International Conference on Machine Learning and Applications (ICMLA), 1741-1745, 2023. https://doi.org/10.1109/ICMLA58977.2023.00264.
  • A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto and H. Adam, MobileNets: efficient convolutional neural networks for mobile vision applications. Computer Vision and Pattern Recognition, 2017. https://doi.org/10.48550/arXiv.1704.04861.
  • Kaggle, Hand Gesture Recognition Dataset. https://www.kaggle.com/datasets/aryarishabh/hand-gesture-recognition-dataset, Accessed 14 January 2024.
  • P. Warden, Speech commands: A dataset for limited-vocabulary speech recognition. Computation and Language, 2018. https://doi.org/10.48550/arXiv.1804.03209.
  • Papers with code, Speech Commands, https://paperswithcode.com/dataset/speech-commands, Accessed 2 February 2024.
  • R. Vygon and N. Mikhaylovskiy, Learning efficient representations for keyword spotting with triplet loss. 23rd International Conference SPECOM, 2021. https://doi.org/10.1007/978-3-030-87802-3_69.
  • B. Kim, S. Chang, J. Lee and D. Sung, Broadcasted Residual Learning for Efficient Keyword Spotting. Proceedings of INTERSPEECH, 2021. https://doi.org/10.48550/arXiv.2106.04140.
  • D. Seo, H.-S. Oh and Y. Jung, Wav2KWS: Transfer Learning From Speech Representations for Keyword Spotting. IEEE Access, 9, 80682-80691, 2021. https://doi.org/10.1109/ACCESS.2021.3078715.
  • R. Tang, J. Lee, A. Razi, J. Cambre, I. Bicking, J. Kaye and J. Lin, Howl: A Deployed, Open-Source Wake Word Detection System. Computation and Language, 2020, https://doi.org/10.48550/arXiv.2008.09606.
  • A. Berg, M. O’Connor and M. Tairum Cruz, Keyword Transformer: A Self-Attention Model for Keyword Spotting. Interspeech, 4249-4253, 2021, https://doi.org/10.21437/Interspeech.2021-1286.
  • C. Reddy, E. Beyrami, J. Pool, R. Cutler, S. Srinivasan and J. Gehrke, A scalable noisy speech dataset and online subjective test framework. InterSpeech, 2019. https://doi.org/10.48550/arXiv.1909.08050.
  • A. Mahmood and U. Köse, Speech recognition based on convolutional neural networks and MFCC algorithm. Advances in Artificial Intelligence Research, 1(1), 6–12, 2021.
  • Y. Xu, J. Sun, Y. Han, S. Zhao, C. Mei, T. Guo, S. Zhou, C. Xie, W. Zou and X. Li, Audio-visual wake word spotting system for MISP Challenge 2021. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 9246-9250, 2021. https://doi.org/10.48550/arXiv.2204.08686.


Details

Primary Language: Turkish
Subjects: Deep Learning, Embedded Systems
Section: Research Articles
Authors

Mehmet Tosun 0009-0007-0769-1990

Hamit Erdem 0000-0003-1704-1581

Early View Date: September 11, 2024
Publication Date: October 15, 2024
Submission Date: May 11, 2024
Acceptance Date: July 30, 2024
Published Issue: Year 2024, Volume: 13 Issue: 4

Cite

APA Tosun, M., & Erdem, H. (2024). TinyML tabanlı görsel işitsel anahtar kelime tespiti. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, 13(4), 1207-1215. https://doi.org/10.28948/ngumuh.1482481
AMA Tosun M, Erdem H. TinyML tabanlı görsel işitsel anahtar kelime tespiti. NÖHÜ Müh. Bilim. Derg. Ekim 2024;13(4):1207-1215. doi:10.28948/ngumuh.1482481
Chicago Tosun, Mehmet, ve Hamit Erdem. “TinyML Tabanlı görsel işitsel Anahtar Kelime Tespiti”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 13, sy. 4 (Ekim 2024): 1207-15. https://doi.org/10.28948/ngumuh.1482481.
EndNote Tosun M, Erdem H (01 Ekim 2024) TinyML tabanlı görsel işitsel anahtar kelime tespiti. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 13 4 1207–1215.
IEEE M. Tosun ve H. Erdem, “TinyML tabanlı görsel işitsel anahtar kelime tespiti”, NÖHÜ Müh. Bilim. Derg., c. 13, sy. 4, ss. 1207–1215, 2024, doi: 10.28948/ngumuh.1482481.
ISNAD Tosun, Mehmet - Erdem, Hamit. “TinyML Tabanlı görsel işitsel Anahtar Kelime Tespiti”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 13/4 (Ekim 2024), 1207-1215. https://doi.org/10.28948/ngumuh.1482481.
JAMA Tosun M, Erdem H. TinyML tabanlı görsel işitsel anahtar kelime tespiti. NÖHÜ Müh. Bilim. Derg. 2024;13:1207–1215.
MLA Tosun, Mehmet ve Hamit Erdem. “TinyML Tabanlı görsel işitsel Anahtar Kelime Tespiti”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, c. 13, sy. 4, 2024, ss. 1207-15, doi:10.28948/ngumuh.1482481.
Vancouver Tosun M, Erdem H. TinyML tabanlı görsel işitsel anahtar kelime tespiti. NÖHÜ Müh. Bilim. Derg. 2024;13(4):1207-15.
