Research Article

Use of Deep Learning Models in Acoustic Scene Classification

Volume: 13 Number: 3 September 30, 2025

Abstract

Ambient sound analysis has become more prominent with the rise of portable and wearable devices: analyzing the surrounding sounds provides valuable insight into a person's environment. Recently, deep learning methods widely used in image and text processing have been applied to this field and have proven more effective than traditional machine learning techniques. In this study, we evaluated the performance of different deep learning models using mel-spectrograms of three classes of scene sounds from the TAU Acoustic Scene 2023 dataset. Our results indicate that a simple Convolutional Neural Network (CNN) gives better classification results than other, more complex models. Despite having the fewest parameters, the CNN model achieved the highest accuracy, at 59%. This suggests that simpler models can be highly effective for acoustic scene classification, highlighting the value of efficient and computationally feasible approaches in this domain.
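The front end of the pipeline the abstract summarizes, turning an audio clip into a log-mel-spectrogram that a CNN can consume as an image-like input, can be sketched in pure NumPy. All parameter values below (sample rate, FFT size, hop length, number of mel bands) are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                      # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                      # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(signal, sr=22050, n_fft=1024, hop=512, n_mels=64):
    # Frame the signal, window each frame, take the power spectrum,
    # project onto the mel filterbank, and convert to decibels.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(np.maximum(mel, 1e-10))  # log-mel in dB

# Toy 1-second sine tone standing in for a recorded acoustic scene.
sig = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 22050, endpoint=False))
S = mel_spectrogram(sig)
print(S.shape)  # (time frames, mel bands)
```

The resulting 2-D array of time frames by mel bands is what a CNN classifier treats as a single-channel image; stacking one such array per clip, with a class label per clip, yields the training set for the scene-classification task described above.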

Details

Primary Language

English

Subjects

Signal Processing

Journal Section

Research Article

Early Pub Date

July 2, 2025

Publication Date

September 30, 2025

Submission Date

November 15, 2024

Acceptance Date

June 4, 2025

Published in Issue

Year 2025 Volume: 13 Number: 3

APA
Bozdağ, Z., & Çiğ, H. (2025). Using Of Deep Learning Models In Acoustic Scene Classification. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım Ve Teknoloji, 13(3), 849-858. https://doi.org/10.29109/gujsc.1585401


    e-ISSN:2147-9526