Use of Deep Learning Models in Acoustic Scene Classification
Abstract
Ambient sound analysis has become more prominent with the rise of portable and wearable devices. It provides valuable insights into a person's environment by analyzing surrounding sounds. Recently, deep learning methods, frequently used in image and text processing, have been applied to this field and are proving more effective than traditional machine learning techniques.
In this study, we evaluated the performance of different deep learning models using mel-spectrograms of three classes of scene sounds from the TAU Acoustic Scene 2023 dataset. Our results indicate that a simple Convolutional Neural Network (CNN) outperforms more complex models on this classification task. Despite having the fewest parameters, the CNN model achieved the highest accuracy at 59%. This suggests that simpler models can be highly effective for acoustic scene classification, highlighting the value of efficient and computationally feasible approaches in this domain.
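The mel-spectrogram features mentioned in the abstract can be sketched in a few lines of NumPy. The following is a generic, self-contained illustration of the technique (window size, hop length, and mel-band count are illustrative defaults, not the parameters used in the study):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, compute the power spectrum,
    # project onto the mel filterbank, and take the log.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # shape: (n_frames, n_mels)

# One second of a synthetic 440 Hz tone as a stand-in for a scene recording.
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # (61, 40): 61 time frames, 40 mel bands
```

The resulting time-frequency matrix is what a CNN classifier consumes as a single-channel "image"; in practice, libraries such as librosa provide equivalent, more optimized routines.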
Details
Primary Language
English
Subjects
Signal Processing
Section
Research Article
Early View Date
July 2, 2025
Publication Date
September 30, 2025
Submission Date
November 15, 2024
Acceptance Date
June 4, 2025
Published Issue
Year 2025, Volume: 13, Issue: 3
