Research Article

Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K

Volume: 9 Number: 3 June 30, 2026


Abstract

Environmental sound classification (ESC) is critically important for monitoring noise pollution and ensuring urban safety in smart city applications. Although deep learning–based approaches have achieved high performance in this domain, many studies in the literature either evaluate on randomly partitioned data, which causes data leakage, or depend on massive pretraining corpora (e.g., AudioSet) at a high computational cost. In this study, we propose a methodologically robust and computationally efficient approach for environmental sound classification on the UrbanSound8K dataset. To ensure the reliability of the results, we adopt the official 10-fold cross-validation protocol, which uses the dataset's ten predefined folds so that clips cut from the same source recording never appear in both the training and test sets; it is considered the most challenging evaluation scheme in the literature. In our experiments, the Vision Transformer (ViT) architecture is compared with modern CNN architectures, and the impact of the data augmentation techniques MixUp and SpecAugment on these architectures is analyzed. The results show that under the official 10-fold protocol, ConvNeXt-Tiny achieves the best mean accuracy, reaching 83.94% with MixUp and 82.81% with the combined SpecAugment+MixUp setting, while ViT attains 81.94% under SpecAugment+MixUp. In contrast, random splitting artificially inflates accuracy to 98.06% because of data leakage, underscoring the need for the official, leakage-free protocol.
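
The difference between the official protocol and a random split can be illustrated with a minimal sketch. It is not the author's code: it assumes only that each UrbanSound8K metadata row carries a source-recording ID (`fsID`) and a predefined `fold` (1 to 10), and it uses a hypothetical toy dataset instead of the real CSV. The helper names (`official_folds`, `random_split`, `leaked_recordings`) are illustrative. Leave-one-fold-out never lets a recording span the split, whereas shuffling individual clips does.

```python
import csv
import random
from collections import namedtuple

# Minimal view of one UrbanSound8K metadata row. The real CSV
# (metadata/UrbanSound8K.csv) also has start, end, salience and class columns.
Clip = namedtuple("Clip", ["slice_file_name", "fsID", "fold", "classID"])


def load_metadata(path):
    """Read UrbanSound8K.csv into Clip records (not called in the demo below)."""
    with open(path, newline="") as f:
        return [
            Clip(r["slice_file_name"], int(r["fsID"]), int(r["fold"]), int(r["classID"]))
            for r in csv.DictReader(f)
        ]


def official_folds(clips, n_folds=10):
    """Leave-one-fold-out over the predefined folds: train on 9, test on 1."""
    for k in range(1, n_folds + 1):
        train = [c for c in clips if c.fold != k]
        test = [c for c in clips if c.fold == k]
        yield k, train, test


def random_split(clips, test_fraction=0.1, seed=0):
    """Leakage-prone baseline: shuffle individual clips and ignore the folds."""
    shuffled = list(clips)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def leaked_recordings(train, test):
    """Source recordings (fsID) that contribute clips to both train and test."""
    return {c.fsID for c in train} & {c.fsID for c in test}


def mean_accuracy(per_fold_accuracy):
    """Unweighted mean of the per-fold test accuracies."""
    return sum(per_fold_accuracy) / len(per_fold_accuracy)


# Hypothetical toy metadata: 40 recordings x 5 slices, with every slice of a
# recording placed in the same fold, mirroring how the official folds are built.
clips = [
    Clip(f"{fs}-{i}.wav", fs, fs % 10 + 1, fs % 10)
    for fs in range(40)
    for i in range(5)
]

for k, train, test in official_folds(clips):
    assert not leaked_recordings(train, test)  # no recording spans the split

train, test = random_split(clips)
print(len(leaked_recordings(train, test)))  # expected nonzero: shared recordings
```

Under a random split, near-duplicate slices of one recording sit on both sides, so a model can score well by recognizing the recording rather than the sound class. This is the mechanism behind the inflated random-split figure.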

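The two augmentations named above can be sketched on a spectrogram stored as nested lists. This is a simplified illustration under stated assumptions, not the configuration used in the study. MixUp draws λ ~ Beta(α, α) and forms λ·x₁ + (1 − λ)·x₂ for both inputs and one-hot labels. The SpecAugment variant here applies only frequency and time masking and omits time warping. The values α = 0.2, the mask widths, the zero fill value, and the order "mask first, then mix" for the combined setting are all hypothetical choices.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """MixUp: blend two inputs and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)] for r1, r2 in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def spec_augment(spec, max_freq_width=8, max_time_width=20,
                 n_freq_masks=1, n_time_masks=1, fill=0.0, rng=random):
    """SpecAugment-style frequency and time masking (time warping omitted).

    spec is indexed spec[f][t] (frequency bins x time frames); a new
    spectrogram is returned and the input is left unchanged.
    """
    out = [list(row) for row in spec]
    n_freq, n_time = len(out), len(out[0])
    for _ in range(n_freq_masks):
        f = rng.randint(0, min(max_freq_width, n_freq))  # mask height in bins
        f0 = rng.randint(0, n_freq - f)                  # first masked bin
        for i in range(f0, f0 + f):
            out[i] = [fill] * n_time
    for _ in range(n_time_masks):
        t = rng.randint(0, min(max_time_width, n_time))  # mask width in frames
        t0 = rng.randint(0, n_time - t)                  # first masked frame
        for row in out:
            row[t0:t0 + t] = [fill] * t
    return out


# Hypothetical usage: mask two toy 64-bin x 100-frame spectrograms, then mix them.
rng = random.Random(0)
spec_a = [[1.0] * 100 for _ in range(64)]
spec_b = [[0.0] * 100 for _ in range(64)]
y_a = [1.0] + [0.0] * 9          # one-hot label over 10 classes
y_b = [0.0, 1.0] + [0.0] * 8
x, y, lam = mixup(spec_augment(spec_a, rng=rng), y_a,
                  spec_augment(spec_b, rng=rng), y_b, rng=rng)
```

A zero fill is only "silence" if the features are normalized around zero. For raw log-mel values, many implementations fill masks with the spectrogram mean instead.
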

Details

Primary Language: English
Subjects: Artificial Intelligence (Other)
Journal Section: Research Article
Early Publication Date: June 25, 2026
Publication Date: June 30, 2026
Submission Date: November 26, 2025
Acceptance Date: March 30, 2026
Published in Issue: Year 2026, Volume 9, Number 3

APA
Yurtsever, U. (2026). Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K. Sakarya University Journal of Computer and Information Sciences, 9(3), 934-942. https://doi.org/10.35377/saucis...1830726
AMA
1. Yurtsever U. Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K. SAUCIS. 2026;9(3):934-942. doi:10.35377/saucis...1830726
Chicago
Yurtsever, Ulaş. 2026. “Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K.” Sakarya University Journal of Computer and Information Sciences 9 (3): 934–42. https://doi.org/10.35377/saucis...1830726.
EndNote
Yurtsever U (June 1, 2026) Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K. Sakarya University Journal of Computer and Information Sciences 9 3 934–942.
IEEE
[1] U. Yurtsever, “Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K,” SAUCIS, vol. 9, no. 3, pp. 934–942, June 2026, doi: 10.35377/saucis...1830726.
ISNAD
Yurtsever, Ulaş. “Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K”. Sakarya University Journal of Computer and Information Sciences 9/3 (June 1, 2026): 934-942. https://doi.org/10.35377/saucis...1830726.
JAMA
1. Yurtsever U. Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K. SAUCIS. 2026;9(3):934–942.
MLA
Yurtsever, Ulaş. “Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K.” Sakarya University Journal of Computer and Information Sciences, vol. 9, no. 3, June 2026, pp. 934-42, doi:10.35377/saucis...1830726.
Vancouver
1. Yurtsever U. Modern Deep Learning Architectures for Urban Sound Classification on UrbanSound8K. SAUCIS. 2026 Jun 1;9(3):934-42. doi:10.35377/saucis...1830726

 

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.