Classification of Traffic-Related Sounds Using Auditory Models and Convolutional Neural Networks
Year 2023, Volume: 5, Issue: 2, 233–242, 27.10.2023
Mariem Mine Cheikh Mohamed Fadel, Zeynep Özer
Abstract
This study proposes a new approach for identifying the sources of acoustic events on highways, with the goal of reducing noise pollution in urban areas. The proposed method uses rate-map features on a logarithmic scale, modeled on the characteristics of the human ear, and includes a threshold function that focuses on regions where spectral energy is concentrated. Four different inner hair-cell (IHC) models were compared within the proposed framework, and the Joergensen IHC method yielded a significant improvement in classification performance over the other models. The proposed model achieved an improvement of approximately 10% in F-score compared to previous studies. Overall, this work presents a promising approach to acoustic traffic monitoring using machine learning techniques and auditory models.
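To make the front end described in the abstract concrete, the following Python fragment is a minimal sketch, not the paper's implementation: it assumes a gammatone filterbank on an ERB-rate scale (after Glasberg and Moore), uses half-wave rectification plus a 1 kHz low-pass filter as a deliberately simplified stand-in for the IHC stage (the study compares four IHC models, including Joergensen's, which this sketch does not reproduce), and applies a hypothetical relative-energy threshold. All function names and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, lfilter

def erb(fc_hz):
    # Equivalent rectangular bandwidth (Glasberg & Moore, 1990), in Hz.
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs, dur=0.05):
    # Impulse response of a 4th-order gammatone filter centred at fc_hz.
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * erb(fc_hz)
    return t**3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc_hz * t)

def rate_map(x, fs, n_chan=64, fmin=50.0, fmax=8000.0,
             frame=0.02, hop=0.01, rel_thresh=0.01):
    # ERB-rate-spaced centre frequencies between fmin and fmax.
    e_lo, e_hi = (21.4 * np.log10(4.37e-3 * f + 1.0) for f in (fmin, fmax))
    fcs = (10.0 ** (np.linspace(e_lo, e_hi, n_chan) / 21.4) - 1.0) / 4.37e-3

    # Simplified IHC stage: half-wave rectification + 1 kHz low-pass
    # (placeholder for the IHC models compared in the paper).
    b_lp, a_lp = butter(2, 1000.0 / (fs / 2.0), btype="low")

    flen, fhop = int(frame * fs), int(hop * fs)
    n_frames = 1 + (len(x) - flen) // fhop
    rmap = np.zeros((n_chan, n_frames))
    for c, fc in enumerate(fcs):
        band = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        env = lfilter(b_lp, a_lp, np.maximum(band, 0.0))
        for m in range(n_frames):
            rmap[c, m] = np.mean(env[m * fhop : m * fhop + flen] ** 2)

    # Logarithmic compression, then an assumed relative-energy threshold that
    # floors bins more than |log10(rel_thresh)| decades below the peak, so the
    # representation emphasises regions of concentrated spectral energy.
    rmap = np.log10(rmap + 1e-12)
    rmap[rmap < rmap.max() + np.log10(rel_thresh)] = rmap.min()
    return rmap  # (n_chan x n_frames) image
```

In this reading of the pipeline, the resulting channels-by-frames image plays the role of the log-scaled rate map that is fed to the CNN classifier.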
References
- K.S. Rao, S.G. Koolagudi, and R.R. Vempada, “Emotion recognition from speech using global and local prosodic features,” Int. J. Speech Technol., vol. 16, no. 2, pp. 143–160, 2013.
- M. Valstar et al., “AVEC 2016 - Depression, mood, and emotion recognition workshop and challenge,” in AVEC 2016 - Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, co-located with ACM Multimedia, pp. 3–10, 2016.
- S.R. Bandela and T.K. Kumar, “Speech emotion recognition using semi-NMF feature optimization,” Turkish J. Electr. Eng. Comput. Sci., vol. 27, no. 5, pp. 3741–3757, 2019.
- O. Martin, I. Kotsia, B. Macq, and I. Pitas, “The eNTERFACE’05 Audio-Visual emotion database,” in ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops, 2006.
- Y. Wang and L. Guan, “Recognizing human emotional state from audiovisual signals,” IEEE Trans. Multimed., vol. 10, no. 5, pp. 936–946, 2008.
- J.B. Alonso, J. Cabrera, M. Medina, and C.M. Travieso, “New approach in quantification of emotional intensity from the speech signal: Emotional temperature,” Expert Syst. Appl., vol. 42, no. 24, pp. 9554–9564, 2015.
- C.K. Yogesh et al., “A new hybrid PSO assisted biogeography-based optimization for emotion and stress recognition from speech signal,” Expert Syst. Appl., vol. 69, pp. 149–158, 2017.
- M. Huzaifah, “Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks,” 2017.
- G. Lu, L. Yuan, W. Yang, J. Yan, and H. Li, “Speech emotion recognition based on long short-term memory and convolutional neural networks,” Nanjing Youdian Daxue Xuebao (Ziran Kexue Ban)/Journal Nanjing Univ. Posts Telecommun. (Natural Sci.), vol. 38, no. 5, pp. 63–69, 2018.
- I. Ozer, Z. Ozer, and O. Findik, “Noise robust sound event classification with convolutional neural network,” Neurocomputing, vol. 272, 2018.
- T. Dau, D. Püschel, and A. Kohlrausch, “A quantitative model of the ‘effective’ signal processing in the auditory system. I. Model structure,” J. Acoust. Soc. Am., vol. 99, no. 6, pp. 3615–3622, 1996.
- G.J. Brown and M. Cooke, “Computational auditory scene analysis,” Comput. Speech Lang., vol. 8, no. 4, pp. 297–336, 1994.
- R.V. Sharan and T.J. Moir, “Acoustic event recognition using cochleagram image and convolutional neural networks,” Appl. Acoust., vol. 148, pp. 62–66, 2019.
- R.V. Sharan and T.J. Moir, “Subband Time-Frequency Image Texture Features for Robust Audio Surveillance,” IEEE Trans. Inf. Forensics Secur., vol. 10, no. 12, pp. 2605–2615, 2015.
- R.V. Sharan, S. Berkovsky, and S. Liu, “Voice Command Recognition Using Biologically Inspired Time-Frequency Representation and Convolutional Neural Networks,” in 2020 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 998–1001, 2020.
- L. Josifovski, “Robust Automatic Speech Recognition with Missing and Unreliable Data,” 2002.
- H. Meutzner, N. Ma, R. Nickel, C. Schymura, and D. Kolossa, “Improving audio-visual speech recognition using deep neural networks with dynamic stream reliability estimates,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 5320–5324, 2017.
- H.E. Romero, N. Ma, G.J. Brown, A.V. Beeston, and M. Hasan, “Deep Learning Features for Robust Detection of Acoustic Events in Sleep-disordered Breathing,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 810–814, 2019.
- D. Wang and G.J. Brown, “Fundamentals of computational auditory scene analysis,” in Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, Wiley-IEEE Press, pp. 1–44, 2006.
- R.D. Patterson, K. Robinson, J. Holdsworth, D. McKeown, C. Zhang, and M. Allerhand, “Complex Sounds and Auditory Images,” in Auditory Physiology and Perception, CNBH, pp. 429–446, 1992.
- E. de Boer, “On cochlear encoding: Potentialities and limitations of the reverse-correlation technique,” J. Acoust. Soc. Am., vol. 63, no. 1, p. 115, 1978.
- R.D. Patterson, “SVOS final report, part B: Implementing a gammatone filterbank,” Appl. Psychol. Unit Rep. 2341, 1988.
- B.R. Glasberg and B.C. Moore, “Derivation of auditory filter shapes from notched-noise data,” Hear. Res., vol. 47, no. 1–2, pp. 103–138, 1990.
- S. Das, S. Pal, and M. Mitra, “Supervised model for Cochleagram feature based fundamental heart sound identification,” Biomed. Signal Process. Control, vol. 52, pp. 32–40, 2019.
- M. Russo, M. Stella, M. Sikora, and V. Pekić, “Robust cochlear-model-based speech recognition,” Computers, vol. 8, no. 1, p. 5, 2019.
- A.V. Beeston, “Perceptual compensation for reverberation in human listeners and machines,” Ph.D. dissertation, University of Sheffield, 2015.
- S. Jørgensen and T. Dau, “Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing,” J. Acoust. Soc. Am., vol. 130, no. 3, pp. 1475–1487, 2011.
- D.H. Johnson, “The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones,” J. Acoust. Soc. Am., vol. 68, no. 4, pp. 1115–1122, 1980.
- T.F. Weiss and C. Rose, “A comparison of synchronization filters in different auditory receptor organs,” Hear. Res., vol. 33, no. 2, pp. 175–179, 1988.
- L.R. Bernstein and C. Trahiotis, “The normalized correlation: Accounting for binaural detection across center frequency,” J. Acoust. Soc. Am., vol. 100, no. 6, pp. 3774–3784, 1996.
- J. Breebaart, S. van de Par, and A. Kohlrausch, “Binaural processing model based on contralateral inhibition. I. Model structure,” J. Acoust. Soc. Am., vol. 110, no. 2, pp. 1074–1088, 2001.
- D. Hilbert, “Framework for a General Theory of Linear Integral Equations,” New York, 1953.
- I. Ozer, S.B. Efe, and H. Ozbay, “A combined deep learning application for short term load forecasting,” Alexandria Eng. J., vol. 60, no. 4, pp. 3807–3818, 2021.
- İ. Özer, S.B. Efe, and H. Özbay, “CNN/Bi-LSTM-based deep learning algorithm for classification of power quality disturbances by using spectrogram images,” Int. Trans. Electr. Energy Syst., vol. 31, no. 12, p. e13204, 2021.
- M. Bayram and Ö. İlyas, “Deep learning methods for autism spectrum disorder diagnosis based on fMRI images,” Sakarya University Journal of Computer and Information Sciences, vol. 4, no. 1, 2021.
- K. He and J. Sun, “Convolutional Neural Networks at Constrained Time Cost.” Accessed: Jan. 27, 2021. [Online]. Available: https://www.cv-foundation.org/openaccess/content_cvpr_2015/html/He_Convolutional_Neural_Networks_2015_CVPR_paper.html
- P. Agarwal and S. Kumar, “Imagined word pairs recognition from non-invasive brain signals using Hilbert transform,” Int. J. Syst. Assur. Eng. Manag., vol. 13, no. 1, pp. 385–394, 2022.
- I. Ozer, “Pseudo-colored rate map representation for speech emotion recognition,” Biomed. Signal Process. Control, vol. 66, 2021.
- J. Abeßer, S. Gourishetti, A. Kátai, T. Clauß, P. Sharma, and J. Liebetrau, “IDMT-Traffic: An Open Benchmark Dataset for Acoustic Traffic Monitoring Research,” in 2021 29th European Signal Processing Conference (EUSIPCO), pp. 551–555, 2021.