Review

Sound Scene and Events Detection using Deep Learning in the Scope of Cyber Security for Multimedia Systems

Year 2019, Volume: 3, Issue: 2, 60–82, 30.12.2019

Abstract

In addition to the many natural sound sources found in nature, synthetic sounds are also used in the multimedia systems of our modern world. Environments (i.e., sound scenes) containing these sounds are important for biometric authorization, security requirements, and robust/safe voice/video communication. Apart from audio tasks with special constraints, such as speech/speaker recognition and verification, the separation of polyphonic sounds, noise reduction, the detection of sound scenes/events, and audio tagging are gaining importance for building safer information systems in terms of cyber security. In recent years, deep learning has been preferred in the field of cyber security because its layered architecture enables features and semantic relationships to be extracted easily from raw data. In this study, the use of deep learning architectures for the analysis, classification/prediction, and detection of sound (or speech) as multimedia data within the scope of cyber security is examined. Deep neural networks, convolutional neural networks, recurrent neural networks, restricted Boltzmann machines, and deep belief networks are systematically reviewed as the prominent models in publications between 2015 and 2019. The orientation of the literature on voice/speech processing in cyber security, the prevention of voice spoofing, and the achievement of consistent, high-performance results is thus clearly demonstrated through discussions and comments based on scientific findings from over forty studies.

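To make the surveyed model family concrete, the sketch below builds a small convolutional classifier over log-mel spectrogram patches, the typical setup in the CNN-based sound scene/event studies listed in the references (e.g., Salamon & Bello, 2017; Valenti et al., 2017). It is a minimal sketch only: it uses Keras (the library described in Chollet, 2017), and the input size, network depth, and ten-class output are illustrative assumptions rather than parameters taken from any cited study.

```python
# Minimal sketch of a CNN sound-event classifier over log-mel
# spectrogram patches. Shapes and hyperparameters are illustrative
# assumptions, not values from any of the cited studies.
import numpy as np
from tensorflow.keras import layers, models

N_MELS, N_FRAMES, N_CLASSES = 64, 128, 10  # assumed feature/label sizes

def build_sed_cnn():
    # One log-mel patch is treated as a single-channel image.
    inputs = layers.Input(shape=(N_MELS, N_FRAMES, 1))
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.GlobalAveragePooling2D()(x)   # collapse time/frequency axes
    x = layers.Dropout(0.5)(x)               # regularization
    outputs = layers.Dense(N_CLASSES, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    model = build_sed_cnn()
    # Random stand-in data; a real pipeline would feed log-mel features
    # extracted from labelled recordings.
    X = np.random.randn(8, N_MELS, N_FRAMES, 1).astype("float32")
    y = np.random.randint(0, N_CLASSES, size=(8,))
    model.fit(X, y, epochs=1, verbose=0)
    print(model.predict(X[:1]).shape)  # (1, N_CLASSES)
```

For polyphonic sound event detection, the same backbone would typically keep the time axis and end in per-frame sigmoid outputs rather than a single softmax, as in the weakly labelled detection work cited in the references (Kong et al., 2019).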
References

  • Alisamir, S., Ahadi, S. M., & Seyedin, S. (2018). An end-to-end deep learning model to recognize Farsi speech from raw input. In Proceedings of IEEE 4th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS) (pp. 1–5). Tehran, Iran: IEEE. http://dx.doi.org/10.1109/ICSPIS.2018.8700538
  • Anand, P., Singh, A. K., Srivastava, S., & Lall, B. (2019). Few shot speaker recognition using deep neural networks. Electrical Engineering and Systems Science, Audio and Speech Processing (eess.AS), ArXiv, 1–5. Retrieved from https://arxiv.org/abs/1904.08775
  • Babaee, E., Anuar, N. B., Wahab, A. W. A., Shamshirband, S., & Chronopoulos, A. T. (2017). An overview of audio event detection methods from feature extraction to classification. Applied Artificial Intelligence, 31(9–10), 661–714. http://dx.doi.org/10.1080/08839514.2018.1430469.
  • Bhatt, G., Gupta, A., Arora, A., & Raman, B. (2018). Acoustic features fusion using attentive multi-channel deep architecture. Proceedings of the CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India, 30–34. http://dx.doi.org/10.21437/CHiME.2018-7
  • Boddapati, V., Petef, A., Rasmusson, J., & Lundberg, L. (2017). Classifying environmental sounds using image recognition networks. Procedia Computer Science, 112, 2048–2056. http://dx.doi.org/10.1016/j.procs.2017.08.250.
  • Chen, K., Yan, Z.-J., & Huo, Q. (2015). Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach. In Proceedings of the INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association: Vol. 1–5 (pp. 3600–3604). Dresden, Germany: ISCA Archive. Retrieved from https://www.isca-speech.org/archive/interspeech_2015/i15_3600.html
  • Chollet, F. (2017). Deep learning with Python. New York, NY: Manning Publications.
  • Chung, H., Park, J. G., & Jung, H.-Y. (2019). Rank-weighted reconstruction feature for a robust deep neural network-based acoustic model. ETRI Journal, 41(2), 235–241. http://dx.doi.org/10.4218/etrij.2018-0189
  • Chung, J. S., Nagrani, A., & Zisserman, A. (2018). VoxCeleb2: Deep speaker recognition. Computer Science, Sound (cs.SD), Electrical Engineering and Systems Science, Audio and Speech Processing (eess.AS), ArXiv, 1–6. Retrieved from https://arxiv.org/abs/1806.05622v2.
  • Çakır, E. (2019). Deep neural networks for sound event detection. (Doctoral dissertation, Tampere University, Finland). Retrieved from https://tutcris.tut.fi/portal/files/17626487/cakir_12.pdf
  • Espi, M., Fujimoto, M., Kinoshita, K., & Nakatani, T. (2015). Exploiting spectro-temporal locality in deep learning based acoustic event detection. EURASIP Journal on Audio, Speech, and Music Processing, 2015(26), 1–12. http://dx.doi.org/10.1186/s13636-015-0069-2.
  • Etienne, C., Fidanza, G., Petrovskii, A., Devillers, L., & Schmauch, B. (2018). CNN+LSTM architecture for speech emotion recognition with data augmentation. In Proceedings of the INTERSPEECH 2018 Workshop on Speech, Music and Mind (pp. 21–25). Hyderabad, India: ISCA Archive. http://dx.doi.org/10.21437/SMM.2018-5.
  • Farhadipour, A., Veisi, H., Asgari, M., & Keyvanrad, M. A. (2018). Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks. ETRI Journal (Electronics and Telecommunications Research Institute), 40(5), 643–652. http://dx.doi.org/10.4218/etrij.2017-0260
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Adaptive Computation and Machine Learning Series. Cambridge, MA: MIT Press.
  • Han, K., Wang, Y., Wang, D., Woods, W. S., Merks, I., & Zhang, T. (2015). Learning spectral mapping for speech dereverberation and denoising. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(6), 982–992. http://dx.doi.org/10.1109/TASLP.2015.2416653.
  • Hanilçi, C., Kinnunen, T., Sahidullah, M., & Sizov, A. (2016). Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise. Speech Communication, 85, 83–97. http://dx.doi.org/10.1016/j.specom.2016.10.002
  • Hautamäki, R. G., Kinnunen, T., Hautamäki, V., & Laukkanen, A. M. (2015). Automatic versus human speaker verification: The case of voice mimicry. Speech Communication, 72, 13–31. http://dx.doi.org/10.1016/j.specom.2015.05.002
  • Himawan, I., Villavicencio, F., Sridharan, S., & Fookes, C. (2019). Deep domain adaptation for anti-spoofing in speaker verification systems. Computer Speech & Language, 58, 377–402. http://dx.doi.org/10.1016/j.csl.2019.05.007
  • Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97. http://dx.doi.org/10.1109/MSP.2012.2205597.
  • Huzaifah, M. (2017). Comparison of time-frequency representations for environmental sound classification using convolutional neural networks. Computing Research Repository (CoRR), ArXiv. 1–5. Retrieved from https://arxiv.org/abs/1706.07156v1.
  • Jayalakshmi, S. L., Chandrakala, S., & Nedunchelian, R. (2018). Global statistical features-based approach for acoustic event detection. Applied Acoustics, 139, 113–118. http://dx.doi.org/10.1016/j.apacoust.2018.04.026.
  • Kang, T. G., Shin, J. W., & Kim, N. S. (2018). DNN-based monaural speech enhancement with temporal and spectral variations equalization. Digital Signal Processing, 74, 102–110. http://dx.doi.org/10.1016/j.dsp.2017.12.002
  • Khodabakhsh, A., Mohammadi, A., & Demiroglu, C. (2017). Spoofing voice verification systems with statistical speech synthesis using limited adaptation data. Computer Speech & Language, 42, 20–37. http://dx.doi.org/10.1016/j.csl.2016.08.004
  • Kiranyaz, S., Avci, O., Abdeljaber, O., Ince, T., Gabbouj, M., & Inman, D.J. (2019). 1D convolutional neural networks and applications: A survey. Computing Research Repository (CoRR), ArXiv. 1–20. Retrieved from https://arxiv.org/abs/1905.03554v1.
  • Kong, Q., Xu, Y., Sobieraj, I., Wang, W., & Plumbley, M. D. (2019). Sound event detection and time–frequency segmentation from weakly labelled data. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(4), 777–787. http://dx.doi.org/10.1109/TASLP.2019.2895254.
  • Korkmaz, Y., & Boyacı, A. (2018). Adli bilişim açısından ses incelemeleri [Voice examinations from the perspective of digital forensics]. Fırat Üniversitesi Mühendislik Bilimleri Dergisi, 30, 329–343.
  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. http://dx.doi.org/10.1109/5.726791.
  • Li, R., Liu, Y., Shi, Y., Dong, L., & Cui, W. (2016). ILMSAF based speech enhancement with DNN and noise classification. Speech Communication, 85, 53–70. http://dx.doi.org/10.1016/j.specom.2016.10.008
  • Lopez-Moreno, I., Gonzalez-Dominguez, J., Martinez, D., Plchot, O., Gonzalez-Rodriguez, J., & Moreno, P. J. (2016). On the use of deep feedforward neural networks for automatic language identification. Computer Speech & Language, 40, 46–59. http://dx.doi.org/10.1016/j.csl.2016.03.001
  • Meral, H. M., Sankur, B., Özsoy, A. S., Güngör, T., & Sevinç, E. (2009). Natural language watermarking via morphosyntactic alterations. Computer Speech & Language, 23(1), 107–125. http://dx.doi.org/10.1016/j.csl.2008.04.001
  • Morfi, V., & Stowell, D. (2018). Deep learning for audio event detection and tagging on low-resource datasets. Applied Sciences, 8(8):1397, 1–16. http://dx.doi.org/10.3390/app8081397.
  • Muratoğlu, O., Okul, Ş., Aydın, M. A., & Bilge, H. S. (2018, September). Review on cyber risks relating to security management in smart cars. In Proceedings of the 3rd International Conference on Computer Science and Engineering (UBMK18) (pp. 406–409). Sarajevo, Bosnia and Herzegovina. http://dx.doi.org/10.1109/UBMK.2018.8566569.
  • Özer, İ., Özer, Z., & Fındık, O. (2018). Noise robust sound event classification with convolutional neural network. Neurocomputing, 272, 505–512. http://dx.doi.org/10.1016/j.neucom.2017.07.021
  • Patterson, J., & Gibson, A. (2016). Deep learning. Sebastopol, CA: O’Reilly Media, Inc.
  • Qian, Y., Chen, N., & Yu, K. (2016). Deep features for automatic spoofing detection. Speech Communication, 85, 43–52. http://dx.doi.org/10.1016/j.specom.2016.10.007
  • Qian, Y., Evanini, K., Wang, X., Lee, C. M., & Mulholland, M. (2017). Bidirectional LSTM-RNN for improving automated assessment of non-native children’s speech. In Proceedings of the INTERSPEECH 2017 18th Annual Conference of the International Speech Communication Association (pp. 1417–1421). Stockholm, Sweden: ISCA Archive. http://dx.doi.org/10.21437/Interspeech.2017-250
  • Sağıroğlu, Ş., & Koç, O. (2017). Büyük veri ve açık veri analitiği: Yöntemler ve uygulamalar [Big data and open data analytics: Methods and applications]. Ankara: Grafiker Yayınları.
  • Sainath, T. N., Kingsbury, B., Saon, G., Soltau, H., Mohamed, A.-R., Dahl, G., & Ramabhadran, B. (2015). Deep convolutional neural networks for large-scale speech tasks. Neural Networks, 64, 39–48. http://dx.doi.org/10.1016/j.neunet.2014.08.005
  • Salamon, J., & Bello, J. P. (2017). Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Processing Letters, 24(3), 279–283. http://dx.doi.org/10.1109/LSP.2017.2657381.
  • Samui, S., Chakrabarti, I., & Ghosh, S. K. (2017, August). Deep recurrent neural network based monaural speech separation using recurrent temporal restricted Boltzmann machines. In Proceedings of the INTERSPEECH 2017 18th Annual Conference of the International Speech Communication Association (pp. 3622–3626). Stockholm, Sweden: ISCA Archive. http://dx.doi.org/10.21437/Interspeech.2017-57.
  • Samui, S., Chakrabarti, I., & Ghosh, S. K. (2019). Time–frequency masking based supervised speech enhancement framework using fuzzy deep belief network. Applied Soft Computing, 74, 583–602. http://dx.doi.org/10.1016/j.asoc.2018.10.031.
  • Sharan, R. V., & Moir, T. J. (2017). Robust acoustic event classification using deep neural networks. Information Sciences, 396, 24–32. http://dx.doi.org/10.1016/j.ins.2017.02.013.
  • Todisco, M., Delgado, H., & Evans, N. (2017). Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification. Computer Speech & Language, 45, 516–535. http://dx.doi.org/10.1016/j.csl.2017.01.001.
  • Valenti, M., Squartini, S., Diment, A., Parascandolo, G., & Virtanen, T. (2017, May). A convolutional neural network approach for acoustic scene classification. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN) (pp. 1547–1554). Anchorage, AK: IEEE. http://dx.doi.org/10.1109/IJCNN.2017.7966035.
  • Virtanen, T., Plumbley, M. D., & Ellis, D. (Eds.). (2018). Computational analysis of sound scenes and events. Cham, Switzerland: Springer International Publishing AG. http://dx.doi.org/10.1007/978-3-319-63450-0.
  • Xu, Y., Huang, Q., Wang, W., Foster, P., Sigtia, S., Jackson, P. J. B., & Plumbley, M. D. (2017). Unsupervised feature learning based on deep models for environmental audio tagging. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(6), 1230–1241. http://dx.doi.org/10.1109/TASLP.2017.2690563.
  • Zazo, R., Lozano-Diez, A., Gonzalez-Dominguez, J., Toledano, D. T., & Gonzalez-Rodriguez, J. (2016). Language identification in short utterances using long short-term memory (LSTM) recurrent neural networks. PLoS ONE, 11(1):e0146917, 1–17. http://dx.doi.org/10.1371/journal.pone.0146917.
  • Zeyer, A., Doetsch, P., Voigtlaender, P., Schlüter, R., & Ney, H. (2017). A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2462–2466). New Orleans, LA: IEEE. http://dx.doi.org/10.1109/ICASSP.2017.7952599.
  • Zheng, W., Mo, Z., Xing, X., & Zhao, G. (2018). CNNs-based acoustic scene classification using multi-spectrogram fusion and label expansions. Computing Research Repository (CoRR), ArXiv. 1–7. Retrieved from https://arxiv.org/abs/1809.01543v1.
  • Zhou, H., Bai, X., & Du, J. (2018, November). An investigation of transfer learning mechanism for acoustic scene classification. In Proceedings of the 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) (pp. 404–408). Taipei City, Taiwan. http://dx.doi.org/10.1109/ISCSLP.2018.8706712

Turkish Title

Çoklu Ortam Sistemleri İçin Siber Güvenlik Kapsamında Derin Öğrenme Kullanarak Ses Sahne ve Olaylarının Tespiti


Details

Primary Language Turkish
Subjects Computer Software
Journal Section Review
Authors

Bahadir Karasulu (ORCID: 0000-0001-8524-874X)

Publication Date December 30, 2019
Submission Date July 11, 2019
Published in Issue Year 2019 Volume: 3 Issue: 2
