TY  - JOUR
T1  - Classification of Emotion with Audio Analysis
TT  - Ses Analiziyle Duyguların Sınıflandırılması
AU  - Büyükyıldız, Coşkucan
AU  - Sarıtas, Ismail
AU  - Yaşar, Ali
PY  - 2023
DA  - August
DO  - 10.53433/yyufbed.1219879
JF  - Yüzüncü Yıl Üniversitesi Fen Bilimleri Enstitüsü Dergisi
JO  - YYUFBED
PB  - Van Yüzüncü Yıl Üniversitesi
WT  - DergiPark
SN  - 1300-5413
SP  - 467
EP  - 481
VL  - 28
IS  - 2
LA  - en
AB  - Classification is an important technique for distinguishing data samples. The aim of this study is to classify speech by emotion using extracted audio features. The voice data consist of recordings of two male and two female speakers expressing four emotions: "fun", "angry", "neutral" and "sleepy". As sound features, we used “MFCC” as the cepstral feature; “Centroid, Flatness, Skewness, Crest, Flux, Slope, Decrease, Kurtosis, Spread, Entropy, Roll-off point” as spectral features; and “Pitch, Harmonic ratio” as periodicity features. We then applied all of the classification algorithms available in the Classification Learner toolbox in MATLAB to the data and attempted to classify the emotions with the algorithm that gave the highest accuracy. Each sample in the classification study has twenty-six feature inputs and one labeled output value. According to the results, the support vector machine algorithm achieved the highest accuracy. These results indicate that it is possible to distinguish and classify sounds using emotional data and sound feature parameters.
KW  - Audio Features
KW  - Classification
KW  - Emotion Identifier
KW  - Machine Learning Database
KW  - Support Vector Machine
N2  - Sınıflandırma, veri örneklerini ayırt edebilmek için kullanılan önemli bir tekniktir. Bu çalışmada öz nitelikler çıkartılarak, duygulara göre sesin sınıflandırılması amaçlanmıştır. Neşeli, sinirli, nötr ve uykulu olmak üzere dört farklı duyguda konuşan iki erkek ve iki kadın bireyden alınan ses verileri kullanılmıştır. Sesin özniteliklerinde; Kepstral özellik olarak “Mel-Frekansı Kepstral Katsayıları”, Spektral Özellik olarak “Ağırlık Merkezi, Pürüzsüzlük, Çarpıklık, Tepe, Akış, Eğim, Azalma, Basıklık, Yayılma, Entropi, Yuvarlanma noktası”, Periyodisite Özelliği olarak “Ses perdesi, Harmonik oran” kullandık. Daha sonra, Matlab’da bulunan “sınıflandırma öğrenici” araç kutusunda yer alan tüm sınıflandırma algoritmalarını veriye uyguladık ve en yüksek doğruluğu sağlayan algoritmayla duyguyu tahmin etmeye çalıştık. Sınıflandırma çalışmasında yer alan her bir veri, yirmi altı öz nitelik girdisi ve bir etiketli çıktı değerine sahiptir. Performans sonuçlarına göre, destek vektör makine algoritması en yüksek doğruluk değerini sağlamıştır. Elde edilen performans çıktıları göz önüne alındığında, bu çalışma, duyusal veriler ve ses öznitelikleri kullanılarak sesleri ayırt etmenin ve sınıflandırmanın mümkün olduğunu ortaya koymaktadır.
CR  - Adigwe, A., Tits, N., Haddad, K. E., Ostadabbas, S., & Dutoit, T. (2018). The emotional voices database: Towards controlling the emotion dimension in voice generation systems. arXiv preprint arXiv:1806.09514. doi:10.48550/arXiv.1806.09514
CR  - Antoni, J. (2006). The spectral kurtosis: A useful tool for characterising non-stationary signals. Mechanical Systems and Signal Processing, 20(2), 282-307. doi:10.1016/j.ymssp.2004.09.001
CR  - Aouani, H., & Ayed, Y. B. (2018, March). Emotion recognition in speech using MFCC with SVM, DSVM and auto-encoder. 2018 4th International conference on advanced technologies for signal and image processing (ATSIP), Sousse, Tunisia. doi:10.1109/ATSIP.2018.8364518
CR  - Chatterjee, J., Mukesh, V., Hsu, H.-H., Vyas, G., & Liu, Z. (2018, August). Speech emotion recognition using cross-correlation and acoustic features. 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Athens, Greece. doi:10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00050
CR  - Dubnov, S. (2004). Generalization of spectral flatness measure for non-gaussian linear processes. IEEE Signal Processing Letters, 11(8), 698-701. doi:10.1109/LSP.2004.831663
CR  - Eskidere, Ö., & Ertaş, F. (2009). Mel frekansı kepstrum katsayılarındaki değişimlerin konuşmacı tanımaya etkisi. Uludağ University Journal of The Faculty of Engineering, 14(2), 93-110.
CR  - Giannakopoulos, T., & Pikrakis, A. (2014). Introduction to audio analysis: A MATLAB® approach. Orlando, FL, USA: Academic Press Inc.
CR  - Giannoulis, D., Massberg, M., & Reiss, J. D. (2013). Parameter automation in a dynamic range compressor. Journal of the Audio Engineering Society, 61(10), 716-726.
CR  - Grey, J. M., & Gordon, J. W. (1978). Perceptual effects of spectral modifications on musical timbres. The Journal of the Acoustical Society of America, 63(5), 1493-1500. doi:10.1121/1.381843
CR  - Jain, U., Nathani, K., Ruban, N., Raj, A. N. J., Zhuang, Z., & Mahesh, V. G. V. (2018, October). Cubic SVM classifier based feature extraction and emotion detection from speech signals. 2018 International Conference on Sensor Networks and Signal Processing (SNSP), Xi'an, China. doi:10.1109/SNSP.2018.00081
CR  - Kaynar, O., Görmez, Y., Yıldız, M., & Albayrak, A. (2016, September). Makine öğrenmesi yöntemleri ile duygu analizi. International Artificial Intelligence and Data Processing Symposium (IDAP'16), Malatya, Türkiye.
CR  - Kishore, B., Yasar, A., Taspinar, Y. S., Kursun, R., Cinar, I., Shankar, V. G., … & Ofori, I. (2022). Computer-aided multiclass classification of corn from corn images integrating deep feature extraction. Computational Intelligence and Neuroscience, 2022, 2062944. doi:10.1155/2022/2062944
CR  - Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabarti, S., & Rao, K. S. (2009). IITKGP-SESC: Speech Database for Emotion Analysis. In S. Ranka et al. (Eds). Contemporary Computing: Second International Conference (pp. 485-492). Noida, India: Springer Berlin Heidelberg. doi:10.1007/978-3-642-03547-0_46
CR  - Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques. Informatica (Slovenia), 31(3), 249-268.
CR  - Krüger, F. (2016). Activity, context, and plan recognition with computational causal behaviour models. (PhD), University of Rostock, Institute of Communications Engineering, Rostock, Germany.
CR  - Lech, M., Stolar, M., Best, C., & Bolia, R. (2020). Real-time speech emotion recognition using a pre-trained image classification network: Effects of bandwidth reduction and companding. Frontiers in Computer Science, 2, 14. doi:10.3389/fcomp.2020.00014
CR  - Lerch, A. (2012). An introduction to audio content analysis: Applications in signal processing and music informatics. New Jersey, USA: Wiley-IEEE Press.
CR  - Metlek, S., & Kayaalp, K. (2020). Makine Öğrenmesinde, Teoriden Örnek MATLAB Uygulamalarına Kadar Destek Vektör Makineleri. Ankara, Türkiye: İksad Yayınevi.
CR  - Milton, A., Roy, S. S., & Selvi, S. T. (2013). SVM scheme for speech emotion recognition using MFCC feature. International Journal of Computer Applications, 69(9), 34-39. doi:10.5120/11872-7667
CR  - Misra, H., Ikbal, S., Bourlard, H., & Hermansky, H. (2004, May). Spectral entropy based feature for robust ASR. 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada. doi:10.1109/ICASSP.2004.1325955
CR  - Mitrović, D., Zeppelzauer, M., & Breiteneder, C. (2010). Chapter 3 - Features for content-based audio retrieval. In M. V. Zelkowitz (Ed.), Advances in Computers, Vol. 78 (pp. 71-150). Burlington, USA: Elsevier. doi:10.1016/S0065-2458(10)78003-7
CR  - Mohamad Nezami, O., Jamshid Lou, P., & Karami, M. (2019). ShEMO: a large-scale validated database for Persian speech emotion detection. Language Resources and Evaluation, 53, 1-16. doi:10.1007/s10579-018-9427-x
CR  - Peeters, G. (2004). A large set of audio features for sound description (similarity and classification) in the CUIDADO project. CUIDADO IST Project Report (pp. 1-25). Paris, France: Ircam.
CR  - Peeters, G., Giordano, B. L., Susini, P., Misdariis, N., & McAdams, S. (2011). The timbre toolbox: Extracting audio descriptors from musical signals. The Journal of the Acoustical Society of America, 130(5), 2902-2916. doi:10.1121/1.3642604
CR  - Rebala, G., Ravi, A., & Churiwala, S. (2019). An Introduction to Machine Learning. Cham, Switzerland: Springer.
CR  - Sonawane, A., Inamdar, M. U., & Bhangale, K. B. (2017, August). Sound based human emotion recognition using MFCC & multiple SVM. 2017 International Conference on Information, Communication, Instrumentation and Control (ICICIC), Indore, India. doi:10.1109/ICOMICON.2017.8279046
CR  - Tharwat, A. (2020). Classification assessment methods. Applied Computing and Informatics, 17(1), 168-192. doi:10.1016/j.aci.2018.08.003
CR  - Tuncer, T., Dogan, S., & Acharya, U. R. (2021). Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques. Knowledge-Based Systems, 211, 106547. doi:10.1016/j.knosys.2020.106547
CR  - Vyas, G., & Kumari, B. (2013). Speaker recognition system based on mfcc and dct. International Journal of Engineering and Advanced Technology (IJEAT), 2(5), 167-169.
CR  - Yasar, A., Saritas, I., & Korkmaz, H. (2018). Determination of intestinal mass by region growing method. Preprints, 2018, 2018050449. doi:10.20944/preprints201805.0449.v1
CR  - Yasar, A. (2022). Benchmarking analysis of CNN models for bread wheat varieties. European Food Research and Technology, 249, 749-758. doi:10.1007/s00217-022-04172-y
UR  - https://doi.org/10.53433/yyufbed.1219879
L1  - https://dergipark.org.tr/tr/download/article-file/2835257
ER  -