Research Article

TURK-SER: A Speech Emotion Recognition Dataset and Benchmark for Turkish

Volume: 10, Issue: 1, 30 June 2026

Abstract

Speech emotion recognition (SER) is a growing research area that aims to improve human-computer interaction by accurately recognizing emotions from speech signals. In recent years, advances in deep learning have produced highly successful SER studies. In particular, audio embeddings obtained from self-supervised models have substantially improved emotion recognition performance by capturing rich, meaningful representations of speech. While notable progress has been made with self-supervised learning models for languages such as English and German, high-quality SER datasets are still lacking for other languages, including Turkish. This study introduces TURK-SER, a new Turkish SER dataset comprising 2150 recordings of phonetically varied sentences produced by 90 speakers across five emotion categories. We further investigate how to adapt the Wav2Vec2 model to Turkish SER using two fine-tuning strategies: half fine-tuning, which updates only the transformer encoder, and full fine-tuning, which trains both the convolutional feature encoder and the transformer encoder. Experimental results show that full fine-tuning improves classification performance, reaching an accuracy of 85.44%. These findings highlight the potential of Wav2Vec2 for SER in low-resource languages, offer practical insights into optimizing self-supervised models for emotion recognition, and motivate future work on applying the approach to other low-resource languages.
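The two fine-tuning strategies differ only in which parameters receive gradient updates. The following is a minimal plain-Python sketch of that selection step, assuming parameter names modelled on the Hugging Face `transformers` Wav2Vec2 layout (`wav2vec2.feature_extractor.*` for the convolutional feature encoder, `wav2vec2.encoder.*` for the transformer encoder, plus a classification head). The list `PARAM_NAMES` and the helper `trainable_parameters` are hypothetical. Keeping the feature projection and classification head trainable in both modes is also an assumption, because the abstract does not say how the authors implemented the split.

```python
# Hypothetical parameter names modelled on the Hugging Face Wav2Vec2 layout;
# a real checkpoint contains many more entries per module.
PARAM_NAMES = [
    "wav2vec2.feature_extractor.conv_layers.0.conv.weight",  # convolutional feature encoder
    "wav2vec2.feature_projection.projection.weight",          # projection into the transformer
    "wav2vec2.encoder.layers.0.attention.q_proj.weight",      # transformer encoder
    "projector.weight",                                       # classification head
    "classifier.weight",                                      # classification head
]

# Prefix that identifies the convolutional feature encoder (assumed naming).
CNN_PREFIX = "wav2vec2.feature_extractor."


def trainable_parameters(param_names, mode):
    """Return the set of parameter names that are updated under a fine-tuning mode.

    mode="half": freeze the convolutional feature encoder and train everything else
                 (transformer encoder, feature projection, classification head).
    mode="full": train every parameter, including the convolutional feature encoder.
    """
    if mode == "full":
        return set(param_names)
    if mode == "half":
        return {name for name in param_names if not name.startswith(CNN_PREFIX)}
    raise ValueError(f"unknown fine-tuning mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("half", "full"):
        selected = trainable_parameters(PARAM_NAMES, mode)
        print(mode, len(selected))  # prints "half 4", then "full 5"
```

In a PyTorch training loop, the returned set could be applied by setting `requires_grad` on each entry yielded by `model.named_parameters()`. For a `transformers` `Wav2Vec2ForSequenceClassification` model, calling `freeze_feature_encoder()` has a similar effect to the half setting.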

Keywords

Speech Emotion Recognition, Self-Supervised Learning, TURK-SER Dataset, Speech Embeddings, Wav2Vec2 Fine-Tuning

How to Cite

APA
Eriş, M., Güneş Eriş, F., & Akbal, E. (2026). TURK-SER: A Speech Emotion Recognition Dataset and Benchmark for Turkish. International Journal of Innovative Engineering Applications, 10(1), 90-101. https://doi.org/10.46460/ijiea.1887612