Research Article

A systematic testbed for evaluating emotion classification in large language models

Number: 013, August 31, 2025

Abstract

The advent of large language models (LLMs) in natural language processing (NLP) has opened new ways of tackling complex tasks such as emotion classification. However, effective emotion analysis with LLMs requires more than choosing an off-the-shelf model. It also depends on purpose-built prompt structures, a tokeniser that is aligned with the model, careful formatting of both input and output data, and controlled management of the generation process. This paper presents a technically detailed, reproducible framework for zero-shot and few-shot emotion classification with generative LLMs. Rather than assessing the efficacy of a particular model, it aims to give researchers a comprehensive guide to the essential components needed to build an LLM-based emotion recognition system from first principles. Using the Meta-LLaMA3 8B Instruct model and the DailyDialog dataset, the study demonstrates that task-specific prompt engineering, vocabulary-compatible tokenisation strategies, logit-level output constraints and structured output normalisation can enable accurate and interpretable emotion classification even when labelled data are scarce or absent. The paper thus offers a practical, adaptable resource for building LLM pipelines that are context-sensitive, resilient to class imbalance and suited to flexible, task-oriented applications.
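Two of the components named in the abstract, logit-level output constraints and structured output normalisation, can be illustrated with a minimal, model-free sketch. This is not the authors' implementation. It assumes the seven DailyDialog emotion categories as the label set, a toy vocabulary in which each label is a single token (real tokenisers often split labels into several tokens), and a hypothetical synonym table; the names `constrained_argmax`, `normalise_label` and `SYNONYMS` are illustrative only. In a real pipeline the same masking would be applied to the model's next-token logits, for example inside a custom logits processor.

```python
import math
import re

# Assumption: DailyDialog's seven emotion categories serve as the label set.
LABELS = ["no emotion", "anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Hypothetical synonyms a generative model might emit instead of a canonical label.
SYNONYMS = {
    "neutral": "no emotion", "none": "no emotion", "angry": "anger",
    "disgusted": "disgust", "afraid": "fear", "scared": "fear",
    "happy": "happiness", "joy": "happiness", "sad": "sadness",
    "surprised": "surprise",
}


def constrained_argmax(logits, allowed_ids):
    """Greedy pick restricted to allowed_ids: every other logit is treated as -inf."""
    allowed = set(allowed_ids)
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    return max(range(len(masked)), key=lambda i: masked[i])


# Map every surface form (canonical label or synonym) to its canonical label.
SURFACE_FORMS = {label: label for label in LABELS}
SURFACE_FORMS.update(SYNONYMS)
# Try longer forms first so multi-word labels win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(f) for f in sorted(SURFACE_FORMS, key=len, reverse=True))
    + r")\b"
)


def normalise_label(raw_output):
    """Return the canonical label for the first recognised form in raw_output, else None."""
    match = _PATTERN.search(raw_output.lower())
    return SURFACE_FORMS[match.group(1)] if match else None


if __name__ == "__main__":
    # Toy vocabulary: assumes each label maps to exactly one token id.
    vocab = {"the": 0, "anger": 1, "fear": 2, "happiness": 3, "I": 4}
    id_to_token = {i: t for t, i in vocab.items()}
    logits = [5.0, 1.2, 0.3, 2.7, 4.1]
    label_ids = [vocab[w] for w in ("anger", "fear", "happiness")]
    best = constrained_argmax(logits, label_ids)
    print(id_to_token[best])                                     # happiness
    print(normalise_label("Emotion: Sad."))                      # sadness
    print(normalise_label("I'd say there is no emotion here."))  # no emotion
    print(normalise_label("Hard to tell."))                      # None
```

Masking before the argmax guarantees that the prediction is one of the permitted labels even though the unconstrained argmax ("the") is not a label at all. Normalisation remains useful when generation is left unconstrained or when labels span several tokens, since it maps free-form answers back onto the fixed label set and returns `None` for outputs that cannot be interpreted.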

Project Number

This study was not supported by any specific research project.

Details

Primary Language

English

Subjects

Natural Language Processing

Journal Section

Research Article

Publication Date

August 31, 2025

Submission Date

May 20, 2025

Acceptance Date

June 23, 2025

Published in Issue

Year 2025 Number: 013

APA
Altun, S. N., & Dörterler, M. (2025). A systematic testbed for evaluating emotion classification in large language models. Journal of Scientific Reports-B, 013, 1-19. https://izlik.org/JA98MA47ZB