Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification?

Ahmet Hamdi Özkurt; Emrah Aydemir; Yasin Sönmez

doi:10.17694/bajece.1652268

Abstract

With the expansion of internet access and the widespread adoption of smartphones, social media use has increased significantly. Among these platforms, Ekşi Sözlük is a widely used social network where many unusual events are discussed, and it can therefore serve as a real-time news source for emergency response teams and digital news platforms. In this study, a dataset was compiled from comments shared on Ekşi Sözlük about the Kahramanmaraş earthquake of February 6, 2023. The comments were classified into four categories (Source-Based Information, Emotional Reaction, Social Inference, and Personal Experience) using Gemma2 9B, a 9-billion-parameter language model developed by Google. A dataset of 500 comments in Excel format was analyzed, and the model outputs were compared with human evaluations to assess classification accuracy. For this purpose, four evaluation columns were created for each comment based on the category classification, and the consistency between model-assigned and manually determined categories was examined using these columns. Where inconsistencies were detected, the model-generated explanations were evaluated qualitatively: model outputs with satisfactory explanations were accepted, whereas for insufficient or unsatisfactory explanations the manually classified category was assigned as the final evaluation. This process systematically resolved inconsistencies between model and human assessments and produced a final, validated category assignment for each comment. The classification results were evaluated using accuracy, precision, recall, and F1-score. Accuracy was 0.99 for Social Inference, 0.98 for Source-Based Information, 0.88 for Personal Experience, and 0.83 for Emotional Reaction.
In conclusion, this study presents a methodology for improving model performance through human supervision, contributing to the development of strategies for disaster management and crisis communication. The limitations of this study are that it examines a single social media platform and uses a limited dataset. Future research may compare data collected from multiple social media platforms and improve the model using larger datasets.
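
The adjudication rule and the per-category evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each comment carries a single label, that the per-category scores are computed one-vs-rest (one possible reading of the "four evaluation columns"), and that model predictions are scored against the final adjudicated labels. The function names `resolve_label` and `per_category_metrics` are hypothetical, and the four toy comments are invented for illustration, not taken from the study's dataset.

```python
CATEGORIES = (
    "Source-Based Information",
    "Emotional Reaction",
    "Social Inference",
    "Personal Experience",
)


def resolve_label(model_label, human_label, explanation_ok):
    """Return the final label for one comment.

    If model and human agree, that label stands. On disagreement, the
    model's label is kept only when a reviewer judged its explanation
    satisfactory; otherwise the human label is used.
    """
    if model_label == human_label:
        return human_label
    return model_label if explanation_ok else human_label


def per_category_metrics(predicted, reference, categories=CATEGORIES):
    """Compute one-vs-rest accuracy, precision, recall and F1 per category."""
    results = {}
    for cat in categories:
        pairs = list(zip(predicted, reference))
        tp = sum(p == cat and r == cat for p, r in pairs)
        fp = sum(p == cat and r != cat for p, r in pairs)
        fn = sum(p != cat and r == cat for p, r in pairs)
        tn = sum(p != cat and r != cat for p, r in pairs)
        total = tp + fp + fn + tn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        results[cat] = {
            "accuracy": (tp + tn) / total if total else 0.0,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


# Toy data (invented): model labels, human labels, and whether the model's
# explanation was judged satisfactory by a reviewer.
model = ["Emotional Reaction", "Social Inference",
         "Personal Experience", "Emotional Reaction"]
human = ["Emotional Reaction", "Social Inference",
         "Emotional Reaction", "Personal Experience"]
explanation_ok = [True, True, True, False]

final = [resolve_label(m, h, ok)
         for m, h, ok in zip(model, human, explanation_ok)]
metrics = per_category_metrics(model, final)

for category, scores in metrics.items():
    print(category, scores)
```

Scoring against the adjudicated labels rather than the raw human labels means a disagreement that the reviewer resolves in the model's favor counts as a correct prediction. Scoring against the raw human labels instead would give lower figures for the same data.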

Details

Primary Language

Turkish

Subjects

Computer Software

Journal Section

Research Article

Authors

Ahmet Hamdi Özkurt *
0009-0008-3220-4143
Türkiye

Emrah Aydemir
0000-0002-8380-7891
Türkiye

Yasin Sönmez
0000-0003-0710-0867
Türkiye

Early Pub Date

July 11, 2025

Publication Date

June 30, 2025

Submission Date

March 5, 2025

Acceptance Date

April 14, 2025

Published in Issue

Year 2025 Volume: 13 Number: 2

DOI

https://doi.org/10.17694/bajece.1652268

IZ

https://izlik.org/JA78ZU34SW

Cite

APA

Özkurt, A. H., Aydemir, E., & Sönmez, Y. (2025). Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification? Balkan Journal of Electrical and Computer Engineering, 13(2), 174-182. https://doi.org/10.17694/bajece.1652268

AMA

1. Özkurt AH, Aydemir E, Sönmez Y. Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification? Balkan Journal of Electrical and Computer Engineering. 2025;13(2):174-182. doi:10.17694/bajece.1652268

Chicago

Özkurt, Ahmet Hamdi, Emrah Aydemir, and Yasin Sönmez. 2025. “Large Language Models Vs. Human Interpretation: Which Is More Accurate in Text Classification?”. Balkan Journal of Electrical and Computer Engineering 13 (2): 174-82. https://doi.org/10.17694/bajece.1652268.

EndNote

Özkurt AH, Aydemir E, Sönmez Y (June 1, 2025) Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification? Balkan Journal of Electrical and Computer Engineering 13 2 174–182.

IEEE

[1] A. H. Özkurt, E. Aydemir, and Y. Sönmez, “Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification?”, Balkan Journal of Electrical and Computer Engineering, vol. 13, no. 2, pp. 174–182, June 2025, doi: 10.17694/bajece.1652268.

ISNAD

Özkurt, Ahmet Hamdi - Aydemir, Emrah - Sönmez, Yasin. “Large Language Models Vs. Human Interpretation: Which Is More Accurate in Text Classification?”. Balkan Journal of Electrical and Computer Engineering 13/2 (June 1, 2025): 174-182. https://doi.org/10.17694/bajece.1652268.

JAMA

1. Özkurt AH, Aydemir E, Sönmez Y. Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification? Balkan Journal of Electrical and Computer Engineering. 2025;13:174–182.

MLA

Özkurt, Ahmet Hamdi, et al. “Large Language Models Vs. Human Interpretation: Which Is More Accurate in Text Classification?”. Balkan Journal of Electrical and Computer Engineering, vol. 13, no. 2, June 2025, pp. 174-82, doi:10.17694/bajece.1652268.

Vancouver

1. Ahmet Hamdi Özkurt, Emrah Aydemir, Yasin Sönmez. Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification? Balkan Journal of Electrical and Computer Engineering. 2025 Jun. 1;13(2):174-82. doi:10.17694/bajece.1652268