TY  - JOUR
T1  - Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification?
AU  - Özkurt, Ahmet Hamdi
AU  - Aydemir, Emrah
AU  - Sönmez, Yasin
PY  - 2025
DA  - June
Y2  - 2025
DO  - 10.17694/bajece.1652268
JF  - Balkan Journal of Electrical and Computer Engineering
PB  - MUSA YILMAZ
WT  - DergiPark
SN  - 2147-284X
SP  - 174
EP  - 182
VL  - 13
IS  - 2
LA  - en
AB  - Ekşi Sözlük is a widely used social network where numerous unusual events are discussed. In this context, it serves as a real-time news source for emergency response teams and digital news platforms. In this study, a dataset was compiled from comments shared on the Ekşi Sözlük platform regarding the Kahramanmaraş earthquake of February 6, 2023. These comments were classified into four categories (Source-Based Information, Emotional Reaction, Social Inference, and Personal Experience) using Gemma2 9B, a 9-billion-parameter model developed by Google with advanced natural language processing capabilities. A dataset of 500 comments in Excel format was analyzed, comparing the model outputs with human evaluations to assess classification accuracy. For this purpose, four evaluation columns were created for each comment based on category classification. The consistency between model-assigned categories and manually determined categories was examined using these columns. Where inconsistencies were detected, the model-generated explanations were subjected to qualitative evaluation. Model outputs with satisfactory explanations were accepted, whereas in cases of insufficient or unsatisfactory explanations, the manually classified category was assigned as the final evaluation. This process systematically resolved inconsistencies between model and human assessments, yielding a final, validated category assignment for each comment. Per-category accuracy was highest for Social Inference (0.99), followed by Source-Based Information (0.98), Personal Experience (0.88), and Emotional Reaction (0.83). In conclusion, this study presents a methodology for improving model performance through human supervision, contributing to the development of strategies for disaster management and crisis communication.
KW  - Natural Language Processing
KW  - Text Classification
KW  - Ekşi Sözlük
KW  - Gemma2 9B
N2  - With the expansion of internet access and the widespread adoption of smartphones, social media usage has increased significantly. Among these platforms, Ekşi Sözlük is a widely used social network where numerous unusual events are discussed. In this context, it serves as a real-time news source for emergency response teams and digital news platforms. In this study, a dataset was compiled from comments shared on the Ekşi Sözlük platform regarding the Kahramanmaraş earthquake of February 6, 2023. These comments were classified into four categories (Source-Based Information, Emotional Reaction, Social Inference, and Personal Experience) using the Gemma2 9B model, developed by Google with advanced natural language processing capabilities. A dataset of 500 comments in Excel format was analyzed, comparing the model outputs with human evaluations to assess classification accuracy. For this purpose, four evaluation columns were created for each comment based on category classification. The consistency between model-assigned categories and manually determined categories was examined using these columns. Where inconsistencies were detected, the model-generated explanations were subjected to qualitative evaluation. Model outputs with satisfactory explanations were accepted, whereas in cases of insufficient or unsatisfactory explanations, the manually classified category was assigned as the final evaluation. This process systematically resolved inconsistencies between model and human assessments, yielding a final, validated category assignment for each comment. The classification results were evaluated using accuracy, precision, recall, and F1-score metrics. A detailed comparison was conducted, and the obtained results were recorded. Per-category accuracy was highest for Social Inference (0.99), followed by Source-Based Information (0.98), Personal Experience (0.88), and Emotional Reaction (0.83). In conclusion, this study presents a methodology for improving model performance through human supervision, contributing to the development of strategies for disaster management and crisis communication. The limitations of this study include the examination of a single social media platform and a limited dataset. Future research may focus on comparative analyses of data collected from multiple social media platforms and on enhancing the model using larger datasets.
CR  - [1]	AFAD, “06 Şubat 2023 Pazarcık-Elbistan Kahramanmaraş (Mw 7.7; Mw 7.6) depremleri raporu,” Deprem ve Risk Azaltma Genel Müdürlüğü, 2023.
CR  - [2]	Y. Argüden and B. Erşahin, Veri madenciliği: Veriden bilgiye, masraftan değere, ARGE Danışmanlık Yayınları, 2008.
CR  - [3]	Z. Bakan and F. Kanbay, “Makine öğrenmesi yöntemleri ile eğitim başarısına etki eden faktörlerin modellenmesi,” İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi, vol. 23, no. 45, pp. 27–41, 2024. [Online]. Available: https://doi.org/10.55071/ticaretfbd.1442084
CR  - [4]	G. Burel and H. Alani, “Crisis event extraction service (CREES)—Automatic detection and classification of crisis-related content on social media,” in Proc. 15th Int. Conf. Inf. Syst. Crisis Response and Manage., 2018.
CR  - [5]	C. Coşkun and A. Baykal, “Veri madenciliğinde sınıflandırma algoritmalarının bir örnek üzerinde karşılaştırılması,” in Akademik Bilişim Konferansı (AB'11) Bildirileri, 2011, pp. 51–58.
CR  - [6]	J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2019.
CR  - [7]	M. Imran, C. Castillo, J. Lucas, P. Meier, and S. Vieweg, “AIDR: Artificial intelligence for disaster response,” in Proc. 22nd Int. Conf. World Wide Web, 2015, pp. 159–162.
CR  - [8]	O. H. Kwon et al., “Sentiment analysis of the United States public support of nuclear power on social media using large language models,” Renewable and Sustainable Energy Reviews, vol. 200, 114570, 2024. [Online]. Available: https://doi.org/10.1016/j.rser.2024.114570
CR  - [9]	B. R. Lindsay, “Social media and disasters: Recent United States experiences,” J. Contingencies Crisis Manage., vol. 19, no. 1, pp. 1–7, 2011. [Online]. Available: https://doi.org/10.1111/j.1468-5973.2011.00639.x
CR  - [10]	E. L. McDaniel, S. Scheele, and J. Liu, “Zero-shot classification of crisis tweets using instruction-finetuned large language models,” in 2024 IEEE Int. Humanitarian Technol. Conf. (IHTC), Nov. 2024, pp. 1–7.
CR  - [11]	M. Özkan and G. Kar, “Türkçe dilinde yazılan bilimsel metinlerin derin öğrenme tekniği uygulanarak çoklu sınıflandırılması,” Mühendislik Bilimleri ve Tasarım Dergisi, vol. 10, no. 2, pp. 504–519, 2022. [Online]. Available: https://doi.org/10.21923/jesd.973181
CR  - [12]	L. Palen and S. B. Liu, “Citizen communications in crisis: Anticipating a future of ICT-supported public participation,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2007, pp. 727–736.
CR  - [13]	J. Pereira, R. Lotufo, and R. Nogueira, “Large language models in summarizing social media for emergency management,” arXiv preprint arXiv:2401.03158, 2024.
CR  - [14]	C. Reuter and M. A. Kaufhold, “Fifteen years of social media in emergencies: A retrospective review and future directions for crisis informatics,” J. Contingencies Crisis Manage., vol. 26, no. 1, pp. 41–57, 2018. [Online]. Available: https://doi.org/10.1111/1468-5973.12196
CR  - [15]	O. Sevli and N. Kemaloğlu, “Olağandışı olaylar hakkındaki tweet’lerin gerçek ve gerçek dışı olarak Google BERT modeli ile sınıflandırılması,” Veri Bilimi, vol. 4, no. 1, pp. 31–37, 2021.
CR  - [16]	A. Vaswani et al., “Attention is all you need,” in Adv. Neural Inf. Process. Syst., vol. 30, 2017.
CR  - [17]	S. Vieweg, A. L. Hughes, K. Starbird, and L. Palen, “Microblogging during two natural hazards events: What Twitter may contribute to situational awareness,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2010, pp. 1079–1088.
CR  - [18]	K. Yang et al., “MentaLLaMA: Interpretable mental health analysis on social media with large language models,” in Proc. ACM Web Conf. 2024, May 2024, pp. 4489–4500.
UR  - https://doi.org/10.17694/bajece.1652268
L1  - https://dergipark.org.tr/tr/download/article-file/4665454
ER  -