Research Article

Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification?

Year 2025, Volume: 13 Issue: 2, 174 - 182, 30.06.2025
https://doi.org/10.17694/bajece.1652268

Abstract

With the expansion of internet access and the widespread adoption of smartphones, social media use has increased significantly. Among these platforms, Ekşi Sözlük is a widely used social network where numerous unusual events are discussed. In this context, it serves as a real-time news source for emergency response teams and digital news platforms.
In this study, a dataset was compiled from comments shared on the Ekşi Sözlük platform regarding the Kahramanmaraş earthquake of February 6, 2023. These comments were classified into four categories (Source-Based Information, Emotional Reaction, Social Inference, and Personal Experience) using Gemma2 9B, a 9-billion-parameter language model developed by Google. A dataset of 500 comments in Excel format was analyzed, and the model outputs were compared with human evaluations to assess classification accuracy. For this purpose, four evaluation columns were created for each comment based on the category classification, and these columns were used to examine the consistency between model-assigned and manually determined categories. Where inconsistencies were detected, the model-generated explanations were evaluated qualitatively: model outputs with satisfactory explanations were accepted, whereas for insufficient or unsatisfactory explanations the manually classified category was assigned as the final evaluation. This process systematically resolved inconsistencies between model and human assessments and produced a final, validated category assignment for each comment.
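The resolution rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it assumes each comment carries a single category, and the names `ReviewedComment`, `explanation_ok`, and `final_label` are hypothetical. The sample rows are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ReviewedComment:
    human_label: str      # category assigned manually
    model_label: str      # category assigned by the model
    explanation_ok: bool  # hypothetical flag: reviewer judged the model's explanation satisfactory


def final_label(c: ReviewedComment) -> str:
    """Return the final category for one comment."""
    # Agreement: nothing to resolve.
    if c.model_label == c.human_label:
        return c.human_label
    # Disagreement: accept the model only if its explanation was judged
    # satisfactory; otherwise fall back to the manual category.
    return c.model_label if c.explanation_ok else c.human_label


# Invented sample rows, not taken from the study's dataset.
comments = [
    ReviewedComment("Emotional Reaction", "Emotional Reaction", False),
    ReviewedComment("Emotional Reaction", "Personal Experience", True),
    ReviewedComment("Social Inference", "Source-Based Information", False),
]
finals = [final_label(c) for c in comments]
print(finals)
```

Keeping the reviewer's judgment as an explicit flag makes every overridden label traceable to a recorded decision.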
The classification results were evaluated using accuracy, precision, recall, and F1-score, and the results of a detailed comparison were recorded. Accuracy was highest for Social Inference (0.99), followed by Source-Based Information (0.98), Personal Experience (0.88), and Emotional Reaction (0.83).
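As a rough illustration of how per-category accuracy, precision, recall, and F1-score can be computed, the sketch below treats each category as a one-vs-rest binary problem. This setup is an assumption: the abstract does not state how the metrics were derived or which labels served as the reference. The function name `category_metrics` is hypothetical and the toy labels are invented.

```python
def category_metrics(y_true, y_pred, category):
    """One-vs-rest metrics for a single category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == category and p == category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    tn = sum(t != category and p != category for t, p in pairs)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented toy labels: reference (validated) categories vs. model categories.
y_true = ["Emotional Reaction", "Personal Experience",
          "Emotional Reaction", "Social Inference"]
y_pred = ["Emotional Reaction", "Emotional Reaction",
          "Emotional Reaction", "Social Inference"]

metrics = category_metrics(y_true, y_pred, "Emotional Reaction")
print(metrics)
```

Under this one-vs-rest reading, accuracy counts true negatives, so a rare category can score high on accuracy while precision or recall stays low. That is one reason to report all four metrics together.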
In conclusion, this study presents a methodology for improving model performance through human supervision, contributing to the development of strategies for disaster management and crisis communication. Its limitations are the examination of a single social media platform and a limited dataset. Future research may focus on comparative analyses of data collected from multiple social media platforms and on enhancing the model using larger datasets.

Details

Primary Language Turkish
Subjects Computer Software
Journal Section Research Article
Authors

Ahmet Hamdi Özkurt 0009-0008-3220-4143

Emrah Aydemir 0000-0002-8380-7891

Yasin Sönmez 0000-0003-0710-0867

Early Pub Date July 11, 2025
Publication Date June 30, 2025
Submission Date March 5, 2025
Acceptance Date April 14, 2025
Published in Issue Year 2025 Volume: 13 Issue: 2

Cite

APA Özkurt, A. H., Aydemir, E., & Sönmez, Y. (2025). Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification? Balkan Journal of Electrical and Computer Engineering, 13(2), 174-182. https://doi.org/10.17694/bajece.1652268

All articles published by BAJECE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit and adapt the work provided the original work and source are appropriately cited.