Research Article

Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification?

Year 2025, Volume: 13, Issue: 2, pp. 174-182, 30.06.2025
https://doi.org/10.17694/bajece.1652268

Abstract

With expanding internet access and the widespread adoption of smartphones, social media use has increased significantly. Among these platforms, Ekşi Sözlük is a widely used social network where many unusual events are discussed, which makes it a real-time news source for emergency response teams and digital news platforms.
In this study, a dataset was compiled from comments posted on Ekşi Sözlük about the Kahramanmaraş earthquake of February 6, 2023. The comments were classified into four categories (Source-Based Information, Emotional Reaction, Social Inference, and Personal Experience) using Gemma2 9B, a 9-billion-parameter language model developed by Google. A dataset of 500 comments in Excel format was analyzed, and the model outputs were compared with human evaluations to assess classification accuracy. For this purpose, four evaluation columns were created for each comment based on the category classification, and these columns were used to examine the consistency between model-assigned and manually determined categories. Where inconsistencies were detected, the model-generated explanations were evaluated qualitatively. Model outputs with satisfactory explanations were accepted; where the explanation was insufficient or unsatisfactory, the manually classified category was assigned as the final evaluation. This process systematically resolved inconsistencies between model and human assessments and produced a final, validated category assignment for each comment.
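The reconciliation rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the four evaluation columns are one true/false flag per category, and that the qualitative review of the model's explanation is recorded as a single boolean. The names `resolve_flag`, `resolve_comment`, and `explanation_ok` are hypothetical, and the example comment is made up.

```python
from typing import Dict

# Category names are taken from the abstract; everything else is illustrative.
CATEGORIES = (
    "Source-Based Information",
    "Emotional Reaction",
    "Social Inference",
    "Personal Experience",
)


def resolve_flag(human: bool, model: bool, explanation_ok: bool) -> bool:
    """Final value of one category column for one comment.

    Agreement is kept as-is. On disagreement, the model's value is accepted
    only if a reviewer judged its explanation satisfactory (explanation_ok);
    otherwise the manually assigned value is used.
    """
    if human == model:
        return human
    return model if explanation_ok else human


def resolve_comment(human: Dict[str, bool], model: Dict[str, bool],
                    explanation_ok: bool) -> Dict[str, bool]:
    """Apply resolve_flag to every category column of a single comment."""
    return {c: resolve_flag(human[c], model[c], explanation_ok) for c in CATEGORIES}


# Made-up example: the annotator and the model disagree on two columns.
human = {"Source-Based Information": False, "Emotional Reaction": True,
         "Social Inference": False, "Personal Experience": False}
model = {"Source-Based Information": False, "Emotional Reaction": False,
         "Social Inference": False, "Personal Experience": True}

accepted = resolve_comment(human, model, explanation_ok=True)   # follows the model
rejected = resolve_comment(human, model, explanation_ok=False)  # follows the human
```

Under this rule the human label is the fallback, so the model can only overturn a manual decision when a reviewer accepts its explanation.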
The classification results were evaluated using accuracy, precision, recall, and F1-score, and the results were compared in detail. Accuracy by category, in descending order, was 0.99 for Social Inference, 0.98 for Source-Based Information, 0.88 for Personal Experience, and 0.83 for Emotional Reaction.
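The abstract does not say how the four metrics were computed. One common reading, assumed here, is that each category is scored separately as a binary column, with the model's flags compared against the final validated flags. The sketch below uses the standard definitions of accuracy, precision, recall, and F1; the function name `binary_metrics` is hypothetical and the eight labels are made up, not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one category column (1 = assigned)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n if n else 0.0
    # Report 0.0 instead of dividing by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up flags for one category over eight comments.
final_flags = [1, 1, 0, 0, 1, 0, 0, 0]  # validated labels
model_flags = [1, 0, 0, 0, 1, 1, 0, 0]  # model output
scores = binary_metrics(final_flags, model_flags)
```

Running the same function once per category would give one row of metrics for each of the four categories.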
In conclusion, this study presents a methodology for improving model performance through human supervision, which can contribute to strategies for disaster management and crisis communication. The study is limited to a single social media platform and a small dataset. Future research may compare data collected from multiple social media platforms and improve the model using larger datasets.


Details

Primary Language: Turkish
Subjects: Computer Software
Section: Research Article
Authors

Ahmet Hamdi Özkurt 0009-0008-3220-4143

Emrah Aydemir 0000-0002-8380-7891

Yasin Sönmez 0000-0003-0710-0867

Early View Date: 11 July 2025
Publication Date: 30 June 2025
Submission Date: 5 March 2025
Acceptance Date: 14 April 2025
Published in Issue: Year 2025, Volume: 13, Issue: 2

Cite

APA Özkurt, A. H., Aydemir, E., & Sönmez, Y. (2025). Large Language Models vs. Human Interpretation: Which is More Accurate in Text Classification? Balkan Journal of Electrical and Computer Engineering, 13(2), 174-182. https://doi.org/10.17694/bajece.1652268

All articles published by BAJECE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit and adapt the work provided the original work and source are appropriately cited.