PIDMUS: A Pipeline for Identifying Disaster Tweets on Twitter with Multilingual Support Using TF-IDF and Ensemble Learning Model
Abstract
In emergencies, X (formerly Twitter) has become an essential tool for sharing information quickly. This study proposes a pipeline that combines TF-IDF (Term Frequency-Inverse Document Frequency) features, WordNet, an ensemble (stacking) learning model, and multilingual support to classify disaster-related tweets collected from the X platform. The pipeline is evaluated with eight classifiers commonly used for text classification. An ensemble of MLP (Multilayer Perceptron), MNB (Multinomial Naive Bayes), CNB (Complement Naive Bayes), and SVC (Support Vector Classifier) yields a good overall model, with 81.36% accuracy and an AUC of 0.80. The model can predict whether the tweets in an unlabeled dataset are disaster-related. Because of its multilingual support, it can classify tweets in more than one language. This study applies it to English and to Turkish, a low-resource language. Two publicly available datasets of disaster-related tweets are used, and the model achieves good performance on both the labeled and the unlabeled data. Disaster relief authorities could use it to separate genuine disaster-related tweets automatically and efficiently from the many fake and “clickbait” tweets.
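The TF-IDF feature step named in the abstract can be sketched in plain Python. This is a minimal illustration and not the authors' implementation. It assumes the smoothed IDF variant that scikit-learn's `TfidfVectorizer` uses by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization. It omits the WordNet processing and the stacking ensemble. The function names (`tokenize`, `fit_idf`, `transform`, `cosine`) and the example tweets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters; for str patterns, \w matches
    # Unicode letters, so Turkish letters such as "ş" and "ı" are kept.
    return re.findall(r"\w+", text.lower())


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed IDF (assumed variant): idf(t) = ln((1 + n) / (1 + df(t))) + 1,
    # where n is the number of documents and df(t) counts documents containing t.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def transform(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Weight raw term counts by IDF, ignoring terms not seen while fitting,
    # then L2-normalize so every non-empty vector has unit length.
    counts = Counter(term for term in doc if term in idf)
    weights = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(w * b.get(term, 0.0) for term, w in a.items())


# Hypothetical example tweets, not taken from the datasets used in the study.
tweets = [
    "Earthquake hits the city",
    "The city is calm today",
    "The earthquake destroyed buildings",
]
docs = [tokenize(t) for t in tweets]
idf = fit_idf(docs)
vectors = [transform(d, idf) for d in docs]

print(round(idf["the"], 4), round(idf["earthquake"], 4), round(idf["hits"], 4))
print(round(cosine(vectors[0], vectors[2]), 4), round(cosine(vectors[0], vectors[1]), 4))
```

The word "the" appears in every example tweet, so it receives the lowest IDF weight, and terms that occur in only one tweet receive the highest. In a full pipeline, sparse vectors like these would feed the base classifiers (MLP, MNB, CNB, SVC), and a meta-learner would combine their outputs. That stage is not shown here.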
Keywords
Details
Primary Language
English
Subjects
Information Systems Development Methodologies and Applications
Section
Research Article
Authors
- Kubilay Ayturan* (ORCID 0000-0001-9406-4694), Türkiye
- Firat Hardalac (ORCID 0000-0003-1358-0756), Türkiye
- Haad Akmal (ORCID 0000-0003-3666-3888), Türkiye
- Erdem Yanar (ORCID 0009-0008-0764-098X), Türkiye
- Muhammet Koçak (ORCID 0000-0001-6387-0765), Türkiye
Early View Date
11 June 2026
Publication Date
-
Submission Date
1 May 2026
Acceptance Date
1 June 2026
Issue
Year 2026, Issue: Advanced Online Publication
