Detection of COVID-19 Anti-Vaccination from Twitter Data Using Deep Learning and Feature Selection Approaches
Year 2024, pages 116-133, 12.06.2024
Serdar Ertem, Erdal Özbay
Abstract
The COVID-19 pandemic has evolved into a crisis that significantly affects health, the economy, and social life worldwide. During this crisis, anti-vaccination sentiment poses a considerable obstacle both to controlling the epidemic and to the effectiveness of vaccination campaigns. This study aims to detect COVID-19 anti-vaccination sentiment from Twitter data using a combination of deep learning and feature selection approaches. The proposed method integrates a deep learning model with feature selection techniques and identifies anti-vaccination sentiment by pinpointing the important features in text data. TF-IDF and N-gram methods were combined in a hybrid scheme for feature extraction, followed by Chi-square feature selection. The dataset consists of Twitter text data with two labels. The Synthetic Minority Oversampling Technique (SMOTE) was applied to balance the labels. Long Short-Term Memory (LSTM), a deep learning architecture, was used for classification. With the proposed feature extraction, feature selection, and LSTM methods used together, the experiments reached a highest accuracy of 99.23%. These findings indicate that the proposed methods can detect COVID-19 anti-vaccination sentiment in text data effectively. The results can offer valuable insights for developing health policies and public information strategies, and they provide a new and powerful tool for detecting anti-vaccine sentiment when planning vaccination campaigns and public health interventions.
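The feature extraction and selection steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the four example tweets, their labels, the function names, and the choice of unigrams plus bigrams are all assumptions made for illustration, and the Chi-square score here is computed from a 2x2 presence/absence table per term, which is one common variant. SMOTE balancing and the LSTM classifier are not shown.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf(docs, n_max=2):
    """Compute TF-IDF weights over word n-grams for a list of raw strings."""
    bags = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    df = Counter(term for bag in bags for term in bag)  # document frequency
    n_docs = len(docs)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        # term frequency times log inverse document frequency
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in bag.items()})
    return weights

def chi2_scores(bags, labels):
    """Chi-square statistic of each term's presence against a binary label."""
    n = len(labels)
    n_pos = sum(labels)
    scores = {}
    for term in {t for bag in bags for t in bag}:
        a = sum(1 for bag, y in zip(bags, labels) if term in bag and y == 1)
        b = sum(1 for bag, y in zip(bags, labels) if term in bag and y == 0)
        c = n_pos - a          # term absent, label 1
        d = (n - n_pos) - b    # term absent, label 0
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

# Hypothetical toy data: label 1 = anti-vaccination, label 0 = other.
docs = ["vaccine is safe", "vaccine is poison",
        "get the vaccine", "poison in the shot"]
labels = [0, 1, 0, 1]

weights = tfidf(docs)                 # one {n-gram: weight} dict per tweet
scores = chi2_scores(weights, labels)
top_terms = sorted(scores, key=scores.get, reverse=True)[:3]
```

In a full pipeline, only the highest-scoring n-grams would be kept as input features before balancing the classes and training the classifier; how many to keep is a tuning choice that the abstract does not specify.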
Ethical Statement
Ethics committee approval is not required for this article.
The authors declare no conflict of interest with any person or institution regarding this article.
Supporting Institution
Fırat University (FUBAP)
Thanks
This study was funded by Fırat University (FUBAP) with the scientific research project number MF.23.37.