Research Article

Sentiment analysis of coronavirus data with ensemble and machine learning methods

Year 2024, Volume 8, Issue 2, 175 - 185, 30.04.2024
https://doi.org/10.31127/tuje.1352481

Abstract

The coronavirus pandemic distanced people from social life and increased the use of social media. People's emotions can be inferred from text data collected from social media applications, an approach used in many fields, especially commerce. This study aims to predict people's sentiments about the pandemic by applying sentiment analysis to tweets about it, using single machine learning classifiers (Decision Tree (DT), K-Nearest Neighbor (KNN), Logistic Regression (LR), Naïve Bayes (NB), and Random Forest (RF)) and ensemble learning methods (Majority Voting (MV), Probabilistic Voting (PV), and Stacking (STCK)). The tweets were vectorized with two predictive methods, Word2Vec (W2V) and Doc2Vec, and two traditional word representation methods, Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BOW). Classification models built with the single classifiers were then compared with ensemble models (MV, PV, and STCK) that heterogeneously combine those classifiers. Accuracy (ACC), F-measure (F), precision (P), and recall (R) were used as performance measures, with training/test splits of 70%-30% and 80%-20%. The ACC of the ensemble learning models ranged from 73% to 89%, while that of the single classifier models ranged from 60% to 80%. Among the ensemble methods, STCK with the Doc2Vec text representation gave the best ACC, 89%. According to the experimental results, ensemble models built from heterogeneous machine learning classifiers gave better results than single machine learning classifiers.
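
The difference between the two voting schemes named in the abstract can be illustrated with a minimal Python sketch. It assumes that Majority Voting means picking the label predicted by most base classifiers and that Probabilistic Voting means averaging the class probabilities of the base classifiers; the abstract does not spell out either definition. The function names, the class labels, and the probability values are hypothetical, and stacking is not shown.

```python
from collections import Counter


def majority_vote(label_votes):
    """Hard voting: return the label predicted by the most classifiers.

    Ties go to the tied label that appears first in label_votes.
    """
    counts = Counter(label_votes)
    best = max(counts.values())
    for label in label_votes:
        if counts[label] == best:
            return label


def probabilistic_vote(prob_votes):
    """Soft voting (assumed meaning of PV): average the class
    probabilities across classifiers and return the class with
    the largest mean probability."""
    classes = prob_votes[0].keys()
    mean = {c: sum(p[c] for p in prob_votes) / len(prob_votes) for c in classes}
    return max(mean, key=mean.get)


# Hypothetical outputs of three base classifiers for a single tweet.
probs = [
    {"negative": 0.55, "positive": 0.45},
    {"negative": 0.60, "positive": 0.40},
    {"negative": 0.05, "positive": 0.95},
]
# Each classifier's hard label is its most probable class.
labels = [max(p, key=p.get) for p in probs]

hard = majority_vote(labels)      # two of three classifiers say "negative"
soft = probabilistic_vote(probs)  # mean probability favours "positive"
print(hard, soft)
```

In this made-up case the two schemes disagree: two weakly confident classifiers outvote one highly confident classifier under hard voting, while averaging the probabilities lets the confident classifier dominate.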

Ethical Statement

This study was not conducted on any animals or humans.

Supporting Institution

None

Project Number

None

Thanks

No organization provided financial support within the scope of this study.

References

  • Cauberghe, V., Van Wesenbeeck, I., De Jans, S., Hudders, L., & Ponnet, K. (2021). How adolescents use social media to cope with feelings of loneliness and anxiety during COVID-19 lockdown. Cyberpsychology, Behavior, and Social Networking, 24(4), 250-257. https://doi.org/10.1089/cyber.2020.0478
  • Vernikou, S., Lyras, A., & Kanavos, A. (2022). Multiclass sentiment analysis on COVID-19-related tweets using deep learning models. Neural Computing and Applications, 34(22), 19615-19627. https://doi.org/10.1007/s00521-022-07650-2
  • Antonio, V. D., Efendi, S., & Mawengkang, H. (2022). Sentiment analysis for Covid-19 in Indonesia on Twitter with TF-IDF featured extraction and stochastic gradient descent. International Journal of Nonlinear Analysis and Applications, 13(1), 1367-1373. https://doi.org/10.22075/IJNAA.2021.5735
  • Machuca, C. R., Gallardo, C., & Toasa, R. M. (2021). Twitter sentiment analysis on coronavirus: Machine learning approach. In Journal of Physics: Conference Series, 1828(1), 012104. https://doi.org/10.1088/1742-6596/1828/1/012104
  • Barkur, G., & Kamath, G. B. (2020). Sentiment analysis of nationwide lockdown due to COVID 19 outbreak: Evidence from India. Asian Journal of Psychiatry, 51, 102089. https://doi.org/10.1016/j.ajp.2020.102089
  • Isnain, A. R., Marga, N. S., & Alita, D. (2021). Sentiment analysis of government policy on corona case using Naive Bayes Algorithm. Indonesian Journal of Computing and Cybernetics Systems, 15(1), 55-64. https://doi.org/10.22146/ijccs.60718
  • Siddiqua, U. A., Ahsan, T., & Chy, A. N. (2016). Combining a rule-based classifier with ensemble of feature sets and machine learning techniques for sentiment analysis on microblog. In 2016 19th International Conference on Computer and Information Technology, 304-309. https://doi.org/10.1109/ICCITECHN.2016.7860214
  • Mahendrajaya, R., Buntoro, G. A., & Setyawan, M. B. (2019). Analisis Sentimen Pengguna Gopay Menggunakan Metode Lexicon Based Dan Support Vector Machine. Komputek, 3(2), 52.
  • Rahman, M. M., & Islam, M. N. (2022). Exploring the performance of ensemble machine learning classifiers for sentiment analysis of COVID-19 tweets. In Sentimental Analysis and Deep Learning: Proceedings of ICSADL 2021, 383-396. https://doi.org/10.1007/978-981-16-5157-1_30
  • Bania, R. K. (2020). COVID-19 public tweets sentiment analysis using TF-IDF and inductive learning models. INFOCOMP Journal of Computer Science, 19(2), 23-41.
  • Antonio, V. D. (2021). Performance analysis of TF-IDF feature extraction for stochastic gradient descent classification algorithm on sentiment analysis of Indonesian texts. [Doctoral Dissertation, Universitas Sumatera Utara].
  • Amalia, C., & Sibaroni, Y. (2020). Analisis sentimen data tweet menggunakan model jaringan saraf tiruan dengan pembobotan delta tf-idf. eProceedings of Engineering, 7(2), 7810-7820.
  • Ly, D., & Saad Abdul Malik, T. (2021). How can a module for sentiment analysis be designed to classify tweets about covid19. [Student thesis, University of Borås].
  • Bhardwaj, M., Mishra, P., Badhani, S., & Muttoo, S. K. (2023). Sentiment analysis and topic modeling of COVID-19 tweets of India. International Journal of System Assurance Engineering and Management, 1-21. https://doi.org/10.1007/s13198-023-02082-0
  • AlZoubi, O., Shatnawi, F., Rawashdeh, S., Yassein, M. B., & Hmeidi, I. (2022). Detecting COVID-19 Implication on Education and Economic in Arab World Using Sentiment Analysis Techniques of Twitter Data. In 2022 13th International Conference on Information and Communication Systems, 352-357. https://doi.org/10.1109/ICICS55353.2022.9811166
  • Miglani, A. (2020). Coronavirus tweets nlp-text classification. https://www.kaggle.com/datatattle/covid-19-nlp-textclassification
  • Huanling, T., Hui, Z., Hongmin, W., Han, Z., Xueli, M., Mingyu, L., & Jin, G. (2023). Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model. Chinese Journal of Electronics, 32(3), 647-654. https://doi.org/10.23919/cje.2021.00.113
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems, 26.
  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  • Hidayat, T. H. J., Ruldeviyani, Y., Aditama, A. R., Madya, G. R., Nugraha, A. W., & Adisaputra, M. W. (2022). Sentiment analysis of twitter data related to Rinca Island development using Doc2Vec and SVM and logistic regression as classifier. Procedia Computer Science, 197, 660-667. https://doi.org/10.1016/j.procs.2021.12.187
  • Dündar, A., & Kakışım, A. (2021). Kıyafet Öneri Sistemi için Giyim Metaverilerine dayalı Temsil Öğrenimi. Avrupa Bilim ve Teknoloji Dergisi, (29), 105-110. https://doi.org/10.31590/ejosat.1008736
  • Başarslan, M. S., & Kayaalp, F. (2019). Performance analysis of fuzzy rough set-based and correlation-based attribute selection methods on detection of chronic kidney disease with various classifiers. In 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science, 1-5. https://doi.org/10.1109/EBBT.2019.8741688
  • Turgut, Z., & Kakisim, A. G. (2024). An explainable hybrid deep learning architecture for WiFi-based indoor localization in Internet of Things environment. Future Generation Computer Systems, 151, 196-213. https://doi.org/10.1016/j.future.2023.10.003
  • Basarslan, M. S., Bakir, H., & Yücedağ, İ. (2019). Fuzzy logic and correlation-based hybrid classification on hepatitis disease data set. The International Conference on Artificial Intelligence and Applied Mathematics in Engineering, 787-800. https://doi.org/10.1007/978-3-030-36178-5_68
  • Rahardi, M., Aminuddin, A., Abdulloh, F. F., & Nugroho, R. A. (2022). Sentiment analysis of Covid-19 vaccination using support vector machine in Indonesia. International Journal of Advanced Computer Science and Applications, 13(6), 534-539.
  • Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27. https://doi.org/10.1109/TIT.1967.1053964
  • Kakisim, A. G. (2022). Enhancing attributed network embedding via enriched attribute representations. Applied Intelligence, 52(2), 1566-1580. https://doi.org/10.1007/s10489-021-02498-w
  • Mohammed, A., & Kora, R. (2023). A comprehensive review on ensemble deep learning: Opportunities and challenges. Journal of King Saud University-Computer and Information Sciences, 35(2), 757-774. https://doi.org/10.1016/j.jksuci.2023.01.014
  • Onan, A. (2020). Mining opinions from instructor evaluation reviews: a deep learning approach. Computer Applications in Engineering Education, 28(1), 117-138. https://doi.org/10.1002/cae.22179
  • Kakisim, A. G., Turgut, Z., & Atmaca, T. (2023). XAI empowered dual band Wi-Fi based indoor localization via ensemble learning. In 2023 14th International Conference on Network of the Future (NoF), 150-158. https://doi.org/10.1109/NoF58724.2023.10302788
  • Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3), 21-45. https://doi.org/10.1109/MCAS.2006.1688199
  • Öztürk, T., Turgut, Z., Akgün, G., & Köse, C. (2022). Machine learning-based intrusion detection for SCADA systems in healthcare. Network Modeling Analysis in Health Informatics and Bioinformatics, 11, 47. https://doi.org/10.1007/s13721-022-00390-2
  • Kayaalp, F., Basarslan, M. S., & Polat, K. (2018). A hybrid classification example in describing chronic kidney disease. In 2018 Electric Electronics, Computer Science, Biomedical Engineerings' Meeting (EBBT), 1-4. https://doi.org/10.1109/EBBT.2018.8391444
There are 33 citations in total.

Details

Primary Language: English
Subjects: Communications Engineering (Other)
Journal Section: Articles
Authors:

Muhammet Sinan Başarslan (ORCID: 0000-0002-7996-9169)

Fatih Kayaalp (ORCID: 0000-0002-8752-3335)

Project Number: None
Early Pub Date: April 7, 2024
Publication Date: April 30, 2024
Published in Issue: Year 2024

Cite

APA Başarslan, M. S., & Kayaalp, F. (2024). Sentiment analysis of coronavirus data with ensemble and machine learning methods. Turkish Journal of Engineering, 8(2), 175-185. https://doi.org/10.31127/tuje.1352481
AMA Başarslan MS, Kayaalp F. Sentiment analysis of coronavirus data with ensemble and machine learning methods. TUJE. April 2024;8(2):175-185. doi:10.31127/tuje.1352481
Chicago Başarslan, Muhammet Sinan, and Fatih Kayaalp. “Sentiment Analysis of Coronavirus Data With Ensemble and Machine Learning Methods”. Turkish Journal of Engineering 8, no. 2 (April 2024): 175-85. https://doi.org/10.31127/tuje.1352481.
EndNote Başarslan MS, Kayaalp F (April 1, 2024) Sentiment analysis of coronavirus data with ensemble and machine learning methods. Turkish Journal of Engineering 8 2 175–185.
IEEE M. S. Başarslan and F. Kayaalp, “Sentiment analysis of coronavirus data with ensemble and machine learning methods”, TUJE, vol. 8, no. 2, pp. 175–185, 2024, doi: 10.31127/tuje.1352481.
ISNAD Başarslan, Muhammet Sinan - Kayaalp, Fatih. “Sentiment Analysis of Coronavirus Data With Ensemble and Machine Learning Methods”. Turkish Journal of Engineering 8/2 (April 2024), 175-185. https://doi.org/10.31127/tuje.1352481.
JAMA Başarslan MS, Kayaalp F. Sentiment analysis of coronavirus data with ensemble and machine learning methods. TUJE. 2024;8:175–185.
MLA Başarslan, Muhammet Sinan and Fatih Kayaalp. “Sentiment Analysis of Coronavirus Data With Ensemble and Machine Learning Methods”. Turkish Journal of Engineering, vol. 8, no. 2, 2024, pp. 175-85, doi:10.31127/tuje.1352481.
Vancouver Başarslan MS, Kayaalp F. Sentiment analysis of coronavirus data with ensemble and machine learning methods. TUJE. 2024;8(2):175-85.