Revealing the Reflections of the Pandemic by Investigating COVID-19 Related News Articles Using Machine Learning and Network Analysis

Ulya Bayram

doi:10.17671/gazibtd.949599

Research Article

Year 2022, Volume: 15 Issue: 2, 209 - 220, 30.04.2022

Cited By: 2

Abstract

Social media data can give a general idea of how people responded to the COVID-19 outbreak and its repercussions, but as a source of information it cannot be as objective as news articles. Articles published during the pandemic report how the general public was affected by the crisis, and they also cover its political and other effects. They are valuable sources of data for natural language processing research because they can reveal various paradigms about different phenomena related to the pandemic. This study uses a collection of COVID-19 related news articles published over nine months in 2019 and 2020 by news organizations around the world. A three-stage investigation of the collection aims to reveal the repercussions of the pandemic at multiple levels. The first investigation uses word statistics to identify the problems mentioned most often during the pandemic. The second applies machine learning, in the form of topic modeling, to determine the most prevalent topics in the articles and give a better picture of the pandemic-induced issues. The results show that the economy was among the most prevalent problems. The third investigation constructs lexical networks from the articles and shows, through nodes and weighted connections, how many of these problems are related. The findings point to the need for more research applying machine learning and natural language processing techniques to similar data collections to unveil the full repercussions of the pandemic.

Keywords

LDA, BERT, machine learning, news articles, natural language processing, network analysis
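
As a rough illustration of the first and third investigations described in the abstract, the sketch below counts word occurrences across articles and then links co-occurring words with weighted connections. It is a minimal sketch under stated assumptions, not the study's implementation: the toy articles, the function names, the whitespace tokenizer, and the choice of normalized pointwise mutual information (NPMI) as the edge weight are all assumptions made for this example. The topic modeling stage is left out.

```python
import math
from collections import Counter
from itertools import combinations

# Toy stand-ins for news articles (hypothetical data, not from the study).
ARTICLES = [
    "lockdown hits economy and jobs",
    "economy shrinks as lockdown continues",
    "hospitals face vaccine shortage",
    "vaccine trials give hospitals hope",
]


def tokenize(text):
    """Lowercase and split on whitespace (a real pipeline would also drop stop words)."""
    return text.lower().split()


def word_statistics(articles):
    """Count the number of articles in which each word appears."""
    counts = Counter()
    for article in articles:
        counts.update(set(tokenize(article)))
    return counts


def lexical_network(articles):
    """Build weighted edges between words that co-occur in the same article.

    The edge weight is NPMI computed over articles (an assumed choice);
    only positively associated pairs are kept.
    """
    n_docs = len(articles)
    doc_freq = word_statistics(articles)
    pair_freq = Counter()
    for article in articles:
        words = sorted(set(tokenize(article)))
        pair_freq.update(combinations(words, 2))

    edges = {}
    for (a, b), joint in pair_freq.items():
        p_ab = joint / n_docs
        p_a = doc_freq[a] / n_docs
        p_b = doc_freq[b] / n_docs
        if p_ab == 1.0:
            # Both words occur in every article; the NPMI denominator is zero.
            npmi = 1.0
        else:
            npmi = math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)
        if npmi > 0:
            edges[(a, b)] = npmi
    return edges


if __name__ == "__main__":
    print(word_statistics(ARTICLES).most_common(4))
    for pair, weight in sorted(lexical_network(ARTICLES).items()):
        print(pair, round(weight, 2))
```

In this toy corpus, words about the economy and words about health care never share an article, so no edge connects the two groups. On a real collection, the edges that do cross such groups are the ones that would indicate how different problems are related.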

Turkish title: Pandeminin Yansımalarını Ortaya Çıkarmak için COVID-19 ile İlgili Gazete Makalelerinin Makine Öğrenimi ve Ağ Analizi Yöntemleri ile İncelenmesi

The article lists 39 references in total.

Details

Primary Language	English
Subjects	Computer Software
Section	Articles
Authors	Ulya Bayram (ORCID: 0000-0002-8150-4053)
Publication Date	April 30, 2022
Submission Date	June 11, 2021
Published in Issue	Year 2022 Volume: 15 Issue: 2

Cite

APA	Bayram, U. (2022). Revealing the Reflections of the Pandemic by Investigating COVID-19 Related News Articles Using Machine Learning and Network Analysis. Bilişim Teknolojileri Dergisi, 15(2), 209-220. https://doi.org/10.17671/gazibtd.949599

Cited By

Use of Internet of Things in Detection of COVID-19: A Review with Bibliometric Analysis

Bilişim Teknolojileri Dergisi

https://doi.org/10.17671/gazibtd.1111392

Classification of Customer Complaints Using BERTopic Topic Modelling Technique

İzmir Sosyal Bilimler Dergisi

https://doi.org/10.47899/ijss.1167719
