Research Article

Detecting misinformation on social networks with NLP

Year 2024, Volume: 1, Issue: 1, pp. 11-16, 30.05.2024

Abstract

In this data-driven study, we address the detection of misinformation on social media. Our aim is to apply natural language processing (NLP) techniques to detect fake news in Turkish. To this end, we took a publicly accessible English dataset of fake news articles consisting of 20,800 samples and translated it into Turkish. We used sentence-transformer models to vectorize the textual content and then applied the Logistic Regression algorithm to different inputs for fake news detection. Our observations indicate that the title of a news article is more informative than its content for detecting fake news. However, better detection performance can be achieved by using the title and the content together. Interestingly, our findings reveal that removing stopwords does not improve accuracy. We also discuss how more advanced transformer-based approaches could offer superior performance, particularly in scenarios characterized by data drift, but we leave this for future work.
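The pipeline summarized above (vectorize the title, the content, or both, optionally remove stopwords, then fit a Logistic Regression classifier) can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: `embed` is a hypothetical hashed bag-of-words stand-in for a real sentence-transformer encoder, the stopword list and toy articles are invented for illustration, combining title and content by concatenating their vectors is an assumption (the abstract does not say how the two are combined), and logistic regression is fit with plain batch gradient descent instead of a library solver.

```python
import math
import zlib

# Toy Turkish stopword list (assumption; the paper's actual list is not given here).
STOPWORDS = {"ve", "bir", "bu", "da", "de", "ile", "için"}
DIM = 64  # size of the stand-in embedding (real sentence-transformers use e.g. 384 or 768)


def remove_stopwords(text):
    """Drop whitespace-separated tokens that appear in STOPWORDS (case-insensitive)."""
    return " ".join(t for t in text.split() if t.lower() not in STOPWORDS)


def embed(text, dim=DIM):
    """Hypothetical stand-in for a sentence-transformer encoder: an L2-normalised
    hashed bag-of-words vector (crc32 keeps bucket assignment deterministic)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def features(title, content, mode, drop_stopwords=False):
    """Build the classifier input from the title, the content, or both."""
    if drop_stopwords:
        title, content = remove_stopwords(title), remove_stopwords(content)
    if mode == "title":
        return embed(title)
    if mode == "content":
        return embed(content)
    if mode == "both":
        return embed(title) + embed(content)  # list concatenation of the two vectors
    raise ValueError(f"unknown mode: {mode}")


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=1.0, epochs=1000):
    """Plain batch gradient descent on the mean logistic (cross-entropy) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (fake) if the predicted probability of fake is >= 0.5, else 0 (genuine)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical toy examples (1 = fake, 0 = genuine); not taken from the paper's dataset.
articles = [
    ("Şok iddia: mucize ilaç bulundu", "Uzmanlar şaşkın, bu ilaç tüm hastalıkları anında iyileştiriyor", 1),
    ("Merkez Bankası faiz kararını açıkladı", "Kurul politika faizini sabit tutma kararı aldı", 0),
    ("Ünlü oyuncu gizlice uzaya gitti", "Kaynağı belirsiz fotoğraflar sosyal medyada hızla yayıldı", 1),
    ("Belediye yeni metro hattını hizmete açtı", "Yeni hat günlük yüz bin yolcu taşıyacak", 0),
    ("İnanılmaz: limon suyu kanseri yok ediyor", "Doktorların sizden sakladığı bu yöntemi hemen deneyin", 1),
    ("Üniversite sınav takvimi yayımlandı", "Sınavlar haziran ayında iki oturumda yapılacak", 0),
]

X = [features(t, c, mode="both") for t, c, _ in articles]
y = [label for _, _, label in articles]
w, b = train_logreg(X, y)
train_acc = sum(predict(w, b, x) == yi for x, yi in zip(X, y)) / len(y)
print(f"training accuracy: {train_acc:.2f}")
```

The `mode` argument mirrors the title-only, content-only, and combined inputs that the study compares, and `drop_stopwords` mirrors the stopword-removal experiment. For real use, one would likely replace `embed` with an actual encoder (for example `SentenceTransformer(...).encode` from the `sentence-transformers` library) and `train_logreg` with a library classifier such as scikit-learn's `LogisticRegression`, and evaluate on a held-out split rather than on training accuracy.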

The article lists 18 references in total.

Details

Primary Language: English
Subjects: Natural Language Processing
Section: Research Article
Authors

Masis Zovikoğlu

Uzay Çetin

Publication Date: 30 May 2024
Submission Date: 1 May 2024
Acceptance Date: 16 May 2024
Issue: Year 2024, Volume: 1, Issue: 1

How to Cite

APA Zovikoğlu, M., & Çetin, U. (2024). Detecting misinformation on social networks with NLP. Transactions on Computer Science and Applications, 1(1), 11-16.