Research Article

Detecting misinformation on social networks with NLP

Year 2024, Volume: 1 Issue: 1, 11 - 16, 30.05.2024

Abstract

In this study, we take a data-driven approach to detecting misinformation on social media. Our aim is to apply natural language processing (NLP) techniques to detect fake news in Turkish. To this end, we took a publicly accessible English dataset of fake news articles, consisting of 20,800 samples, and translated it into Turkish. We used sentence-transformer models to vectorize the text and then applied a Logistic Regression classifier for fake news detection with different inputs. Our observations indicate that the title of a news article is more informative than its content for detecting fake news. However, detection performance improves further when the title and content are used together. Interestingly, removing stopwords does not improve accuracy. We also argue that more advanced transformer-based approaches would likely offer superior performance, particularly under data drift, but we leave this for future work.
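The pipeline described in the abstract (vectorize the text, then fit a logistic regression on the title, the content, or both) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: `embed` is a hashed bag-of-words stand-in for a sentence-transformer encoder, the title and content vectors are combined by concatenation (the abstract does not say how they were combined), the classifier is a hand-written gradient-descent logistic regression, and the four samples are made up. In practice one would likely use a pretrained sentence-transformer model and a library classifier instead.

```python
import math
import zlib

DIM = 32  # hypothetical embedding size; real sentence encoders output far more dimensions


def embed(text, dim=DIM):
    """Stand-in for a sentence encoder: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def features(title, content, mode):
    """Build the classifier input for one of the three input settings."""
    if mode == "title":
        return embed(title)
    if mode == "content":
        return embed(content)
    if mode == "title+content":
        # Assumption: combine by concatenating the two vectors
        return embed(title) + embed(content)
    raise ValueError(f"unknown mode: {mode}")


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Return the estimated probability that x belongs to class 1 (fake)."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy samples: (title, content, label), where 1 = fake and 0 = real
SAMPLES = [
    ("shocking secret cure revealed", "doctors hate this miracle trick", 1),
    ("aliens control the election", "anonymous sources confirm the plot", 1),
    ("city council approves budget", "the council voted on the annual budget", 0),
    ("central bank holds interest rate", "the bank kept its policy rate unchanged", 0),
]

MODES = ["title", "content", "title+content"]
labels = [label for _, _, label in SAMPLES]

# One model per input setting, mirroring the comparison described in the abstract
models = {
    mode: train_logreg([features(t, c, mode) for t, c, _ in SAMPLES], labels)
    for mode in MODES
}
```

Comparing the three entries of `models` on a held-out split is the kind of experiment the abstract summarizes. With four toy samples the sketch only shows the mechanics and says nothing about which input works best.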


Details

Primary Language: English
Subjects: Natural Language Processing
Journal Section: Research Article
Authors

Masis Zovikoğlu

Uzay Çetin

Publication Date: May 30, 2024
Submission Date: May 1, 2024
Acceptance Date: May 16, 2024
Published in Issue: Year 2024, Volume: 1 Issue: 1

Cite

APA Zovikoğlu, M., & Çetin, U. (2024). Detecting misinformation on social networks with NLP. Transactions on Computer Science and Applications, 1(1), 11-16.