In this data-driven study, we aim to detect misinformation on social media by applying natural language processing (NLP) techniques to fake news detection in Turkish. To this end, we took a publicly accessible English dataset of fake news articles, consisting of 20,800 samples, and translated it into Turkish. We vectorized the textual content with sentence-transformer models and then applied a Logistic Regression classifier for fake news detection with different inputs. Our observations indicate that the title of a news article is more informative than its content for detecting fake news; however, detection performance improves further when the title and content are used together. Interestingly, removing stopwords does not improve accuracy. We also argue that more advanced transformer-based approaches would offer superior performance, particularly under data drift, but we leave this for future work.
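
The pipeline summarized above (embed the text, then fit Logistic Regression on the title, the content, or both) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_features`, `train_detector`, and `toy_embed` are hypothetical, combining title and content by concatenating their embeddings is only one plausible reading of "combined utilization", and `toy_embed` is a crude stand-in so the sketch runs without downloading a sentence-transformer model.

```python
from typing import Callable, Sequence

import numpy as np
from sklearn.linear_model import LogisticRegression

# Any callable that maps a list of strings to a 2-D array of embeddings.
Embedder = Callable[[Sequence[str]], np.ndarray]


def toy_embed(texts: Sequence[str], dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for a sentence-transformer encoder.

    Buckets each whitespace token into one of `dim` slots and counts them.
    It only exists so the example is runnable offline.
    """
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            out[i, sum(map(ord, token)) % dim] += 1.0
    return out


def build_features(
    titles: Sequence[str],
    contents: Sequence[str],
    embed: Embedder,
    mode: str = "both",
) -> np.ndarray:
    """Build the feature matrix for one input setting: title, content, or both."""
    if mode == "title":
        return np.asarray(embed(titles))
    if mode == "content":
        return np.asarray(embed(contents))
    if mode == "both":
        # Assumption: title and content embeddings are concatenated per article.
        return np.hstack([np.asarray(embed(titles)), np.asarray(embed(contents))])
    raise ValueError(f"unknown mode: {mode!r}")


def train_detector(
    titles: Sequence[str],
    contents: Sequence[str],
    labels: Sequence[int],
    embed: Embedder,
    mode: str = "both",
) -> LogisticRegression:
    """Fit a Logistic Regression classifier on the chosen input setting."""
    features = build_features(titles, contents, embed, mode)
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(features, labels)
    return classifier
```

To try this with real embeddings, the `encode` method of a `SentenceTransformer` model from the sentence-transformers library can be passed as `embed`, since it accepts a list of strings and returns a NumPy array by default. Training one classifier per `mode` on a held-out split is then a straightforward way to compare the title-only, content-only, and combined settings.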
Primary Language | English |
---|---|
Subjects | Natural Language Processing |
Journal Section | Research Article |
Authors | |
Publication Date | May 30, 2024 |
Submission Date | May 1, 2024 |
Acceptance Date | May 16, 2024 |
Published in Issue | Year 2024 Volume: 1 Issue: 1 |