In this study, we take a data-driven approach to detecting misinformation on social media. Our aim is to apply natural language processing (NLP) techniques to detect fake news in Turkish. To this end, we took a publicly available English fake news dataset of 20,800 samples and translated it into Turkish. We used sentence-transformer models to vectorize the textual content and then applied a Logistic Regression classifier for fake news detection with different inputs. Our observations indicate that the title of a news article is more informative than its content for detecting fake news. However, detection performance improves further when the title and content are used together. Interestingly, our findings reveal that removing stopwords does not improve accuracy. We also argue that more advanced transformer-based approaches would offer superior performance, particularly in scenarios characterized by data drift, but we leave this for future work.
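The pipeline summarized above (encode the title and/or content, then fit a logistic regression classifier, optionally after stopword removal) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: the hashed bag-of-words `embed` function is a hypothetical stand-in for the sentence-transformer models used in the study (which are not named here), `STOPWORDS` is a hypothetical toy Turkish list, combining title and content by concatenating their vectors is an assumption, and the example sentences are invented rather than taken from the translated dataset.

```python
import hashlib
import math

# Hypothetical toy Turkish stopword list (the study's list is not specified).
STOPWORDS = {"ve", "bir", "bu", "da", "de", "için", "ile"}
DIM = 64  # size of the stand-in embedding (assumption)


def tokenize(text, remove_stopwords=False):
    """Lowercase, split on whitespace, optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def embed(text, remove_stopwords=False):
    """Stand-in for a sentence-transformer encoder: a hashed bag of words,
    L2-normalized so any non-empty text maps to a unit vector."""
    vec = [0.0] * DIM
    for tok in tokenize(text, remove_stopwords):
        # md5 is stable across runs, unlike Python's salted built-in hash()
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def features(title, content, mode="both", remove_stopwords=False):
    """Model input built from the title, the content, or both."""
    if mode == "title":
        return embed(title, remove_stopwords)
    if mode == "content":
        return embed(content, remove_stopwords)
    # Assumption: "both" concatenates the title and content vectors.
    return embed(title, remove_stopwords) + embed(content, remove_stopwords)


def sigmoid(z):
    # Split by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that x is fake (label 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w, b, X, y):
    """Mean binary cross-entropy over the dataset."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        p = predict_proba(w, b, x)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(X)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = predict_proba(w, b, x) - t  # d(loss)/d(logit) for one sample
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy examples (1 = fake, 0 = real), not from the study's dataset.
train = [
    ("şok iddia gizli belge", "kaynak belirsiz şok iddia", 1),
    ("gizli plan ortaya çıktı", "iddia edildi ama kanıt yok", 1),
    ("bakanlık resmi açıklama yaptı", "bakanlık rakamları paylaştı", 0),
    ("merkez bankası faiz kararını açıkladı", "resmi karar metni yayımlandı", 0),
]
y = [label for _, _, label in train]
X = [features(t, c, mode="both") for t, c, _ in train]
w, b = train_logreg(X, y)

# Same protocol with each input type, mirroring the title/content/both comparison.
for mode in ("title", "content", "both"):
    Xm = [features(t, c, mode) for t, c, _ in train]
    wm, bm = train_logreg(Xm, y)
    correct = sum((predict_proba(wm, bm, x) >= 0.5) == bool(t) for x, t in zip(Xm, y))
    print(mode, "training accuracy:", correct / len(y))
```

In a real run one would presumably replace `embed` with a sentence-transformer model's `encode` method and `train_logreg` with scikit-learn's `LogisticRegression`, and report accuracy on a held-out split; training accuracy on four invented sentences says nothing about the study's results. Passing `remove_stopwords=True` to `features` gives the stopword-removal variant of each input.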
Keywords: Machine Learning, NLP, Data Mining, Text Classification, Fake News Detection
Primary Language | English |
---|---|
Subjects | Natural Language Processing |
Section | Research Article |
Authors | |
Publication Date | 30 May 2024 |
Submission Date | 1 May 2024 |
Acceptance Date | 16 May 2024 |
Issue | Year 2024, Volume: 1, Issue: 1 |