With recent advances in online journalism, the diversity, abundance, and accessibility of news have increased substantially. However, this growth also raises concerns, especially about the reliability of news. Notably, news widely shared on social media during the US presidential election campaign and the UK Brexit referendum drew millions of reactions from the public. This prompted industry and academia to address the pressing issue of fake news. Detecting fake news is a meticulous, time-consuming, and labor-intensive task that requires expert judgment. To mitigate this challenge, this study proposes a linguistic-based model for Turkish fake news detection. The dataset was collected from TRT's RSS service and by web scraping the Teyit.org platform. It contains news titles and summaries related to significant events in Türkiye between 2015 and 2023. The research compares classical machine learning classifiers, including SVM, Logistic Regression, Random Forest, k-NN, Decision Tree, and Naive Bayes, against a neural sequential learning model (LSTM) on real-world data. Furthermore, the research investigates the impact of different word representation techniques, namely TF-IDF and CountVectorizer, and of hyperparameter optimization on the classification results. The findings show that, with hyperparameter tuning, TF-IDF combined with the SVM model yielded the highest accuracy, 93.12%, and that TF-IDF was the more effective of the two representations.
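To make the difference between the two word representations concrete, the following is a minimal pure-Python sketch, not the study's implementation. The toy headlines, the whitespace tokenizer, and all function names are assumptions introduced for illustration. The TF-IDF variant uses a smoothed inverse document frequency followed by L2 normalization, which is one common formulation; the abstract does not state which variant was used.

```python
import math
from collections import Counter

# Hypothetical, pre-lowercased toy headlines; not taken from the study's dataset.
docs = [
    "deprem sonrası yardım kampanyası başladı",
    "deprem iddiası asılsız çıktı",
    "yardım kampanyası iddiası asılsız",
]

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real pipelines do more cleaning).
    return text.split()

def build_vocab(corpus):
    # Sorted list of the unique tokens across all documents.
    return sorted({tok for doc in corpus for tok in tokenize(doc)})

def count_vector(doc, vocab):
    # Raw term counts: the count-based (bag-of-words) representation.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]

def tfidf_vectors(corpus, vocab):
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, then L2-normalize each row.
    n = len(corpus)
    rows = [count_vector(doc, vocab) for doc in corpus]
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1.0 for d in df]
    result = []
    for row in rows:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        result.append([x / norm for x in weighted])
    return result

vocab = build_vocab(docs)
counts = [count_vector(doc, vocab) for doc in docs]
tfidf = tfidf_vectors(docs, vocab)
```

In the first toy headline, "deprem" and "sonrası" both have a raw count of 1, but TF-IDF gives "sonrası" the larger weight because it appears in fewer documents. Down-weighting terms that occur across many documents is the main way TF-IDF differs from raw counts.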
Ethics committee approval was not required for this study because it did not involve human or animal subjects.
This research is based on a master's thesis.
Primary Language | English |
---|---|
Subjects | Decision Support and Group Support Systems |
Journal Section | Research Articles |
Authors | |
Publication Date | January 15, 2025 |
Submission Date | August 7, 2024 |
Acceptance Date | December 19, 2024 |
Published in Issue | Year 2025 |