Detecting Fake News on Big Data
Year 2021, Volume 01, Issue 02, pp. 1–5, 31.12.2021
Begüm Subaşı, Hilalnur Beral, Nilüfer Güleç, Tansel Dökeroğlu
Abstract
In this study, we developed a new framework for detecting fake news, which has recently become a significant problem on social media. Detecting fake news effectively is challenging, so we compared how well different machine learning approaches perform on the task. Apache Spark's machine learning environment, in which many processors can work simultaneously, is well suited to big data classification problems. Experiments with Naïve Bayes, Neural Network, Logistic Regression, and Support Vector Machine classifiers on large datasets obtained from Kaggle showed that our software achieves accuracy rates of up to 99%.
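The abstract credits Spark's suitability for big data classification to many processors working simultaneously. One of the compared models, multinomial Naïve Bayes, shows why this works well: its training reduces to counting words per class, and counts from different data partitions can simply be added together. The following is a minimal, single-machine sketch of that idea in plain Python. It is not Apache Spark code and not the authors' implementation. The whitespace tokenizer, the smoothing parameter `alpha`, the helper names (`count_partition`, `merge`, `train`, `predict`), the 1 = fake / 0 = real label encoding, and the toy headlines are all hypothetical.

```python
import math
from collections import Counter
from functools import reduce
from typing import Dict, List, Tuple

# A document is a (label, text) pair; here 1 = fake and 0 = real (hypothetical encoding).
Doc = Tuple[int, str]
# Per-label model: (log prior, {word: log P(word | label)}).
Model = Dict[int, Tuple[float, Dict[str, float]]]


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer, a stand-in for a real text-cleaning pipeline."""
    return text.lower().split()


def count_partition(docs: List[Doc]) -> Tuple[Counter, Counter]:
    """'Map' step: count documents per label and words per (label, word) in one partition."""
    label_counts: Counter = Counter()
    word_counts: Counter = Counter()
    for label, text in docs:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[(label, token)] += 1
    return label_counts, word_counts


def merge(a: Tuple[Counter, Counter], b: Tuple[Counter, Counter]) -> Tuple[Counter, Counter]:
    """'Reduce' step: partial counts from different partitions simply add up."""
    return a[0] + b[0], a[1] + b[1]


def train(partitions: List[List[Doc]], alpha: float = 1.0) -> Model:
    """Multinomial Naive Bayes with additive (Laplace) smoothing `alpha`."""
    label_counts, word_counts = reduce(merge, map(count_partition, partitions))
    vocab = {word for (_, word) in word_counts}
    total_docs = sum(label_counts.values())
    model: Model = {}
    for label, n_docs in label_counts.items():
        total_words = sum(c for (lab, _), c in word_counts.items() if lab == label)
        denom = total_words + alpha * len(vocab)
        log_lik = {w: math.log((word_counts[(label, w)] + alpha) / denom) for w in vocab}
        model[label] = (math.log(n_docs / total_docs), log_lik)
    return model


def predict(model: Model, text: str) -> int:
    """Return the label with the highest log posterior; words never seen in training are ignored."""
    tokens = tokenize(text)

    def log_posterior(label: int) -> float:
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)

    return max(model, key=log_posterior)


if __name__ == "__main__":
    # Two hypothetical partitions, as if the data were split across two workers.
    partitions = [
        [(1, "shocking miracle cure doctors hate"), (0, "parliament passes annual budget")],
        [(1, "shocking secret they hide"), (0, "central bank holds interest rates")],
    ]
    model = train(partitions)
    print(predict(model, "shocking miracle secret"))  # prints 1 (fake)
```

Because training is only additive counting, `train` returns the same model whether the corpus arrives as one partition or is split across several, which is the property that lets a cluster count each partition independently. On Spark itself, one would normally use MLlib's built-in `NaiveBayes` estimator (for example after `Tokenizer` and `HashingTF` feature stages) rather than hand-written counts.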