Research Article

Detecting Fake News on Big Data

Year 2021, Volume 01, Issue 02, pp. 1-5, 31.12.2021

Abstract

In this study, we developed a new framework for detecting fake news, which has recently become a significant problem on social media, and compared the performance of several machine learning approaches. Detecting fake news effectively is a challenging problem. Apache Spark’s machine learning environment, in which many processors can work simultaneously, is well suited to big data classification problems. Experiments with Naïve Bayes, neural network, logistic regression, and support vector machine classifiers on large datasets obtained from Kaggle showed that our software can reach accuracy rates of up to 99%.
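As a rough illustration of the kind of text classification the abstract describes, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing and measures its accuracy on held-out examples. It is not the authors' Spark pipeline: it uses only the Python standard library, and the example headlines, the labels, and the function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy data, not taken from the Kaggle datasets used in the study.
train_docs = [
    "shocking miracle cure doctors hate",
    "celebrity secretly replaced by clone",
    "you will not believe this shocking secret",
    "parliament passes annual budget bill",
    "central bank holds interest rates steady",
    "city council approves new transit budget",
]
train_labels = ["fake", "fake", "fake", "real", "real", "real"]
test_docs = ["shocking secret cure revealed", "council passes transit bill"]
test_labels = ["fake", "real"]

model = train_naive_bayes(train_docs, train_labels)
test_accuracy = accuracy(model, test_docs, test_labels)
print(f"accuracy: {test_accuracy:.2f}")
```

A result on a toy set this small says nothing about real-world performance. The same train, predict, and score loop is what one would repeat for each classifier when comparing approaches on a large dataset.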


Turkish title: Büyük Veri Üzerindeki Sahte Haberlerin Tespit Edilmesi (Detecting Fake News on Big Data)

Abstract (translated from Turkish)

In this study, a new framework was developed for detecting fake news, which has recently become a significant problem on social media. The performance of different machine learning approaches was compared. Detecting fake news effectively is a challenging problem. The Apache Spark machine learning environment, in which many processors can run at the same time, offers a well-suited environment for handling big data classification problems. Experiments using Naive Bayes, Artificial Neural Network, Logistic Regression, and Support Vector Machine on large datasets obtained from Kaggle showed that our software can report accuracy rates of up to 99%.


Details

Primary Language: English
Subjects: Artificial Intelligence
Journal Section: Reviews
Authors

Begüm Subaşı 0000-0003-1665-5928

Hilalnur Beral 0000-0003-1665-5928

Nilüfer Güleç 0000-0003-1665-5928

Tansel Dökeroğlu 0000-0003-1665-5928

Publication Date: December 31, 2021
Published in Issue: Year 2021, Volume 01, Issue 02

Cite

IEEE B. Subaşı, H. Beral, N. Güleç, and T. Dökeroğlu, “Detecting Fake News on Big Data”, Researcher, vol. 01, no. 02, pp. 1–5, 2021.

The journal "Researcher: Social Sciences Studies" (RSSS), which began publication in 2013, has continued under the name "Researcher" since August 2020, under Ankara Bilim University.
It is an internationally indexed, nationally refereed, scientific electronic journal that, from 2021 onward, publishes original research articles contributing to the fields of Engineering and Science.
The journal is published twice a year, excluding special issues.
Articles submitted for publication may be written in Turkish or English. Submitted articles must not have been previously published in, or be under consideration by, another journal.