TY - JOUR
T1 - Detecting Fake News on Big Data
TT - Büyük Veri Üzerindeki Sahte Haberlerin Tespit Edilmesi
AU - Dökeroğlu, Tansel
AU - Subaşı, Begüm
AU - Beral, Hilalnur
AU - Güleç, Nilüfer
PY - 2021
DA - December
JF - Researcher
JO - Researcher
PB - Ankara Bilim Üniversitesi
WT - DergiPark
SN - 2717-9494
SP - 1
EP - 5
VL - 01
IS - 02
LA - en
AB - In this study, we developed a new framework for detecting fake news, which has recently become a significant problem in social media, and compared the performance of different machine learning approaches. Detecting fake news effectively is a challenging problem. Apache Spark’s machine learning environment, where many processors can work simultaneously, is well suited to big data classification problems. Experiments using Naïve Bayes, Neural Network, Logistic Regression, and Support Vector Machine on large datasets obtained from Kaggle showed that our software can reach accuracy rates of up to 99%.
KW - Fake news
KW - Machine learning
KW - Big data
KW - Classification
N2 - Bu çalışmada, son zamanlarda sosyal medyada önemli bir sorun haline gelen yalan haberlerin tespiti için yeni bir çerçeve geliştirilmiştir. Farklı makine öğrenimi yaklaşımlarının performanslarını karşılaştırılmıştır. Sahte haberleri etkili bir şekilde tespit etmek zorlu bir sorun haline gelmektedir. Birçok işlemcinin aynı anda çalışabildiği Apache Spark makine öğrenme ortamı, büyük veri sınıflandırma problemlerinin üstesinden gelmek için oldukça uygun bir ortam sunmaktadır. Naive Bayes, Yapay Sinir Ağı, Lojistik regresyon ve Destek Vektör Makinesi kullanılarak Kaggle'da elde ettiğimiz büyük veri kümeleri üzerinde yapılan deneylerden sonra, yazılımımızın %99'a varan doğruluk oranları rapor edebildiğini göstermiştir.
CR - [1] Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1), 22-36.
CR - [2] Nikiforos M.N., Vergis S., Stylidou A., Augoustis N., Kermanidis K.L., Maragoudakis M. (2020) Fake News Detection Regarding the Hong Kong Events from Tweets. In: Maglogiannis I., Iliadis L., Pimenidis E. (eds) Artificial Intelligence Applications and Innovations. AIAI 2020 IFIP WG 12.5 International Workshops. AIAI 2020. IFIP Advances in Information and Communication Technology, vol 585. Springer, Cham. https://doi.org/10.1007/978-3-030-49190-1_16
CR - [3] Thota, A., Tilak, P., Ahluwalia, S., & Lohia, N. (2018). Fake news detection: a deep learning approach. SMU Data Science Review, 1(3), 10.
CR - [4] Sahoo, S. R., & Gupta, B. B., Multiple features based approach for automatic fake news detection on social networks using deep learning, Applied Soft Computing, 100, 106983, 2021.
CR - [5] Nada, F., Khan, B., Maryam, A., Zuha, N., Ahmed, Z. (2019). Fake news detection using logistic regression. International Research Journal of Engineering and Technology (IRJET). https://www.irjet.net/archives/V6/i5/IRJET-V6I5733.pdf
CR - [6] Medium. “Decision Tree Classification”. Access: 4 June 2021. https://medium.com/swlh/decision-tree-classification-de64fc4d5aac
CR - [7] Towards Data Science. “Gradient Boosting Classification explained through Python”. Access: 4 June 2021. https://towardsdatascience.com/gradient-boosting-classification-explained-through-python-60cc980eeb3d
CR - [8] Ahmad, I., Yousaf, M., Yousaf, S., & Ahmad, M. O. (2020). Fake News Detection Using Machine Learning Ensemble Methods. Complexity, 2020.
CR - [9] Erdi, B., Şahin, E., Toydemir, M., Dökeroğlu, T. (2021). Makine Öğrenmesi Algoritmaları ile Trol Hesapların Tespiti. Düzce Üniversitesi Bilim ve Teknoloji Dergisi, 9(1), 430-442. DOI: 10.29130/dubited.748366
CR - [10] T. Hofmann, B. Schölkopf, and A. J. Smola, “Kernel methods in machine learning,” The Annals of Statistics, vol. 36, no. 3, pp. 1171–1220, 2008.
CR - [11] Medium. “Introduction to Naive Bayes for Classification”. Access: 4 June 2021. https://medium.com/@srishtisawla/introduction-to-naive-bayes-for-classification-baefefb43a2d
CR - [12] Towards Data Science. “Introduction to Naive Bayes Classifier”. Access: 4 June 2021. https://towardsdatascience.com/introduction-to-naive-bayes-classifier-f5c202c97f92
CR - [13] B. Gregorutti, B. Michel, and P. Saint-Pierre, “Correlation and variable importance in random forests,” Statistics and Computing, vol. 27, no. 3, pp. 659–678, 2017.
CR - [14] Medium. “Chapter 5: Random Forest Classifier”. Access: 4 June 2021. https://medium.com/machine-learning-101/chapter-5-random-forest-classifier-56dc7425c3e1
CR - [15] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Springer, Berlin, Germany, 1984.
CR - [16] Sevinc, E. (2019). A novel evolutionary algorithm for data classification problem with extreme learning machines. IEEE Access, 7, 122419-122427.
UR - https://dergipark.org.tr/tr/pub/researcher/issue//984460
L1 - https://dergipark.org.tr/tr/download/article-file/1931902
ER -