Detecting Fake News on Big Data

Begüm Subaşı; Hilalnur Beral; Nilüfer Güleç; Tansel Dökeroğlu

Research Article

Year 2021, Volume: 01 Issue: 02, 1 - 5, 31.12.2021

Abstract

In this study, we developed a new framework for detecting fake news, which has recently become a significant problem on social media, and compared the performance of different machine learning approaches. Detecting fake news effectively is a challenging problem. Apache Spark's machine learning environment, in which many processors can work simultaneously, is well suited to big data classification problems. Experiments using Naïve Bayes, Neural Network, Logistic Regression, and Support Vector Machine classifiers on large datasets obtained from Kaggle showed that our software can reach accuracy rates of up to 99%.

Keywords

Fake news, Machine learning, Big data, Classification
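
To make the classification task in the abstract concrete, the following is a minimal sketch of one of the four classifier families named there, Naïve Bayes, applied to short news texts. It is not the authors' Apache Spark implementation: it is plain Python, the function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`) are hypothetical, and the headlines are invented toy data rather than the Kaggle datasets used in the study.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect the counts needed by a multinomial Naive Bayes classifier."""
    class_counts = Counter(labels)       # documents per class
    word_counts = defaultdict(Counter)   # token counts per class
    vocab = set()
    for text, label in zip(docs, labels):
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # skip tokens never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Invented toy data for illustration only (not from the study).
TRAIN_DOCS = [
    "shocking miracle cure doctors hate",
    "celebrity secret shocking truth revealed",
    "government publishes annual budget report",
    "university study reports new findings",
]
TRAIN_LABELS = ["fake", "fake", "real", "real"]
TEST_DOCS = ["shocking secret cure revealed", "annual university report"]
TEST_LABELS = ["fake", "real"]

model = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS)
print("held-out accuracy:", accuracy(model, TEST_DOCS, TEST_LABELS))
```

On this toy data both held-out headlines are classified correctly, which says nothing about performance on real data. A comparison like the one the abstract describes would repeat the same train-and-score loop for each classifier on the same held-out split.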

Details

Primary Language	English
Subjects	Artificial Intelligence
Section	Research Articles
Authors	Begüm Subaşı 0000-0003-1665-5928 Hilalnur Beral 0000-0003-1665-5928 Nilüfer Güleç 0000-0003-1665-5928 Tansel Dökeroğlu 0000-0003-1665-5928
Publication Date	31 December 2021
Published in Issue	Year 2021 Volume: 01 Issue: 02

Cite

IEEE	B. Subaşı, H. Beral, N. Güleç, and T. Dökeroğlu, “Detecting Fake News on Big Data”, Researcher, vol. 01, no. 02, pp. 1–5, 2021.

The journal "Researcher: Social Sciences Studies" (RSSS), which began publication in 2013, has continued its activities under the name "Researcher" at Ankara Bilim Üniversitesi since August 2020.
Since 2021, it has been an internationally indexed, nationally peer-reviewed, scientific electronic journal that publishes original research articles aiming to contribute to the fields of Engineering and Natural Sciences.
The journal is published twice a year, excluding special issues. In line with its aims, the journal focuses on Industrial Engineering, Software Engineering, Computer Engineering, and Electrical and Electronics Engineering.
Manuscripts submitted for publication may be written in Turkish or English. Submitted manuscripts must not have been previously published in another journal or be under consideration for publication elsewhere.