Twitter üzerinde Türkçe sahte haber tespiti (Turkish fake news detection on Twitter)

Süleyman Gökhan Taşkın; Ecir Uğur Küçüksille; Kamil Topal

doi:10.25092/baunfbed.843909

Research Article

Year 2021, Volume: 23 Issue: 1, 151 - 172, 29.01.2021

Cited By: 7

Abstract

In recent years, as internet use has grown, the sources from which people obtain information and news have changed. Social media platforms are increasingly used instead of traditional media platforms such as radio, television, newspapers and magazines. In traditional media, news is published by a specific source, whereas on social media every user can be a news source. As a result, news on social media is shared without passing through any filter, and fake news spreads very quickly. Fake news is news spread by fake or provocative users for the purpose of propaganda, provocation or misleading people. Because it is attention-grabbing, it can spread through social media in a very short time, so detecting it as early as possible is very important. On most social media platforms, fake news is detected by human experts. On platforms with very heavy sharing traffic, however, experts cannot detect fake news quickly, which allows it to be shared by many people in a short time. Semi-automatic and automatic fake news detection systems can detect fake news faster than experts, so automatic detection systems need to be developed in order to catch fake news quickly. In this study, fake news detection was performed on Turkish-language content on Twitter using supervised and unsupervised machine learning algorithms, and the results were analyzed. Predictions were made with the unsupervised learning algorithms K-means, Non-Negative Matrix Factorization (NMF) and Linear Discriminant Analysis (LDA), and with the supervised learning algorithms K-Nearest Neighbor (KNN), Support Vector Machines (SVM) and Random Forest (RF).
Each algorithm was run 100 times and the average F1-scores were examined. The supervised learning algorithms achieved good results, with an F1-score of 0.86, while the F1-score of the unsupervised learning algorithms remained at 0.72.
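As a rough illustration of the evaluation protocol in the abstract (each algorithm run 100 times, mean F1-score reported), the minimal Python sketch below repeats a stratified train/test split, a from-scratch K-Nearest Neighbor classifier and a binary F1 computation 100 times and averages the scores. The 70/30 split, k = 3, the per-run seeds, the function names and the toy two-dimensional vectors are all assumptions for illustration: the abstract does not say how the tweets were vectorized or split, so this is not the authors' implementation.

```python
import random
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score for the `positive` class (here 1 = fake news)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # convention: F1 is 0 when no true positives are found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote of its k nearest training points (squared Euclidean)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def stratified_split(y, test_ratio, rng):
    """Shuffle each class separately so every test set contains both classes."""
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_ratio))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def average_f1(X, y, n_runs=100, test_ratio=0.3, k=3):
    """Repeat split -> fit -> predict n_runs times and return the mean F1."""
    scores = []
    for seed in range(n_runs):
        rng = random.Random(seed)  # a different but reproducible split per run
        train_idx, test_idx = stratified_split(y, test_ratio, rng)
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        y_true = [y[i] for i in test_idx]
        y_pred = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
        scores.append(f1_score(y_true, y_pred))
    return mean(scores)


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for tweet features: 1 = fake, 0 = real.
    X = [[5 + 0.1 * i, 5.0] for i in range(10)] + [[0.1 * i, 0.0] for i in range(10)]
    y = [1] * 10 + [0] * 10
    print(f"mean F1 over 100 runs: {average_f1(X, y):.2f}")
```

Averaging over many random splits reduces the variance that a single lucky or unlucky split would introduce; with real data, the toy vectors would be replaced by the actual tweet features and annotations.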

Keywords

Fake news detection, machine learning, artificial intelligence

References

Del Vicario, M. et al., The spreading of misinformation online, Proceedings of the National Academy of Sciences, 113, 3, 554–559, (2016).
Twitter, "KAMUOYUNA DUYURU İletişim Başkanlığı, vatandaşlardan hiçbir şekilde kredi kartı bilgilerini talep etmez. Kurumumuzun adı ve logosu ile yayılan “elektrik ve doğal gaz fatura iadesi” bildirimi, dolandırıcıların milletimizin devletimize olan güvenini kötüye, [Tweet]", (2020). https://twitter.com/iletisim/status/1213530046733979649, (04.1.2020).
Twitter, "Yoğun kar yağışı,buzlanma ve soğuk nedeniyle, 07 Ocak 2020 Salı günü, il merkezi dışında kalan resmi ve özel tüm okul ve kurumlarımızda (okul öncesi, ilkokul, ortaokul, lise ve yaygın eğitim kurumları) eğitim öğretime bir gün ara verilmiştir. [Tweet]", (2020). https://twitter.com/eskvalilik/status/1214309576939573248, (07.1.2020).
İhlas Haber Ajansı, Eskişehir’de sahte hesaptan kar tatili mesajı atıldı, (2020). https://www.iha.com.tr/haber-eskisehirde-sahte-hesaptan-kar-tatili-mesaji-atildi-821170/, (06.1.2020).
Shu, K., Sliva, A., Wang, S., Tang, J. and Liu, H., Fake News Detection on Social Media, ACM SIGKDD Explorations Newsletter, 19, 1, 22–36, (2017).
Newman, N., Fletcher, R., Kalogeropoulos, A. and Nielsen, R., Reuters Institute Digital News Report 2018, Technical Report, Reuters Institute for the Study of Journalism, Oxford, (2018).
Newman, N., Fletcher, R., Kalogeropoulos, A. and Nielsen, R., Reuters Institute Digital News Report 2019, Technical Report, Reuters Institute for the Study of Journalism, Oxford, (2019).
Zhao, X. and Jiang, J., An empirical comparison of topics in Twitter and traditional media, Singapore Management University School of Information Systems Technical paper series, (2011).
Twitter, Twitter Inc., (2006). https://twitter.com/, (10.10.2018).
Alpaydin, E., Machine Learning: The New AI, Cambridge, MA: The MIT Press, (2016).
Rosten, E. and Drummond, T., Machine learning for high-speed corner detection, European Conference on Computer Vision, Lecture Notes in Computer Science, 430–443, Graz, Austria, (2006).
Arganda-Carreras, I. et al., Trainable Weka segmentation: a machine learning tool for microscopy pixel classification, Bioinformatics, 33, 15, 2424–2426, (2017).
Amodei, D. et al., Deep Speech 2: end-to-end speech recognition in English and Mandarin, Computing Research Repository, (2015).
Chorowski, J., Bahdanau, D., Serdyuk, D., Cho, K. and Bengio, Y., Attention-based models for speech recognition, Computing Research Repository, (2015).
Pang, B. and Lee, L., Opinion mining and sentiment analysis, Foundations and Trends® in Information Retrieval, 2, 1–2, 1–135, (2008).
Pang, B. and Lee, L., A Sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts, (2004), doi: 0409058.
Pomerleau, D. and Rao, D., Fake News Challenge, (2015). http://www.fakenewschallenge.org/, (11.07.2018).
Vosoughi, S., Automatic detection and verification of rumors on Twitter, Master's Thesis, Massachusetts Institute of Technology, Cambridge, (2015).
Chen, Y. and Chen, H., Opinion spam detection in web forum: a real case study, WWW 2015, 1, 173–183, (2015).
Ahmed, H., Detecting opinion spam and fake news using n-gram analysis and semantic similarity, Master's Thesis, University of Ahram Canadian, Cairo, (2017).
Bajaj, S., “The Pope Has a New Baby!” Fake news detection using deep learning, 1–8, (2017).
Granik, M. and Mesyura, V., Fake news detection using naive Bayes classifier, 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), 900–903, (2017).
Patel, M., Detection of Maliciously Authored News Articles, Master's Thesis, The Cooper Union for the Advancement of Science and Art, New York, (2017).
Tacchini, E., Ballarin, G., Della Vedova, M. L., Moret, S. and de Alfaro, L., Some like it Hoax: Automated fake news detection in social networks, SoGood 2017 - Second Workshop on Data Science for Social Good, Skopje, Macedonia, (2017).
Ågren, A. and Ågren, C., Combating fake news with stance detection using recurrent neural networks, Master's Thesis, University of Gothenburg, Gothenburg, (2018).
Rajendran, G., Chitturi, B. and Poornachandran, P., Stance-in-depth deep neural approach to stance classification, International Conference on Computational Intelligence and Data Science (ICCIDS 2018), 132, 1646–1653, (2018).
Mertoğlu, U., Sever, H. and Genç, B., Savunmada Yenilikçi bir Dijital Dönüşüm Alanı: Sahte Haber Tespit Modeli, SAVTEK 2018 - 9. Savunma Teknolojileri Kongresi, 771–778, (2018).
Mertoğlu, U., Genç, B., Sever, H. and Sağlam, F., Auto-Tagging Model For Turkish News, in International Ankara Conference on Scientific Researches, 615–623, (2019).
Twitter Search API, Twitter Search API, Twitter, (2018). https://developer.twitter.com/en/docs/basics/getting-started, (10.06.2018).
GitHub, TweetScraper, (2019). https://github.com/jonbakerfish/TweetScraper, (10.06.2018).
Teyit.org, teyit.org, (2016). https://teyit.org/, (01.08.2018).
Levenshtein, V. I., Двоичные коды с исправлением выпадений, вставок и замещений символов (Binary Codes Capable of Correcting Deletions, Insertions, and Reversals), Доклады Академий Наук СССР, 163, 4, 845–848, (1965).
Bird, S., Klein, E. and Loper, E., Natural language processing with Python: analyzing text with the natural language toolkit, O’Reilly Media, Inc., (2009).
Mikolov, T., Chen, K., Corrado, G. and Dean, J., Efficient estimation of word representations in vector space, (2013).
Le, Q. V. and Mikolov, T., Distributed Representations of Sentences and Documents, (2014). http://arxiv.org/abs/1405.4053, (16.03.2019).
Goodfellow, I., Bengio, Y. and Courville, A., Deep learning, Cambridge, MA: The MIT Press, (2017).
Cunningham, P. and Delany, S. J., k-Nearest Neighbour Classifiers -- 2nd Edition, (2020). http://arxiv.org/abs/2004.04523, (10.11.2019).
Vapnik, V., The Nature of Statistical Learning Theory, Springer, (1995).
Breiman, L., Random Forests, Machine Learning, Springer, 5–32, (2001).
Arthur, D. and Vassilvitskii, S., k-means++: The Advantages of Careful Seeding, (2006). http://ilpubs.stanford.edu:8090/778/, (08.11.2019).
Lee, D. D. and Seung, H. S., Learning the parts of objects by non-negative matrix factorization, Nature, 401, 6755, 788–791, (1999).
Shahnaz, F., Berry, M. W., Pauca, V. P. and Plemmons, R. J., Document clustering using nonnegative matrix factorization, Information Processing & Management, 42, 2, 373–386, (2006).
Xu, W., Liu, X. and Gong, Y., Document clustering based on non-negative matrix factorization, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’03, 267, (2003).
Fisher, R. A., The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7, 2, 179–188, (1936).

Turkish fake news detection on Twitter

Abstract

In recent years, the growing use of the internet has changed where people get their news and information. Social media platforms are increasingly used in place of traditional media such as radio, television, newspapers and magazines. In traditional media, news is distributed by specific sources, whereas on social media every user can be a news source. Fake news is news produced by fake or provocative users for the purpose of propaganda, provocation or misleading people. Because ordinary social media users can share any news without filtering, and because fake news is usually attention-grabbing, it can spread rapidly. For this reason, it is very important to detect fake news as early as possible. On most social media platforms, fake news is identified by human experts, but on platforms with very dense sharing traffic experts cannot do this quickly, so fake news ends up being shared by many people in a short time. Semi-automatic and automatic fake news detection systems can detect fake news faster than human experts, and automatic detection systems need to be developed to overcome this shortcoming. In this study, we collected data from Twitter and annotated each item as fake or real news. We then used supervised (K-Nearest Neighbor (KNN), Support Vector Machines (SVM) and Random Forest (RF)) and unsupervised (K-means, Non-Negative Matrix Factorization (NMF) and Linear Discriminant Analysis (LDA)) machine learning algorithms to detect fake news automatically. We ran each algorithm 100 times and examined the average F1-scores. The best results were obtained with the supervised learning algorithms, at an F1-score of 0.86, while the F1-score of the unsupervised learning algorithms remained at 0.72.
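The abstract does not say how the output of the unsupervised methods was turned into fake/real predictions before an F1-score was computed. A common approach, assumed here, is to cluster the annotated items into k = 2 groups and, because cluster ids are arbitrary, score whichever of the two cluster-to-label assignments matches the annotations best. The minimal Python sketch below shows this for K-means only, using Lloyd's iterations with k-means++ seeding (the seeding method of Arthur and Vassilvitskii in the reference list); NMF and LDA are not covered, and the function names, seed and toy vectors are hypothetical.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score for the `positive` class (here 1 = fake news)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns a cluster id per point."""
    rng = random.Random(seed)
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        # k-means++: next centre chosen with probability proportional to D(x)^2
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centres.append(list(p))
                break
        else:  # guard against floating-point round-off at the end of the loop
            centres.append(list(points[-1]))
    labels = None
    for _ in range(n_iter):
        # assignment step: every point goes to its nearest centre
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # update step: move each centre to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def best_mapping_f1(y_true, clusters):
    """Cluster ids are arbitrary, so score both cluster->label assignments (k=2)."""
    direct = f1_score(y_true, clusters)
    flipped = f1_score(y_true, [1 - c for c in clusters])
    return max(direct, flipped)


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for tweet features: 1 = fake, 0 = real.
    X = [[5 + 0.1 * i, 5.0] for i in range(10)] + [[0.1 * i, 0.0] for i in range(10)]
    y = [1] * 10 + [0] * 10
    clusters = kmeans(X, k=2, seed=0)
    print(f"F1 after cluster-to-label mapping: {best_mapping_f1(y, clusters):.2f}")
```

Note that choosing the better mapping uses the annotations, so this measures how well the clusters align with the fake/real labels rather than providing a label-free detector.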

Keywords

Fake news detection, machine learning, artificial intelligence

There are 44 references in total.

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Research Article
Authors	Süleyman Gökhan Taşkın (ORCID 0000-0002-1535-7462), Ecir Uğur Küçüksille (ORCID 0000-0002-3293-9878), Kamil Topal (ORCID 0000-0002-0266-7365)
Publication Date	January 29, 2021
Submission Date	April 25, 2020
Published in Issue	Year 2021 Volume: 23 Issue: 1

Cite

APA	Taşkın, S. G., Küçüksille, E. U., & Topal, K. (2021). Twitter üzerinde Türkçe sahte haber tespiti. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 23(1), 151-172. https://doi.org/10.25092/baunfbed.843909
AMA	Taşkın SG, Küçüksille EU, Topal K. Twitter üzerinde Türkçe sahte haber tespiti. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi. January 2021;23(1):151-172. doi:10.25092/baunfbed.843909
Chicago	Taşkın, Süleyman Gökhan, Ecir Uğur Küçüksille, and Kamil Topal. “Twitter üzerinde Türkçe Sahte Haber Tespiti”. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi 23, no. 1 (January 2021): 151-72. https://doi.org/10.25092/baunfbed.843909.
EndNote	Taşkın SG, Küçüksille EU, Topal K (January 1, 2021) Twitter üzerinde Türkçe sahte haber tespiti. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi 23 1 151–172.
IEEE	S. G. Taşkın, E. U. Küçüksille, and K. Topal, “Twitter üzerinde Türkçe sahte haber tespiti”, Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 23, no. 1, pp. 151–172, 2021, doi: 10.25092/baunfbed.843909.
ISNAD	Taşkın, Süleyman Gökhan et al. “Twitter üzerinde Türkçe Sahte Haber Tespiti”. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi 23/1 (January 2021), 151-172. https://doi.org/10.25092/baunfbed.843909.
JAMA	Taşkın SG, Küçüksille EU, Topal K. Twitter üzerinde Türkçe sahte haber tespiti. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi. 2021;23:151–172.
MLA	Taşkın, Süleyman Gökhan et al. “Twitter üzerinde Türkçe Sahte Haber Tespiti”. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 23, no. 1, 2021, pp. 151-72, doi:10.25092/baunfbed.843909.
Vancouver	Taşkın SG, Küçüksille EU, Topal K. Twitter üzerinde Türkçe sahte haber tespiti. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi. 2021;23(1):151-72.