ADVANCED TURKISH FAKE NEWS PREDICTION WITH BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS

Mehmet Bozuyla

doi:10.36306/konjes.995060

Research Article

Çift Yönlü Transformatör Kodlayıcı Temsilleriyle Gelişmiş Türkçe Sahte Haber Tahmini

Year 2022, Volume: 10 Issue: 3, 750 - 761, 01.09.2022

Abstract

The growing use of social media and the internet produces a substantial amount of information that needs to be analyzed from various perspectives. In this context, fake news is defined as unfounded news presented as real news. Fake news is usually produced with the aim of manipulation. Fake news detection is, broadly, a natural language analysis problem, and machine learning algorithms are used as automated predictors. Well-known machine learning algorithms such as Naïve Bayes (NB) and Random Forest (RF) have been used successfully for the fake news identification problem. Turkish is a morphologically rich language whose agglutinative complexity calls for intensive language preprocessing steps and feature selection. Recent neural language models such as Bidirectional Encoder Representations from Transformers (BERT) offer a relatively simple method for solving natural language problems in morphologically rich languages like Turkish. In this study, we compared NB, RF, Support Vector Machine, Naïve Bayes Multinomial, and Logistic Regression with correlation-based feature selection against the recently proposed Turkish BERT (BERTurk) for detecting Turkish fake news. Without preprocessing steps, we obtained 99.90% accuracy in fake news identification with BERTurk.
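To illustrate why a BERT-style model can sidestep stemming for an agglutinative language, the following is a minimal sketch of greedy longest-match subword splitting, the inference-time procedure used by WordPiece tokenizers. The function name `wordpiece_tokenize` and the toy vocabulary are assumptions made for illustration; they are not taken from the article or from the actual BERTurk vocabulary.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword split (WordPiece-style inference)."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marker for a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word maps to the unknown token
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary invented for this example (hypothetical, not BERTurk's).
TOY_VOCAB = {"haber", "##ler", "##de", "##ki", "ev"}

print(wordpiece_tokenize("haberlerde", TOY_VOCAB))
print(wordpiece_tokenize("evdeki", TOY_VOCAB))
```

Because suffixes become separate pieces, inflected forms of the same stem share tokens without an explicit morphological analyzer, which is one plausible reason the preprocessing burden is lower for such models.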

Keywords

Machine learning, Text mining, Bidirectional Encoder Representations from Transformers (BERT), Fake news, BERTurk

References

Al-Yahya, M., Al-Khalifa, H., Al-Baity, H., Alsaeed, D., & Essam, A., 2021, "Arabic Fake News Detection: Comparative Study of Neural Networks and Transformer-Based Approaches", Complexity.
Alim, A. A. A., Ayman, A., Praveen, K. D., & Myung, S. C., 2021, "Detecting Fake News using Machine Learning: A Systematic Literature Review", ArXiv Preprint ArXiv:2102.04458.
Amjad, M., Sidorov, G., Zhila, A., Gelbukh, A., & Rosso, P., 2021, "Overview of the shared task on fake news detection in urdu at FIRE 2020", CEUR Workshop Proceedings.
Bozuyla, M., & Özçift, A., 2022, "Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data", Turkish Journal of Electrical Engineering & Computer Sciences, 30(3), 908–926.
Conroy, N. J., Rubin, V. L., & Chen, Y., 2015, "Automatic deception detection: Methods for finding fake news", Proceedings of the Association for Information Science and Technology, 52(1), 1–4.
D’Ulizia, A., Caschera, M. C., Ferri, F., & Grifoni, P., 2021, "Fake news detection: A survey of evaluation datasets", PeerJ Computer Science, 1–34. https://doi.org/10.7717/PEERJ-CS.518
Dadgar, S. M. H., Araghi, M. S., & Farahani, M. M., 2016, "A novel text mining approach based on TF-IDF and support vector machine for news classification", 2016 IEEE International Conference on Engineering and Technology (ICETECH), 112–116.
Dağli, İ., & Öztürk, A., 2021, "Görüntü Sınıflandırmada Derin Öğrenme Yöntemlerinin Karşılaştırılması", Konya Mühendislik Bilimleri Dergisi, 9(4), 872–888.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K., 2019, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", ArXiv Preprint ArXiv: 1810.04805v2, 4171–4186.
Flynn, D. J., Nyhan, B., & Reifler, J., 2017, "The Nature and Origins of Misperceptions: Understanding False and Unsupported Beliefs About Politics", Political Psychology, 38, 127–150.
Genç, Ş., & Surer, E., 2021, "ClickbaitTR: Dataset for clickbait detection from Turkish news sites and social media with a comparative analysis via machine learning algorithms", Journal of Information Science, 1–20, https://doi.org/10.1177/01655515211007746.
Github, 2021, GitHub - sfkcvk/TurkishFakeNewsDataset: This is the reporsitory of Turkish fake news dataset which consists of Zaytung posts and Hurriyet news articles.
Jahanbakhsh-Nagadeh, Z., Feizi-Derakhshi, M. R., & Sharifi, A., 2020, "A semi-supervised model for Persian rumor verification based on content information", Multimedia Tools and Applications, 1–29, https://doi.org/10.1007/s11042-020-10077-3.
Jwa, H., Oh, D., Park, K., Kang, J. M., & Lim, H., 2019, "exBAKE: Automatic fake news detection model based on Bidirectional Encoder Representations from Transformers (BERT)", Applied Sciences, 9(19), 4062, https://doi.org/10.3390/app9194062.
Khorram, T., & Baykan, N. A., 2021, "Network Intrusion Detection using Optimized Machine Learning Algorithms", European Journal of Science and Technology, 25, 463–474.
Mertoğlu, U., & Genç, B., 2020, "Lexicon generation for detecting fake news", ArXiv Preprint ArXiv: 2010.11089, 1–16, https://arxiv.org/ftp/arxiv/papers/2010/2010.11089.pdf.
Nuzumlalı, M. Y., & Özgür, A., 2014, "Analyzing Stemming Approaches for Turkish Multi-Document Summarization", 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 702–706, https://github.com/manuyavuz/.
Oflazer, K., 2014, "Turkish and its challenges for language processing", Language Resources and Evaluation, 48(4), 639–653, https://doi.org/10.1007/s10579-014-9267-2.
Onan, A., & Tocoglu, M. A., 2020, "Satire identification in Turkish news articles based on ensemble of classifiers", Turkish Journal of Electrical Engineering and Computer Sciences, 28(2), 1086–1106.
Ozbay, F. A., & Alatas, B., 2019, "A Novel Approach for Detection of Fake News on Social Media Using Metaheuristic Optimization Algorithms", Elektronika Ir Elektrotechnika, 25(4), 62–67.
Ozbay, F. A., & Alatas, B., 2020, "Fake news detection within online social media using supervised artificial intelligence algorithms", Physica A: Statistical Mechanics and Its Applications, 540, 123174.
Sarker, I. H., 2021, "Machine Learning: Algorithms, Real-World Applications and Research Directions", SN Computer Science, 2(160), 1–21.
Sasikala, B. S., Biju, V. G., & Prashanth, C. M., 2017, "Kappa and accuracy evaluations of machine learning classifiers", 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT)., 20–23.
Schweter, S., 2020, BERTurk - BERT models for Turkish.
Taşkın, S. G., Küçüksille, E. U., & Topal, K., 2021, "Twitter üzerinde Türkçe sahte haber tespiti", Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 23(1), 151–172.
Uysal, A. K., & Gunal, S., 2014, "The impact of preprocessing on text classification", Information Processing & Management, 50(1), 104–112.
Wardhani, N. W. S., Rochayani, M. Y., Iriany, A., Sulistyono, A. D., & Lestantyo, P., 2019, "Cross-validation Metrics for Evaluating Classification Performance on Imbalanced Data", 2019 International Conference on Computer, Control, Informatics and Its Applications, 14–18.
Web1, 2018, Fake News and Disinformation Online Report, European Commission, https://europa.eu/eurobarometer/surveys/detail/2183.
Web3, 2021, Zaytung.Com, https://www.zaytung.com/.

Abstract

The increasing use of social media and the internet generates a large amount of information to be analyzed from various perspectives. In particular, fake news is defined as false news presented as factual news, and it is generally fabricated with the aim of manipulation. Fake news identification is largely a natural language analysis problem, and machine learning algorithms have emerged as automated predictors. Well-known machine learning algorithms such as Naïve Bayes (NB) and Random Forest (RF) have been used successfully for the fake news identification problem. Turkish is a morphologically rich language whose agglutinative complexity requires intensive language pre-processing steps and feature selection. Recent neural language models such as Bidirectional Encoder Representations from Transformers (BERT) offer morphologically rich languages like Turkish a relatively straightforward pipeline for solving natural language problems. In this work, we compared NB, RF, Support Vector Machine (SVM), Naïve Bayes Multinomial (NBM), and Logistic Regression (LR), built on top of correlation-based feature selection, with the newly proposed Turkish BERT (BERTurk) to identify Turkish fake news. With BERTurk, we obtained 99.90% accuracy in fake news identification without substantial language pre-processing.
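As a rough picture of the kind of classical baseline named above, here is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, together with the accuracy metric. It is not the study's implementation: the class `MultinomialNB`, the whitespace tokenization, and the four English toy headlines are assumptions for illustration, and the correlation-based feature selection step used in the study is omitted.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            score = math.log(doc_count / n_docs)  # log prior
            for tok in tokens:
                # Smoothed log likelihood of the token given the class.
                score += math.log(
                    (self.token_counts[label][tok] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data invented for this sketch (hypothetical, not the study's dataset).
TRAIN_TEXTS = [
    "aliens land in city centre",
    "minister opens new hospital",
    "cat elected mayor of town",
    "central bank holds interest rate",
]
TRAIN_LABELS = ["fake", "real", "fake", "real"]

model = MultinomialNB().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("aliens elected mayor"))
print(model.predict("bank opens hospital"))
```

A real comparison would replace the toy headlines with a labeled corpus and report accuracy on held-out data, for example through cross-validation.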

Keywords

Machine learning, Text mining, Bidirectional Encoder Representations from Transformers (BERT), Fake news, BERTurk

There are 28 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Research Article
Authors	Mehmet Bozuyla 0000-0002-7485-6106
Publication Date	September 1, 2022
Submission Date	September 14, 2021
Acceptance Date	August 4, 2022
Published in Issue	Year 2022 Volume: 10 Issue: 3

Cite

IEEE	M. Bozuyla, “ADVANCED TURKISH FAKE NEWS PREDICTION WITH BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS”, KONJES, vol. 10, no. 3, pp. 750–761, 2022, doi: 10.36306/konjes.995060.
