Machine Learning Based Deception Detection System in Online Social Networks

Harun Bingol; Bilal Alatas

doi:10.29132/ijpas.994840

Research Article

Turkish title: Çevrimiçi Sosyal Ağlarda Makine Öğrenmesi Tabanlı Aldatma Tespit Sistemi

Year 2022, Volume: 8 Issue: 1, 31 - 42, 30.06.2022

https://izlik.org/JA58HE76PC

Abstract

The rapid spread of Internet technologies makes people's lives easier in terms of access to information. Alongside these benefits, however, the Internet's negative effects cannot be ignored. The most important of these is the deception of people who seek information of questionable reliability through social media. Deception generally aims to steer people's opinions on a particular subject and to create a public perception that serves a specific purpose. Detecting this phenomenon is becoming increasingly important because of the enormous growth in the number of people using social networks. Although some researchers have recently proposed techniques for the deception detection problem, there is still a need to design and use systems that perform well across different evaluation metrics. In this study, deception detection in online social networks is modeled as a classification problem, and a methodology is proposed that detects misleading content in social networks using text mining and machine learning algorithms. Because the content is text-based, text mining operations are applied first to convert the unstructured data sets into structured ones. Supervised machine learning algorithms are then adapted and applied to the structured data sets. Real, publicly available data sets are used, and the Support Vector Machine, k-Nearest Neighbor (k-NN), Naive Bayes (NB), Random Forest, Decision Trees, Gradient Boosted Trees (GBT), and Logistic Regression algorithms are compared in terms of many different metrics. On Data set 1, the highest average accuracy, 74.4%, was obtained with the GBT algorithm, while on Data set 2 the highest average accuracy, 71.2%, was obtained with the NB algorithm.
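The two stages described in the abstract, turning unstructured text into a structured representation and then training a supervised classifier on it, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which features or settings were used, so the bag-of-words term counts, the add-one smoothing, the function and class names, and the four toy sentences are all assumptions made for illustration. Naive Bayes is chosen only because it is one of the compared algorithms and fits in a few lines of standard-library Python.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def to_term_counts(texts):
    """Turn raw (unstructured) texts into per-document term-count rows."""
    return [Counter(tokenize(t)) for t in texts]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.vocab = {term for row in rows for term in row}
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        # Sum the term counts of all training documents in each class.
        self.term_counts = {c: Counter() for c in self.classes}
        for row, label in zip(rows, labels):
            self.term_counts[label].update(row)
        self.total = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, row):
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.total[c] + len(self.vocab)
            for term, n in row.items():
                if term in self.vocab:  # ignore terms never seen in training
                    score += n * math.log((self.term_counts[c][term] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, rows):
        return [self.predict_one(r) for r in rows]


# Hypothetical toy data, invented for illustration only.
train_texts = [
    "best hotel ever amazing amazing perfect stay",
    "absolutely perfect amazing luxury best ever",
    "room was clean and the staff were helpful",
    "the location was fine but the room was small",
]
train_labels = ["deceptive", "deceptive", "truthful", "truthful"]

model = MultinomialNB().fit(to_term_counts(train_texts), train_labels)
predictions = model.predict(to_term_counts([
    "amazing perfect best hotel",
    "the room was small and clean",
]))
print(predictions)
```

Any of the other algorithms named in the abstract could be trained on the same term-count rows; only the classifier class would change.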

Keywords

Classification, Deception Detection, Machine Learning, Social Networks

Abstract

The rapid spread of Internet technologies has made access to information easier. Alongside these positive aspects, however, the negative effects of the Internet cannot be ignored. The most important of these is the deception, through social media, of people who access information whose reliability is questionable. Deception generally aims to steer people's thinking on a particular subject and to create a social perception for a specific purpose. Detecting this phenomenon is becoming increasingly important because of the enormous increase in the number of people using social networks. Although some researchers have recently proposed techniques for deception detection, there is still a need to design and use systems that perform well on different evaluation metrics. In this study, deception detection in online social networks is modeled as a classification problem, and a methodology is proposed that detects misleading content in social networks using text mining and machine learning algorithms. Since the content is text-based, text mining processes are performed and the unstructured data sets are converted to structured data sets. Supervised machine learning algorithms are then adapted and applied to the structured data sets. Real public data sets are used, and the Support Vector Machine, k-Nearest Neighbor (k-NN), Naive Bayes, Random Forest, Decision Trees, Gradient Boosted Trees, and Logistic Regression algorithms are compared in terms of many different metrics.
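Comparing classifiers "in terms of many different metrics" matters because a single metric can hide differences between them. The sketch below assumes accuracy, precision, recall, and F1 as the metrics (the abstract does not list which ones were used) and scores two hypothetical sets of predictions, invented for illustration, against the same labels. The function name and the `model_a` and `model_b` labels are likewise assumptions.

```python
def binary_metrics(y_true, y_pred, positive="deceptive"):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels and predictions, invented for illustration only.
y_true = ["deceptive", "deceptive", "deceptive", "truthful", "truthful", "truthful"]
outputs = {
    "model_a": ["deceptive", "deceptive", "truthful", "truthful", "truthful", "deceptive"],
    "model_b": ["deceptive", "truthful", "truthful", "truthful", "truthful", "truthful"],
}

results = {name: binary_metrics(y_true, pred) for name, pred in outputs.items()}
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

In this toy case both sets of predictions reach the same accuracy, yet they differ in precision, recall, and F1, which is the kind of difference a multi-metric comparison is meant to expose.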

Details

Primary Language	English
Subjects	Engineering
Section	Research Article
Authors	Harun Bingol (ORCID: 0000-0001-5071-4616), Bilal Alatas (ORCID: 0000-0002-3513-0329)
Submission Date	September 13, 2021
Acceptance Date	February 4, 2022
Publication Date	June 30, 2022
DOI	https://doi.org/10.29132/ijpas.994840
IZ	https://izlik.org/JA58HE76PC
Published in Issue	Year 2022 Volume: 8 Issue: 1

Cite

APA	Bingol, H., & Alatas, B. (2022). Machine Learning Based Deception Detection System in Online Social Networks. International Journal of Pure and Applied Sciences, 8(1), 31-42. https://doi.org/10.29132/ijpas.994840
AMA	1. Bingol H, Alatas B. Machine Learning Based Deception Detection System in Online Social Networks. International Journal of Pure and Applied Sciences. 2022;8(1):31-42. doi:10.29132/ijpas.994840
Chicago	Bingol, Harun, and Bilal Alatas. 2022. “Machine Learning Based Deception Detection System in Online Social Networks”. International Journal of Pure and Applied Sciences 8 (1): 31-42. https://doi.org/10.29132/ijpas.994840.
EndNote	Bingol H, Alatas B (01 June 2022) Machine Learning Based Deception Detection System in Online Social Networks. International Journal of Pure and Applied Sciences 8 1 31–42.
IEEE	[1] H. Bingol and B. Alatas, “Machine Learning Based Deception Detection System in Online Social Networks”, International Journal of Pure and Applied Sciences, vol. 8, no. 1, pp. 31–42, Jun. 2022, doi: 10.29132/ijpas.994840.
ISNAD	Bingol, Harun - Alatas, Bilal. “Machine Learning Based Deception Detection System in Online Social Networks”. International Journal of Pure and Applied Sciences 8/1 (01 June 2022): 31-42. https://doi.org/10.29132/ijpas.994840.
JAMA	1. Bingol H, Alatas B. Machine Learning Based Deception Detection System in Online Social Networks. International Journal of Pure and Applied Sciences. 2022;8:31–42.
MLA	Bingol, Harun, and Bilal Alatas. “Machine Learning Based Deception Detection System in Online Social Networks”. International Journal of Pure and Applied Sciences, vol. 8, no. 1, June 2022, pp. 31-42, doi:10.29132/ijpas.994840.
Vancouver	1. Bingol H, Alatas B. Machine Learning Based Deception Detection System in Online Social Networks. International Journal of Pure and Applied Sciences [Internet]. 01 June 2022;8(1):31-42. Available from: https://izlik.org/JA58HE76PC
