Research Article



A Hybrid Spam Detection Framework for Social Networks

Year 2023, , 823 - 837, 05.07.2023
https://doi.org/10.2339/politeknik.933785

Abstract

The widespread use of social networks has made these platforms a target for malicious actors. Although social networks have their own spam detection systems, these systems sometimes fail to block spam on their platforms. Spam content and messages threaten the security and performance of the users of these networks. This study proposes a spam account detection framework based on three components used together: short link analysis, machine learning, and text analysis. First, a dataset was created and the attributes of spam accounts were determined. Next, the hyperlinks in the messages in this dataset were analyzed by the link analysis component. The machine learning component was modelled on the previously determined attributes. In addition, the messages of social network users were analyzed with a text analysis method. A web-based application of the proposed framework was implemented. Experimental studies showed that the proposed framework achieved a performance of 95.69%. This success was calculated according to the F-measure and precision evaluation metrics, taking into account the effect of the sensitive content rate. The study aims to detect spam accounts on social networks and to support the spam detection policies of these networks.
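
The abstract reports its results in terms of precision and F-measure. As a reminder of how these metrics are usually defined, the following is a minimal sketch in Python. The function name, the "spam"/"ham" labels and the toy data are assumptions made for illustration only; they are not taken from the article's dataset or implementation.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall and F-measure (F1) for one positive class.

    Hypothetical helper for illustration; not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: spam accounts correctly flagged as spam.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: legitimate accounts wrongly flagged as spam.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: spam accounts that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels, invented purely to exercise the function.
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]
preds = ["spam", "ham", "ham", "spam", "spam", "ham"]

precision, recall, f1 = precision_recall_f1(labels, preds)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy run two spam accounts are caught, one is missed and one legitimate account is flagged, so all three metrics come out at 2/3. The figures reported in the abstract come from the authors' own experiments and are not reproduced by this sketch.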

There are 83 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Research Article
Authors

Oğuzhan Çıtlak 0000-0001-9545-2795

Murat Dörterler 0000-0003-1127-515X

İbrahim Dogru 0000-0001-9324-7157

Publication Date July 5, 2023
Submission Date May 6, 2021
Published in Issue Year 2023

Cite

APA Çıtlak, O., Dörterler, M., & Dogru, İ. (2023). A Hybrid Spam Detection Framework for Social Networks. Politeknik Dergisi, 26(2), 823-837. https://doi.org/10.2339/politeknik.933785
AMA Çıtlak O, Dörterler M, Dogru İ. A Hybrid Spam Detection Framework for Social Networks. Politeknik Dergisi. July 2023;26(2):823-837. doi:10.2339/politeknik.933785
Chicago Çıtlak, Oğuzhan, Murat Dörterler, and İbrahim Dogru. “A Hybrid Spam Detection Framework for Social Networks”. Politeknik Dergisi 26, no. 2 (July 2023): 823-37. https://doi.org/10.2339/politeknik.933785.
EndNote Çıtlak O, Dörterler M, Dogru İ (July 1, 2023) A Hybrid Spam Detection Framework for Social Networks. Politeknik Dergisi 26 2 823–837.
IEEE O. Çıtlak, M. Dörterler, and İ. Dogru, “A Hybrid Spam Detection Framework for Social Networks”, Politeknik Dergisi, vol. 26, no. 2, pp. 823–837, 2023, doi: 10.2339/politeknik.933785.
ISNAD Çıtlak, Oğuzhan et al. “A Hybrid Spam Detection Framework for Social Networks”. Politeknik Dergisi 26/2 (July 2023), 823-837. https://doi.org/10.2339/politeknik.933785.
JAMA Çıtlak O, Dörterler M, Dogru İ. A Hybrid Spam Detection Framework for Social Networks. Politeknik Dergisi. 2023;26:823–837.
MLA Çıtlak, Oğuzhan et al. “A Hybrid Spam Detection Framework for Social Networks”. Politeknik Dergisi, vol. 26, no. 2, 2023, pp. 823-37, doi:10.2339/politeknik.933785.
Vancouver Çıtlak O, Dörterler M, Dogru İ. A Hybrid Spam Detection Framework for Social Networks. Politeknik Dergisi. 2023;26(2):823-37.
 

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.