A new approach to filtering spam SMS: Motif patterns
Year 2018,
Volume: 6 Issue: 2, 436 - 450, 30.06.2018
Yılmaz Kaya, Cüneyt Özdemir
Abstract
The spread of every technology brings many problems with it. The Short Message Service (SMS), widely used in mobile technologies, is no exception. The most important problem with SMS is the spread of unwanted messages, known as spam, over the mobile network. Spam messages congest mobile traffic and also needlessly occupy users. In this study, a new feature extraction method called motif patterns is proposed for filtering spam SMS; it uses the forms that arise when the UTF-8 codes of characters are compared with one another. In the proposed method, a window of a defined size (pencere boyutu, PB) is moved over the Unicode values of an SMS, and the shape that the values inside the window form relative to one another is treated as a motif. The frequencies of these motifs in the SMS are used as the feature vector. The set of possible motifs depends on the chosen PB. Three benchmark datasets were used to test the motif patterns method. The observed results show that the proposed method is a successful way of extracting features from SMS messages for spam filtering. The motif method is also expected to be applicable to other areas such as text mining and natural language processing.
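
The windowing idea described in the abstract can be illustrated with a short Python sketch. The abstract does not give the exact motif definition, so the encoding below is an assumption: each window of code points is reduced to its dense-rank pattern (equal values share a rank), and the relative frequency of each pattern is collected. The names `motif_of`, `motif_frequencies`, and `window_size` are hypothetical, and `ord` returns Unicode code points rather than UTF-8 byte values.

```python
from collections import Counter


def motif_of(window):
    """Reduce a window of code points to its dense-rank pattern.

    Assumed encoding (not taken from the paper): equal values share a rank,
    so (72, 105, 72) becomes (0, 1, 0).
    """
    ranks = {value: rank for rank, value in enumerate(sorted(set(window)))}
    return tuple(ranks[value] for value in window)


def motif_frequencies(message, window_size=3):
    """Relative frequency of each motif over all sliding windows of a message."""
    codes = [ord(ch) for ch in message]  # Unicode code points of the characters
    n_windows = len(codes) - window_size + 1
    if n_windows <= 0:
        return {}  # message shorter than the window: no motifs
    counts = Counter(
        motif_of(codes[i:i + window_size]) for i in range(n_windows)
    )
    return {motif: count / n_windows for motif, count in counts.items()}


if __name__ == "__main__":
    print(motif_frequencies("WIN a FREE prize now!!!", window_size=3))
```

To feed a classifier, the resulting dictionary would still need to be mapped onto a fixed ordering of all motifs possible for the chosen window size; that step is left out here.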