Machine Learning Methods for Spamdexing Detection
Abstract
In this paper, we present recent contributions to the battle against one of the main problems faced by search engines: spamdexing, or web spamming. These are malicious techniques applied to web pages with the purpose of circumventing search engines in order to achieve good visibility in search results. To better understand the problem and to find the best setup and methods for avoiding this virtual plague, we present a comprehensive performance evaluation of several established machine learning techniques. In our experiments, we employed two real, public, and large datasets: the WEBSPAM-UK2006 and WEBSPAM-UK2007 collections. The samples are represented by content-based, link-based, and transformed link-based features and their combinations. The results indicate that bagging of decision trees, multilayer perceptron neural networks, random forest, and adaptive boosting of decision trees are promising for the task of web spam classification.
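The phrase "features and their combinations" can be illustrated with a minimal sketch. It assumes that a combination is the concatenation of the chosen groups' feature vectors; the abstract does not say which combinations were actually evaluated, so the sketch simply enumerates every non-empty subset of the three groups. The group names, the toy values, and the helper `feature_combinations` are hypothetical and are not taken from the paper or the datasets.

```python
from itertools import combinations

# Hypothetical toy feature vectors for a single host. Real values would
# come from the WEBSPAM-UK2006/2007 feature files; these are placeholders.
FEATURE_GROUPS = {
    "content": [0.12, 0.80],
    "link": [3.0, 0.05, 1.0],
    "transformed_link": [0.7],
}


def feature_combinations(groups):
    """Yield (names, vector) for every non-empty subset of feature groups.

    Assumption: combining groups means concatenating their vectors in order.
    """
    names = list(groups)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            vector = [value for name in subset for value in groups[name]]
            yield subset, vector


combos = list(feature_combinations(FEATURE_GROUPS))
for subset, vector in combos:
    # Print each representation and its dimensionality.
    print("+".join(subset), len(vector))
```

Three groups give seven candidate representations, and each one could then be passed to any of the classifiers named in the abstract.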
Details
Primary Language
English
Publication Date
September 29, 2013
Submission Date
January 30, 2016
Published in Issue
Year 2013 Volume: 2 Number: 3
APA
Almeida, T., Silva, R., & Yamakami, A. (2013). Machine Learning Methods for Spamdexing Detection. International Journal of Information Security Science, 2(3), 86-107. https://izlik.org/JA73HZ96RD