In this paper, we present recent contributions to the fight against one of the main problems faced by search engines: spamdexing, or web spamming. Spamdexing comprises malicious techniques applied to web pages in order to deceive search engines and gain high visibility in search results. To better understand the problem and to find the best setup and methods for countering it, we present a comprehensive performance evaluation of several established machine learning techniques. In our experiments, we employed two real, public, and large datasets: the WEBSPAM-UK2006 and WEBSPAM-UK2007 collections. The samples are represented by content-based features, link-based features, transformed link-based features, and their combinations. The results indicate that bagging of decision trees, multilayer perceptron neural networks, random forests, and adaptive boosting of decision trees are promising for web spam classification.
Keywords: Spamdexing, web spam, spam host, classification, WEBSPAM-UK2006, WEBSPAM-UK2007
Primary Language | English |
---|---|
Section | Articles |
Authors | |
Publication Date | September 29, 2013 |
Submission Date | January 30, 2016 |
Published in Issue | Year 2013 Volume: 2 Issue: 3 |