Performance of Using Tag-based Feature Sets in Web Page Classification

Volume: 22 Issue: 2, 15 August 2018

Abstract

Because the Web is a large collection of data that grows daily, an automatic Web page classification mechanism is needed to reach useful information effectively. The majority of Web pages are HTML documents; therefore, the aim of this study is to explore the effect of HTML tags on the classification process and to determine which HTML tags are most valuable for feature extraction in the classification task. To achieve this goal, we employ 13 different datasets and 5 popular classifiers: SVM, naïve Bayes (NB), kNN, C4.5, and OneR. The statistical analysis shows that features extracted solely from the anchor, <p>, or <title> tags can be used as an alternative to features extracted from the whole Web page. SVM is the best among the classifiers used in this study. Using the HTML tags for feature extraction improves classification accuracy.
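
The abstract does not describe the authors' extraction pipeline, so the following is only a minimal sketch of what tag-based feature extraction could look like, assuming a simple bag-of-words representation built with Python's standard `html.parser` module. The names `TagTextExtractor` and `tag_features` and the sample page are hypothetical and are not taken from the paper. The sketch also assumes that the chosen tags are properly closed; an unclosed `<p>` would cause later text to be captured as well.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside a chosen set of HTML tags (hypothetical helper)."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted_tags = {tag.lower() for tag in wanted_tags}
        self.open_counts = Counter()  # number of currently open wanted tags, per tag name
        self.chunks = []              # text fragments seen inside any wanted tag

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted_tags:
            self.open_counts[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.wanted_tags and self.open_counts[tag] > 0:
            self.open_counts[tag] -= 1

    def handle_data(self, data):
        # Keep text only while at least one wanted tag is open.
        if any(count > 0 for count in self.open_counts.values()):
            self.chunks.append(data)


def tag_features(html, wanted_tags):
    """Return term counts for the text inside `wanted_tags` (assumed bag-of-words features)."""
    parser = TagTextExtractor(wanted_tags)
    parser.feed(html)
    parser.close()
    tokens = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    return Counter(tokens)


# Made-up sample page, for illustration only.
sample_page = (
    "<html><head><title>Cheap Flights</title></head>"
    "<body><h1>Welcome</h1><p>Book cheap flights today.</p>"
    '<a href="/deals">Flight deals</a><div>Footer text</div></body></html>'
)

print(tag_features(sample_page, ["title"]))
print(tag_features(sample_page, ["a"]))
```

Term counts such as these could then be weighted and passed to any of the classifiers named in the abstract. Restricting `wanted_tags` to a single tag, or passing every tag that occurs on the page, gives the kind of per-tag versus whole-page comparison the study reports.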

Details

Primary Language

Turkish

Publication Date

15 August 2018

Submission Date

6 November 2017

Published in Issue

Year: 2018 Volume: 22 Issue: 2

Cite

APA
Özel, S. A., Ünal, H. E., & Ünal, İ. (2018). Performance of Using Tag-based Feature Sets in Web Page Classification. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 22(2), 583-594. https://izlik.org/JA94JE67GM

e-ISSN: 1308-6529
Linking ISSN (ISSN-L): 1300-7688

All articles published in the journal are freely accessible and are offered as open access under the Creative Commons CC BY-NC (Attribution-NonCommercial) license. All authors and other users of the journal are deemed to have accepted this condition.