Research Article

WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS

Volume: 27, Issue: 1, 30 April 2022

Abstract

Today, millions of websites are used to access information on the Internet. As the number of web pages grows every day, they must be classified well to be used effectively. In this study, binary and multi-class classification models were created that classify web pages with high accuracy. The experiments used the URLs and categories of Turkish web pages in the Open Directory Project (ODP). The training dataset was built by retrieving the text of each web page from its URL. To our knowledge, this is the first comprehensive web page classification dataset for Turkish. Three deep learning methods that are effective in text classification were used: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Word embeddings were used for feature extraction instead of the n-gram approaches common in text classification studies. Hyper-parameter optimization was performed for the deep learning models, and the binary and multi-class classification models were built with the best parameters found. The binary classification models were compared with the results of another study, and the multi-class classification models were compared with each other. The performance of all models was evaluated in terms of training time and F1 score.
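As a rough illustration of the F1 score used to compare the models, the sketch below computes per-class F1 and their unweighted mean (macro F1) in plain Python. The abstract does not say which averaging scheme the study used, so macro averaging is an assumption here, and the function name `f1_scores` and the category labels are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per_class_f1, macro_f1) for two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed

    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class[c] = 2 * tp[c] / denom if denom else 0.0

    # Macro averaging (an assumption): every class counts equally
    macro = sum(per_class.values()) / len(per_class) if per_class else 0.0
    return per_class, macro


# Hypothetical category labels, not taken from the study's dataset
y_true = ["news", "sports", "news", "arts", "sports", "news"]
y_pred = ["news", "sports", "arts", "arts", "news", "news"]
per_class, macro = f1_scores(y_true, y_pred)
print(per_class, macro)
```

For binary classification the same function applies with two labels. Macro averaging weights rare categories as heavily as common ones, which matters when a directory's categories are unevenly populated.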

Details

Primary Language

English

Subjects

Artificial Intelligence

Section

Research Article

Publication Date

30 April 2022

Submission Date

31 March 2021

Acceptance Date

13 February 2022

Published in Issue

Year: 2022, Volume: 27, Issue: 1

Cite

APA
Kurt, M. S., & Yücel Demirel, E. (2022). WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, 27(1), 191-204. https://doi.org/10.17482/uumfd.891038
AMA
1. Kurt MS, Yücel Demirel E. WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS. UUJFE. 2022;27(1):191-204. doi:10.17482/uumfd.891038
Chicago
Kurt, Mehmet Salih, and Eylem Yücel Demirel. 2022. “WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 27 (1): 191-204. https://doi.org/10.17482/uumfd.891038.
EndNote
Kurt MS, Yücel Demirel E (01 April 2022) WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 27 1 191–204.
IEEE
[1] M. S. Kurt and E. Yücel Demirel, “WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS”, UUJFE, vol. 27, no. 1, pp. 191–204, Apr. 2022, doi: 10.17482/uumfd.891038.
ISNAD
Kurt, Mehmet Salih - Yücel Demirel, Eylem. “WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 27/1 (01 April 2022): 191-204. https://doi.org/10.17482/uumfd.891038.
JAMA
1. Kurt MS, Yücel Demirel E. WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS. UUJFE. 2022;27:191–204.
MLA
Kurt, Mehmet Salih, and Eylem Yücel Demirel. “WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, vol. 27, no. 1, Apr. 2022, pp. 191-204, doi:10.17482/uumfd.891038.
Vancouver
1. Mehmet Salih Kurt, Eylem Yücel Demirel. WEB PAGE CLASSIFICATION WITH DEEP LEARNING METHODS. UUJFE. 01 April 2022;27(1):191-204. doi:10.17482/uumfd.891038
