TY - JOUR
T1 - Hibrit Derin Öğrenme Modeli ile Web Sitelerinin Görsel ve Metinsel Verilere Dayalı Sınıflandırılması: DeepCLA-Web
TT - Classification of Websites Based on Visual and Textual Data Using a Hybrid Deep Learning Model: DeepCLA-Web
AU - Şeker, Harun
AU - Çubukçu, Burakhan
PY - 2025
DA - August
Y2 - 2025
DO - 10.46740/alku.1639372
JF - ALKÜ Fen Bilimleri Dergisi
PB - Alanya Alaaddin Keykubat Üniversitesi
WT - DergiPark
SN - 2667-7814
SP - 66
EP - 79
VL - 7
IS - 2
LA - tr
AB - This study proposes a hybrid deep learning model that processes both textual and visual content for website classification. The volume of information services accessible on the internet grows daily, and within this heavy data flow, accurately classifying websites by their content is important. To build a deep learning model that can perform this classification for users, 430 website addresses were selected from the UT1 Blacklist, published by Université Toulouse, and divided into three categories: shopping, news, and gaming. The proposed model uses Long Short-Term Memory (LSTM) to process the textual content of websites and a Convolutional Neural Network (CNN) to analyze visual data. An Artificial Neural Network (ANN) that combines the outputs of the LSTM and CNN models performs the final classification. The performance of the proposed website classification model (DeepCLA-Web) was compared, using metrics common in the literature, against a CNN model that uses only visual data and an LSTM model that uses only textual data. The CNN model achieved 59.22% accuracy and the LSTM model 75.85%, while the proposed DeepCLA-Web reached 80.89%.
KW - Website classification
KW - Hybrid deep learning
KW - Long Short-Term Memory
KW - Convolutional Neural Network
CR - [1] M. S. Kurt and E. Yücel, "Web page classification with deep learning methods," Bursa Uludağ University Journal of The Faculty of Engineering, vol. 27, no. 1, pp. 191–202, 2022, doi: 10.17482/uumfd.891038.
CR - [2] Y. Yu, "Web page classification algorithm based on deep learning," Computational Intelligence and Neuroscience, vol. 2022, Art. no. 9534918, 2022, doi: 10.1155/2022/9534918.
CR - [3] D. López-Sánchez, A. González Arrieta, and J. M. Corchado, "Visual content-based web page categorization with deep transfer learning and metric learning," Neurocomputing, vol. 338, pp. 418–431, 2019, doi: 10.1016/j.neucom.2018.08.086.
CR - [4] M. Hashemi, "Web page classification: A survey of perspectives, gaps, and future directions," Multimedia Tools and Applications, vol. 79, pp. 11921–11945, 2020, doi: 10.1007/s11042-019-08373-8.
CR - [5] R. Bruni and G. Bianchi, "Web site categorization: A formal approach and robustness analysis in the case of e-commerce detection," Expert Systems with Applications, vol. 142, p. 113001, 2020, doi: 10.1016/j.eswa.2019.113001.
CR - [6] D. Cohen, O. Naim, E. Toch, and I. Ben-Gal, "Web site categorization via design attribute learning," Computers & Security, vol. 107, p. 102312, 2021, doi: 10.1016/j.cose.2021.102312.
CR - [7] V. K. Bhalla and N. Kumar, "An efficient scheme for automatic web pages categorization using the support vector machine," New Review of Hypermedia and Multimedia, vol. 22, no. 3, pp. 223–242, 2016, doi: 10.1080/13614568.2016.1152316.
CR - [8] E. Buber and B. Diri, "Web page classification using RNN," Procedia Computer Science, vol. 154, pp. 62–72, 2019, doi: 10.1016/j.procs.2019.06.011.
CR - [9] S. H. Apandi, J. Sallim, and R. Mohamed, "A survey on technique for solving web page classification problem," IOP Conference Series: Materials Science and Engineering, vol. 769, no. 1, p. 012036, 2020, doi: 10.1088/1757-899X/769/1/012036.
CR - [10] S. H. Apandi, J. Sallim, R. Mohamed, and N. Ahmad, "Automatic topic-based web page classification using deep learning," International Journal on Informatics Visualization, vol. 7, no. 3-2, pp. 2108–2114, 2023.
CR - [11] S. H. Apandi, J. Sallim, R. Mohammed, and A. Madbouly, "Web page classification using convolutional neural network towards eliminating internet addiction," in Proc. 2021 Int. Conf. Software Engineering & Computer Systems and 4th Int. Conf. Computational Science and Information Management (ICSECS-ICOCSIM), 2021, doi: 10.1109/ICSECS52883.2021.00034.
CR - [12] P. Prajapati and P. V. Nainwani, "Comparative study of web page classification approaches," International Journal of Computer Applications, vol. 179, no. 45, pp. 6–9, 2018, doi: 10.5120/ijca2018916994.
CR - [13] DSI Université Toulouse Capitole, "The blacklists of the University of Toulouse Capitole," Database, [Online]. Available: https://dsi.ut-capitole.fr/blacklists/index_en.php. [Accessed: Nov. 13, 2024].
CR - [14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012, doi: 10.1145/3065386.
CR - [15] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
CR - [16] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997, doi: 10.1162/neco.1997.9.8.1735.
CR - [17] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451–2471, 2000.
CR - [18] D. M. W. Powers, "Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.
CR - [19] H. Bozcu and B. Çubukçu, "Deep learning-based damage assessment in cherry leaves," Journal of Innovative Science and Engineering (JISE), vol. 8, no. 2, pp. 160–178, 2024, doi: 10.38088/jise.1455860.
CR - [20] M. Aybar, U. Talaş, and B. Çubukçu, "Transfer öğrenme modelleri ile elma yapraklarında hastalık tespiti," ESTUDAM Bilişim, vol. 5, no. 2, pp. 57–63, 2024, doi: 10.53608/estudambilisim.1556425.
UR - https://doi.org/10.46740/alku.1639372
L1 - https://dergipark.org.tr/tr/download/article-file/4607458
ER -