Conference Paper

İkili Gri Kurt ve İkili Harris Şahin Optimizasyonları ile Web Haber Sayfalarının Sınıflandırılması

Year 2021, Issue: 26 - Ejosat Special Issue 2021 (HORA), 234 - 241, 31.07.2021
https://doi.org/10.31590/ejosat.950497

Abstract

With the rapid growth of the internet, a large number of web services and pages, chiefly news sources, e-commerce, and social networking applications, have become available. The use of these applications has produced an enormous volume of content such as video, audio, and text. Classifying this data correctly lets users of a web application reach the data they want more quickly and easily. Because such data consists of a large number of features, it leads to high computation times in text classification. For high-dimensional data, feature selection methods make it possible to achieve high text classification accuracy with fewer features and lower computation time. In the literature, the feature selection methods used in text classification are grouped into filter, wrapper, embedded, and hybrid methods. In this study, the Binary Grey Wolf Optimization (BGWO) and Binary Harris Hawk Optimization (BHHO) algorithms are used together with ReliefF for feature selection in text classification. Two datasets with different characteristics are used to evaluate the algorithms. The first is a dataset of 100 web documents in 2 categories; the second contains 450 web documents in 9 categories (physics, biology, genetics, etc.) extracted from science news web pages. According to the results, BHHO was found to perform better than the other feature selection methods it was compared with, in terms of objective function value and number of features.

Classification of News Web Pages using Binary Grey Wolf and Binary Harris Hawk Optimizations

Abstract

With the rapid development of the internet, many web services and pages, especially news sources, e-commerce, and social network applications, have become available. Their use generates a very large amount of content such as video, audio, and text. Classifying this data accurately gives users of web applications faster and easier access to the data they are looking for. Because these datasets have a large number of features, text classification on them requires long computation times. Feature selection methods make it possible to reach high classification accuracy on such high-dimensional datasets with fewer features and less computation time. In the literature, the feature selection methods used in text classification are grouped into filter, wrapper, embedded, and hybrid methods. In this study, the Binary Grey Wolf Optimization (BGWO) and Binary Harris Hawk Optimization (BHHO) algorithms are used together with ReliefF for feature selection in text classification. The algorithms are evaluated on two datasets with different characteristics. The first dataset has 2 categories and 100 web documents. The second dataset has 9 categories (physics, biology, genetics, etc.) and 450 web documents extracted from science news web pages. The results show that BHHO performs better than the other feature selection methods it was compared with, in terms of fitness value and number of selected features.
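To make the idea of comparing methods by fitness and number of selected features concrete, here is a minimal sketch of how a wrapper-style binary optimizer typically scores a candidate feature subset. The abstract does not give the paper's actual objective function, so everything below is an assumption: the sigmoid transfer function, the weighted-sum fitness with `alpha = 0.99`, and the `toy_error_rate` stand-in for a real classifier are common choices in the binary metaheuristic literature, not the authors' confirmed method.

```python
import math
import random


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask.

    Assumption: an S-shaped (sigmoid) transfer function, a common way
    to binarize continuous metaheuristics.
    """
    mask = []
    for x in position:
        prob = 1.0 / (1.0 + math.exp(-x))
        mask.append(1 if rng.random() < prob else 0)
    return mask


def fitness(mask, error_rate, alpha=0.99):
    """Weighted sum of classification error and fraction of features kept.

    Lower is better. alpha is an assumed trade-off weight.
    """
    selected = sum(mask)
    if selected == 0:
        return float("inf")  # an empty subset cannot be classified
    return alpha * error_rate(mask) + (1.0 - alpha) * selected / len(mask)


def toy_error_rate(mask):
    """Hypothetical stand-in for a classifier's error rate.

    Pretends that only features 0 and 1 carry class information.
    """
    informative = mask[0] + mask[1]
    return 0.5 - 0.2 * informative


rng = random.Random(0)
candidate = binarize([2.0, 1.5, -3.0, -2.5], rng)

full = [1, 1, 1, 1]      # keep every feature
compact = [1, 1, 0, 0]   # keep only the informative ones
print("full subset fitness:   ", fitness(full, toy_error_rate))
print("compact subset fitness:", fitness(compact, toy_error_rate))
```

With equal error rates, the subset that keeps fewer features gets the lower (better) fitness, which is why a method can be compared on both fitness and subset size. In a real pipeline, `toy_error_rate` would be replaced by the cross-validated error of a classifier trained on the masked features.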

References

  • Aggarwal, C. C. and Zhai, C. (2012). A survey of text classification algorithms. In Mining text data (pp. 163-222). Springer, Boston, MA.
  • Aktaş, M. and Kılıç, F. (2021). Binary Grey Wolf Optimizer using Archeology and Astronomy News for Text Classification. II. International Conference on Innovative Engineering Applications (CIEA' 2021).
  • Asgarnezhad, R., Monadjemi, S. A. and Soltanaghaei, M. (2020). An application of MOGW optimization for feature selection in text classification. The Journal of Supercomputing, 1-34.
  • Chantar, H., Mafarja, M., Alsawalqah, H., Heidari, A. A., Aljarah, I. and Faris, H. (2020). Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text classification. Neural Computing and Applications, 32(16), 12201-12220.
  • Chen, H., Jiang, W., Li, C. and Li, R. (2013). A heuristic feature selection approach for text categorization by using chaos optimization and genetic algorithm. Mathematical Problems in Engineering.
  • Das, S. (2001). Filters, wrappers and a boosting-based hybrid for feature selection. In ICML, 1, 74-81.
  • Deng, X., Li, Y., Weng, J. and Zhang, J. (2019). Feature selection for text classification: A review. Multimedia Tools and Applications, 78(3), 3797-3816.
  • Günal, S. (2012). Hybrid feature selection for text classification. Turkish Journal of Electrical Engineering and Computer Science, 20(Sup. 2), 1296-1311.
  • Heidari, A. A., Mirjalili, S., Faris, H., Aljarah, I., Mafarja, M. and Chen, H. (2019). Harris hawks optimization: Algorithm and applications. Future Generation Computer Systems, 97, 849-872.
  • Jindal, R., Malhotra, R. and Jain, A. (2015). Techniques for text classification: Literature review and current trends. Webology, 12(2).
  • Kononenko, I., Šimec, E. and Robnik-Šikonja, M. (1997). Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, 7(1), 39-55.
  • Kira, K. and Rendell, L. A. (1992). The feature selection problem: Traditional methods and a new algorithm. In AAAI (Vol. 2, pp. 129-134).
  • Kira, K. and Rendell, L. A. (1992). A practical approach to feature selection. In Machine learning proceedings 1992 (pp. 249-256). Morgan Kaufmann.
  • Labani, M., Moradi, P., Ahmadizar, F. and Jalili, M. (2018). A novel multivariate filter method for feature selection in text classification problems. Engineering Applications of Artificial Intelligence, 70, 25-37.
  • Lee, J., Park, J., Kim, H. C. and Kim, D. W. (2019). Competitive particle swarm optimization for multi-category text feature selection. Entropy, 21(6), 602.
  • Liu, H. and Setiono, R. (1997). Feature selection and classification: a probabilistic wrapper approach. In Proceedings of 9th International Conference on Industrial and Engineering Applications of AI and ES, January, pp. 419-424.
  • Manoj, R. J., Praveena, M. A. and Vijayakumar, K. (2019). An ACO–ANN based feature selection algorithm for big data. Cluster Computing, 22(2), 3953-3960.
  • Marie-Sainte, S. L. and Alalyani, N. (2020). Firefly algorithm based feature selection for Arabic text classification. Journal of King Saud University-Computer and Information Sciences, 32(3), 320-328.
  • Mirjalili, S., Mirjalili, S. M. and Lewis, A. (2014). Grey wolf optimizer. Advances in Engineering Software, 69, 46-61.
  • Shang, W., Huang, H., Zhu, H., Lin, Y., Qu, Y. and Wang, Z. (2007). A novel feature selection algorithm for text categorization. Expert Systems with Applications, 33(1), 1-5.
  • The SCI-NEWS website. (2021). (online), Available: http://www.sci-news.com/
  • Too, J., Abdullah, A. R. and Mohd Saad, N. (2019). A new quadratic binary harris hawk optimization for feature selection. Electronics, 8(10), 1130.
  • Wah, Y. B., Ibrahim, N., Hamid, H. A., Abdul-Rahman, S. and Fong, S. (2018). Feature Selection Methods: Case of Filter and Wrapper Approaches for Maximising Classification Accuracy. Pertanika Journal of Science & Technology, 26(1), 329-340.
  • Xing, E. P., Jordan, M. I. and Karp, R. M. (2001). Feature selection for high-dimensional genomic microarray data. In ICML, 1, 601-608.

Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors: Muhammet Aktaş (ORCID 0000-0002-2598-3387), Fatih Kılıç (ORCID 0000-0002-8550-1562)
Publication Date: July 31, 2021
Published in Issue: Year 2021, Issue: 26 - Ejosat Special Issue 2021 (HORA)

Cite

APA: Aktaş, M., & Kılıç, F. (2021). İkili Gri Kurt ve İkili Harris Şahin Optimizasyonları ile Web Haber Sayfalarının Sınıflandırılması. Avrupa Bilim ve Teknoloji Dergisi, (26), 234-241. https://doi.org/10.31590/ejosat.950497