Research Article

Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization

Year: 2024, Volume: 24, Issue: 5, Pages: 1180-1188, Published: 01.10.2024
https://doi.org/10.35414/akufemubid.1420120

Abstract

The vast majority of data in the digital era is stored as text, which makes text mining an integral part of data mining. Text classification (TC) is a natural language processing (NLP) task that text mining frequently requires; it underpins research areas such as information retrieval, document classification, language detection, and sentiment analysis. According to the literature, filter feature selection methods have often been applied to reduce data dimensionality in Turkish TC. However, wrapper-based feature selection methods can provide better classification accuracy than filter methods. Motivated by this idea, this study proposes a Turkish TC method based on wrapper feature selection that uses the particle swarm optimization (PSO) algorithm and a multinomial naive Bayes (MNB) classifier. The experiments use the TTC-3600 Turkish news text dataset. With stemmed Tf-Idf features, the proposed method achieves a classification accuracy of 94.55% on this dataset, which is competitive with state-of-the-art Turkish TC methods.
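To make the idea of wrapper feature selection concrete, the following is a minimal sketch and not the implementation used in the study. The abstract does not state the PSO variant, its parameters, or the fitness function, so several things here are assumptions: a binary PSO with a sigmoid transfer function, validation accuracy of the classifier as the fitness value, a small hand-written MNB, and a four-feature toy dataset standing in for the TTC-3600 Tf-Idf matrix. All function names and parameter values are hypothetical.

```python
import math
import random

def mnb_fit(X, y, alpha=1.0):
    """Fit multinomial naive Bayes with Laplace smoothing on non-negative feature rows."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        totals = [sum(col) + alpha for col in zip(*rows)]
        norm = sum(totals)
        log_prior = math.log(len(rows) / len(X))
        model[c] = (log_prior, [math.log(t / norm) for t in totals])
    return model

def mnb_predict(model, x):
    """Return the class with the highest log-posterior for one feature row."""
    def log_posterior(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(v * lp for v, lp in zip(x, log_lik))
    return max(model, key=log_posterior)

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Wrapper criterion (assumed): validation accuracy of MNB on the selected columns."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    def project(X):
        return [[row[j] for j in cols] for row in X]
    model = mnb_fit(project(X_tr), y_tr)
    hits = sum(mnb_predict(model, x) == t for x, t in zip(project(X_va), y_va))
    return hits / len(y_va)

def binary_pso(n_feat, score, n_particles=8, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 mask; velocities pass through a sigmoid."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [score(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                # Standard velocity update: inertia + cognitive pull + social pull.
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                # Sigmoid transfer: velocity becomes the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            f = score(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy term-count data (hypothetical): columns 0 and 1 separate the two classes.
X_tr = [[3, 0, 1, 1], [2, 0, 0, 2], [0, 3, 1, 1], [0, 2, 2, 0]]
y_tr = ["sports", "sports", "economy", "economy"]
X_va = [[2, 0, 0, 3], [0, 2, 3, 0]]
y_va = ["sports", "economy"]

score = lambda mask: fitness(mask, X_tr, y_tr, X_va, y_va)
best_mask, best_fit = binary_pso(4, score)
print(best_mask, best_fit)
```

The defining property of a wrapper method is visible in `fitness`: every candidate feature subset is scored by training and evaluating the classifier itself, whereas a filter method ranks features with a statistic that is independent of the classifier. This is also why wrappers cost more, since each PSO evaluation retrains the model.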


Turkish title: Parçacık Sürü Optimizasyonunu Kullanan Sarmalayıcı Öznitelik Seçimine Dayalı Türkçe Metin Sınıflandırma

There are 24 citations in total.

Details

Primary Language: English
Subjects: Computer Software
Journal Section: Articles
Authors: Ezgi Zorarpacı (ORCID: 0000-0003-0974-7584)

Early Pub Date: September 10, 2024
Publication Date: October 1, 2024
Submission Date: January 15, 2024
Acceptance Date: July 16, 2024
Published in Issue: Year 2024, Volume 24, Issue 5

Cite

APA Zorarpacı, E. (2024). Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, 24(5), 1180-1188. https://doi.org/10.35414/akufemubid.1420120