Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization

Ezgi Zorarpacı

doi:10.35414/akufemubid.1420120

Research Article

Year 2024, Volume: 24 Issue: 5, 1180 - 1188, 01.10.2024

Abstract

The vast majority of data in the digital era is stored as text, and text mining is an integral part of data mining. Text classification (TC) is a natural language processing (NLP) task that text mining frequently requires; it underlies research in information retrieval, document classification, language detection, sentiment analysis, and related areas. According to the literature, filter feature selection methods are the usual choice for reducing data dimensionality in Turkish TC. However, wrapper-based feature selection methods can provide better classification accuracy than filter methods. Motivated by this, this study proposes a Turkish TC method based on wrapper feature selection that uses the particle swarm optimization (PSO) algorithm and the multinomial naive Bayes (MNB) classifier. The experiments use the TTC-3600 Turkish news text dataset. Using Tf-Idf features computed on stemmed text, the proposed method achieves a classification accuracy of 94.55% on TTC-3600, which is competitive with state-of-the-art Turkish TC methods.
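As a rough illustration of the wrapper idea described in the abstract, the sketch below pairs a binary PSO with a small multinomial naive Bayes classifier. Each particle is a 0/1 mask over the features, and its fitness is the MNB accuracy obtained with only the selected features. The abstract does not state the author's configuration, so the binary (sigmoid) PSO variant, the parameter values (`w`, `c1`, `c2`, swarm size, iteration count), the toy term-count data, and all function names here are assumptions made for illustration. This is not the TTC-3600 experiment.

```python
import math
import random


def mnb_accuracy(X_train, y_train, X_val, y_val, mask):
    """Accuracy of a multinomial naive Bayes model restricted to masked-in features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    classes = sorted(set(y_train))
    log_prior, log_like = {}, {}
    for c in classes:
        rows = [x for x, y in zip(X_train, y_train) if y == c]
        log_prior[c] = math.log(len(rows) / len(X_train))
        totals = [sum(r[j] for r in rows) + 1.0 for j in cols]  # add-one smoothing
        norm = sum(totals)
        log_like[c] = [math.log(t / norm) for t in totals]
    correct = 0
    for x, y in zip(X_val, y_val):
        scores = {c: log_prior[c] + sum(x[j] * ll for j, ll in zip(cols, log_like[c]))
                  for c in classes}
        correct += max(scores, key=scores.get) == y
    return correct / len(X_val)


def pso_select(fitness, n_features, n_particles=10, n_iters=20,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pos = [[int(rng.random() < 0.5) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                # Standard velocity update: inertia + cognitive pull + social pull.
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                # Sigmoid transfer: the velocity sets the probability of the bit being 1.
                pos[i][j] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy term counts (hypothetical): features 0 and 1 are class-specific, 2 and 3 are noise.
X_train = [[3, 0, 1, 2], [2, 0, 2, 1], [0, 3, 1, 2], [0, 2, 2, 1]]
y_train = ["sport", "sport", "economy", "economy"]
X_val = [[1, 0, 2, 2], [0, 1, 2, 2]]
y_val = ["sport", "economy"]


def fitness(mask):
    return mnb_accuracy(X_train, y_train, X_val, y_val, mask)


best_mask, best_acc = pso_select(fitness, n_features=4)
print("selected mask:", best_mask, "validation accuracy:", best_acc)
```

In a realistic setting the fitness would more likely be a cross-validated accuracy on the training split, with Tf-Idf weights of stemmed terms in place of these raw counts. A wrapper is more expensive than a filter method because every fitness evaluation trains and scores a classifier.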

Keywords

Feature selection, Natural language processing, Text classification, Text mining

Details

Primary Language	English
Subjects	Computer Software
Journal Section	Articles
Authors	Ezgi Zorarpacı 0000-0003-0974-7584
Early Pub Date	September 10, 2024
Publication Date	October 1, 2024
Submission Date	January 15, 2024
Acceptance Date	July 16, 2024
Published in Issue	Year 2024 Volume: 24 Issue: 5

Cite

APA	Zorarpacı, E. (2024). Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, 24(5), 1180-1188. https://doi.org/10.35414/akufemubid.1420120
AMA	Zorarpacı E. Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. October 2024;24(5):1180-1188. doi:10.35414/akufemubid.1420120
Chicago	Zorarpacı, Ezgi. “Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 24, no. 5 (October 2024): 1180-88. https://doi.org/10.35414/akufemubid.1420120.
EndNote	Zorarpacı E (October 1, 2024) Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 24 5 1180–1188.
IEEE	E. Zorarpacı, “Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization”, Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 24, no. 5, pp. 1180–1188, 2024, doi: 10.35414/akufemubid.1420120.
ISNAD	Zorarpacı, Ezgi. “Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 24/5 (October 2024), 1180-1188. https://doi.org/10.35414/akufemubid.1420120.
JAMA	Zorarpacı E. Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2024;24:1180–1188.
MLA	Zorarpacı, Ezgi. “Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 24, no. 5, 2024, pp. 1180-8, doi:10.35414/akufemubid.1420120.
Vancouver	Zorarpacı E. Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2024;24(5):1180-8.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.