Research Article

ENHANCING MULTI-CLASS TEXT CLASSIFICATION WITH APRIORI-BASED FEATURE SELECTION

Year 2024, Volume: 10 Issue: 1, 41 - 57, 30.06.2024
https://doi.org/10.51477/mejs.1475196

Abstract

In Natural Language Processing, selecting the right features is crucial for reducing unnecessary model complexity, speeding up training, and improving generalization. Multi-class text classification, however, makes it harder for models to generalize well, which in turn complicates feature selection. This paper investigates how feature selection affects model performance in multi-class text classification, using a dataset of projects completed by TÜBİTAK TEYDEB between 2009 and 2022. The study employs LSTM, a deep learning method, to classify the projects into nine industries based on various attributes. It proposes a new feature selection approach based on the Apriori algorithm, which reduces the number of attribute combinations considered and makes model training more efficient. Model performance is evaluated using metrics such as accuracy, loss, validation scores, and test scores. The key findings are that feature selection significantly affects model performance and that different feature sets affect it to different degrees.
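
As a rough illustration of the Apriori idea named in the abstract, the Python sketch below mines frequent attribute combinations level by level and discards any candidate that has an infrequent subset, which is how Apriori keeps the number of combinations small. It is not the authors' implementation: the function name `apriori`, the attribute names (`title`, `keywords`, `summary`), the toy rows, and the 0.5 support threshold are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            current[single] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: each row lists attributes that occur together.
rows = [
    {"title", "keywords", "summary"},
    {"title", "keywords"},
    {"title", "summary"},
    {"keywords", "summary"},
    {"title", "keywords", "summary"},
]

frequent = apriori(rows, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

In this toy run every single attribute and every pair meets the threshold, while the three-attribute combination does not, so only six of the seven possible non-empty combinations would be carried forward to model training.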

Ethical Statement

Our study does not cause any harm to the environment and does not involve the use of animal or human subjects. Therefore, it was not necessary to obtain an Ethics Committee Report.


Details

Primary Language: English
Subjects: Communications Engineering (Other)
Journal Section: Article
Authors:

Maide Feyza Er (ORCID 0000-0003-2580-1309)

Turgay Tugay Bilgin (ORCID 0000-0002-9245-5728)

Publication Date: June 30, 2024
Submission Date: April 29, 2024
Acceptance Date: June 25, 2024
Published in Issue: Year 2024, Volume: 10, Issue: 1

Cite

IEEE M. F. Er and T. T. Bilgin, “ENHANCING MULTI-CLASS TEXT CLASSIFICATION WITH APRIORI-BASED FEATURE SELECTION”, MEJS, vol. 10, no. 1, pp. 41–57, 2024, doi: 10.51477/mejs.1475196.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
