Research Article

The Effect of Various Text Representation Methods for Sentiment Analysis on Movie Review Data with Different Machine Learning Methods

Volume: 12, Issue: 4, 31 December 2024

Abstract

In this study, we explore how well machine learning (ML) models perform when combined with different text representation methods on the balanced IMDB dataset, which is widely regarded as a gold standard for sentiment analysis, a natural language processing (NLP) task. On this open-source movie review dataset, we first clean the data and apply preprocessing steps, and then convert the reviews into feature vectors with each text representation method. Next, we perform sentiment classification with different ML models. To evaluate the models, we use accuracy (ACC), precision (P), recall (R), F1-score (F1), the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). Text features extracted with Bidirectional Encoder Representations from Transformers (BERT) yielded the highest performance for every model, and the support vector machine (SVM) model gave particularly promising results: ACC 0.9033, F1 0.9308, R 0.9015, P 0.9072, AUC 0.9638, and ROC 0.96. These findings suggest that NLP techniques, and in particular ML models that use BERT representations, can achieve high accuracy and reliability in text classification problems. Future studies should validate these findings with BERT on other NLP tasks to assess the effectiveness and practical applicability of the models.
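The evaluation metrics listed above have standard definitions. The following is a minimal, self-contained sketch of how they can be computed for binary sentiment labels (assumed encoding: 1 = positive review, 0 = negative review). It is not the authors' evaluation code, and the helper names `confusion_counts`, `classification_metrics`, and `roc_auc` are hypothetical. Accuracy, precision, recall, and F1 are computed from hard label predictions, whereas AUC needs a continuous score per review (for an SVM, for example, its decision value), which is why the two take different inputs.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, assuming 1 = positive, 0 = negative."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # positive review correctly labelled positive
        elif t == 0 and p == 1:
            fp += 1  # negative review wrongly labelled positive
        elif t == 1 and p == 0:
            fn += 1  # positive review wrongly labelled negative
        else:
            tn += 1  # negative review correctly labelled negative
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"ACC": acc, "P": precision, "R": recall, "F1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic and is only
    meant to make the definition explicit.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not results from the study.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold the scores at 0.5
    print({k: round(v, 4) for k, v in classification_metrics(y_true, y_pred).items()})
    # → {'ACC': 0.6667, 'P': 0.6667, 'R': 0.6667, 'F1': 0.6667}
    print(round(roc_auc(y_true, scores), 4))  # → 0.8889
```

Because AUC depends only on how the scores rank positives against negatives, it does not depend on the 0.5 threshold used for the other metrics. For large test sets, a rank-based implementation such as scikit-learn's `roc_auc_score` computes the same quantity far more efficiently.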

Keywords

Supporting Institution

None

Project Number

None

Ethics Statement

None

Acknowledgements

None

Details

Primary Language

English

Subjects

Decision Support and Group Support Systems, Information Systems (Other)

Section

Research Article

Early View Date

21 November 2024

Publication Date

31 December 2024

Submission Date

9 June 2024

Acceptance Date

6 October 2024

Issue

Year 2024, Volume: 12, Issue: 4

Cite

APA
Göç, V., & Başarslan, M. S. (2024). The Effect of Various Text Representation Methods for Sentiment Analysis on Movie Review Data with Different Machine Learning Methods. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji, 12(4), 893-901. https://doi.org/10.29109/gujsc.1498509

    e-ISSN:2147-9526