Research Article

Analysis of feature extraction techniques for sentiment analysis of tweets

Year 2024, Volume 8, Issue 4, pp. 741-753, 31.10.2024
https://doi.org/10.31127/tuje.1477502

Abstract

Over the past few years, sentiment analysis has been increasingly applied to the textual content of social networking services such as LinkedIn, Facebook, YouTube, and Twitter, and to online product reviews, in order to determine public opinion or emotion. The proposed methodology comprises data selection, text pre-processing, feature extraction, classification, and result analysis. Text pre-processing is an important stage because it structures the data and thereby improves the performance of the methodology. Feature extraction is a crucial step in sentiment analysis, since it is difficult to obtain effective and useful information from highly unstructured social media data, and a number of feature extraction techniques (FETs) are available for this purpose. In this work, popular FETs, namely bag of words (BOW), term frequency-inverse document frequency (TF-IDF), and Word2vec, are compared and analyzed for the sentiment analysis of social media content. A method is proposed for processing social media text for sentiment analysis that uses a support vector machine (SVM) as the classifier. Experiments are carried out on three datasets from different contexts, namely US Airline, Movie Review, and News, from Twitter. The results show that TF-IDF consistently outperformed the other techniques, with best accuracies of 82.33%, 92.31%, and 99.10% on the Airline, Movie Review, and News datasets, respectively. The proposed method was also found to perform better than some existing methods.
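To make the three representations concrete, here is a minimal pure-Python sketch of how a tweet could be turned into BOW, TF-IDF, and averaged-embedding feature vectors before being passed to a classifier. It is an illustration only, not the authors' implementation: the tokenizer, the example tweets, the TF-IDF variant (raw count multiplied by idf = ln(N/df)), and the 2-dimensional `toy_embeddings` standing in for trained Word2vec vectors are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Assumed minimal pre-processing: lowercase, drop @mentions and URLs, keep letter runs."""
    text = re.sub(r"@\w+|https?://\S+", " ", text.lower())
    return re.findall(r"[a-z]+", text)


def build_vocab(docs: list[list[str]]) -> list[str]:
    """Sorted list of every distinct token across the corpus."""
    return sorted({tok for doc in docs for tok in doc})


def bag_of_words(docs: list[list[str]], vocab: list[str]) -> list[list[int]]:
    """BOW: one vector of raw term counts per document, indexed by vocabulary position."""
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok in doc:
            if tok in index:  # tokens outside the vocabulary are ignored
                vec[index[tok]] += 1
        vectors.append(vec)
    return vectors


def tf_idf(docs: list[list[str]], vocab: list[str]) -> list[list[float]]:
    """TF-IDF (assumed variant): raw count * ln(N / df); a term found in every document gets weight 0."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency per term
    idf = [math.log(n_docs / df[t]) if df[t] else 0.0 for t in vocab]
    return [[count * w for count, w in zip(vec, idf)] for vec in bag_of_words(docs, vocab)]


def mean_embedding(doc: list[str], embeddings: dict[str, list[float]], dim: int) -> list[float]:
    """Word2vec-style document feature: average of the vectors of the words that have one."""
    vecs = [embeddings[tok] for tok in doc if tok in embeddings]
    if not vecs:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Hypothetical example tweets (not taken from the paper's datasets).
tweets = [
    "@united great flight, great crew",
    "@united flight delayed again, terrible service",
    "loved the movie, great cast",
]
docs = [tokenize(t) for t in tweets]
vocab = build_vocab(docs)
bow = bag_of_words(docs, vocab)
tfidf = tf_idf(docs, vocab)

# Hypothetical 2-d vectors standing in for trained Word2vec embeddings.
toy_embeddings = {"great": [0.9, 0.1], "terrible": [-0.8, 0.2], "flight": [0.0, 0.5]}
w2v = [mean_embedding(doc, toy_embeddings, dim=2) for doc in docs]

g, c = vocab.index("great"), vocab.index("crew")
print("BOW   great/crew:", bow[0][g], bow[0][c])  # BOW   great/crew: 2 1
print("TFIDF great/crew:", round(tfidf[0][g], 3), round(tfidf[0][c], 3))  # TFIDF great/crew: 0.811 1.099
```

The example shows the key difference between the first two techniques: in BOW, "great" dominates the first tweet because it occurs twice, whereas TF-IDF gives more weight to "crew", which appears in only one tweet. In a pipeline like the one described in the abstract, each row of these matrices would then be fed to an SVM (for example scikit-learn's `LinearSVC`). Note that scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 row normalisation by default, so its values differ from this sketch.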

Ethics Statement

I confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere.

There are 33 references in total.

Details

Primary Language: English
Subjects: Automated Software Engineering, Reinforcement Learning
Section: Articles
Authors

Satyendra Singh (ORCID: 0009-0009-7907-0063)

Krishan Kumar

Brajesh Kumar

Early View Date: 28 October 2024
Publication Date: 31 October 2024
Submission Date: 2 May 2024
Acceptance Date: 6 June 2024
Published in Issue: Year 2024, Volume 8, Issue 4

How to Cite

APA Singh, S., Kumar, K., & Kumar, B. (2024). Analysis of feature extraction techniques for sentiment analysis of tweets. Turkish Journal of Engineering, 8(4), 741-753. https://doi.org/10.31127/tuje.1477502
AMA Singh S, Kumar K, Kumar B. Analysis of feature extraction techniques for sentiment analysis of tweets. TUJE. October 2024;8(4):741-753. doi:10.31127/tuje.1477502
Chicago Singh, Satyendra, Krishan Kumar, and Brajesh Kumar. “Analysis of Feature Extraction Techniques for Sentiment Analysis of Tweets”. Turkish Journal of Engineering 8, no. 4 (October 2024): 741-53. https://doi.org/10.31127/tuje.1477502.
EndNote Singh S, Kumar K, Kumar B (01 October 2024) Analysis of feature extraction techniques for sentiment analysis of tweets. Turkish Journal of Engineering 8 4 741–753.
IEEE S. Singh, K. Kumar, and B. Kumar, “Analysis of feature extraction techniques for sentiment analysis of tweets”, TUJE, vol. 8, no. 4, pp. 741–753, 2024, doi: 10.31127/tuje.1477502.
ISNAD Singh, Satyendra et al. “Analysis of Feature Extraction Techniques for Sentiment Analysis of Tweets”. Turkish Journal of Engineering 8/4 (October 2024), 741-753. https://doi.org/10.31127/tuje.1477502.
JAMA Singh S, Kumar K, Kumar B. Analysis of feature extraction techniques for sentiment analysis of tweets. TUJE. 2024;8:741–753.
MLA Singh, Satyendra et al. “Analysis of Feature Extraction Techniques for Sentiment Analysis of Tweets”. Turkish Journal of Engineering, vol. 8, no. 4, 2024, pp. 741-53, doi:10.31127/tuje.1477502.
Vancouver Singh S, Kumar K, Kumar B. Analysis of feature extraction techniques for sentiment analysis of tweets. TUJE. 2024;8(4):741-53.