Research Article

Analysis of feature extraction techniques for sentiment analysis of tweets

Year 2024, Volume: 8 Issue: 4, 741 - 753, 31.10.2024
https://doi.org/10.31127/tuje.1477502

Abstract

Over the past few years, sentiment analysis has been widely applied to social networking services such as LinkedIn, Facebook, YouTube, and Twitter, and to online product reviews, to determine public opinion or emotion from textual social media content. The methodology comprises data selection, text pre-processing, feature extraction, classification, and result analysis. Text pre-processing is an important stage that structures the data and improves the performance of the methodology. Choosing a feature extraction technique (FET) is a crucial step in sentiment analysis because it is difficult to obtain effective and useful information from highly unstructured social media data. A number of feature extraction techniques are available to extract useful features. In this work, popular feature extraction techniques, including bag of words (BOW), term frequency-inverse document frequency (TF-IDF), and Word2vec, are compared and analyzed for the sentiment analysis of social media content. A method is proposed for processing text data from social media networks for sentiment analysis that uses a support vector machine as the classifier. The experiments are carried out on three datasets from different contexts, namely US Airline, Movie Review, and News from Twitter. The results show that TF-IDF consistently outperformed the other techniques, with best accuracies of 82.33%, 92.31%, and 99.10% for the Airline, Movie Review, and News datasets respectively. It is also found that the proposed method performed better than some existing methods.
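
To make the difference between two of the compared representations concrete, the following is a minimal sketch of bag of words and TF-IDF in plain Python. It is not the authors' implementation: the abstract does not state which TF-IDF variant, tokenizer, or pre-processing steps were used, so the formula here (tf = count / document length, idf = ln(N / df)), the `tokenize` helper, and the three sample tweets are assumptions chosen only for illustration. Word2vec and the support vector machine classifier are left out of the sketch.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation (assumed pre-processing)."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def bag_of_words(docs):
    """Return (vocabulary, vectors) with one raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, vectors) weighted by tf * idf (unsmoothed variant, an assumption)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for vec in counts:
        length = sum(vec) or 1  # guard against empty documents
        vectors.append([(c / length) * w for c, w in zip(vec, idf)])
    return vocab, vectors


# Hypothetical sample tweets, not taken from the paper's datasets.
tweets = [
    "Great flight, great crew!",
    "Flight delayed again.",
    "Loved the movie.",
]
vocab, bow = bag_of_words(tweets)
_, weights = tf_idf(tweets)
```

In the first sample tweet, BOW only records that "great" occurs twice and "flight" once, whereas TF-IDF additionally down-weights "flight" because it also appears in the second tweet. Either matrix could then be passed to a classifier such as a support vector machine.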

Ethical Statement

I confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere.

References

  • Lauriola, I., Lavelli, A., & Aiolli, F. (2021). An introduction to deep learning in natural language processing: Models, techniques, and tools. Neurocomputing, 470, 443–456. https://doi.org/10.1016/j.neucom.2021.05.103
  • Shayaa, S., Jaafar, N. I., Bahri, S., Sulaiman, A., Seuk Wai, P., Wai Chung, Y., Piprani, A. Z., & Al-Garadi, M. A. (2018). Sentiment analysis of big data: Methods, applications, and open challenges. IEEE Access, 6, 37807–37827. https://doi.org/10.1109/access.2018.2851311
  • Naseem, U., Razzak, I., & Eklund, P. W. (2020). A survey of pre-processing techniques to improve short-text quality: A case study on hate speech detection on Twitter. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-020-10082-6
  • Chen, X., Xue, Y., Zhao, H., Lu, X., Hu, X., & Ma, Z. (2018). A novel feature extraction methodology for sentiment analysis of product reviews. Neural Computing and Applications, 31(10), 6625–6642. https://doi.org/10.1007/s00521-018-3477-2
  • Sohrabi, M. K., & Hemmatian, F. (2019). An efficient preprocessing method for supervised sentiment analysis by converting sentences to numerical vectors: A Twitter case study. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-019-7586-4
  • Rustam, F., Ashraf, I., Mehmood, A., Ullah, S., & Choi, G. (2019). Tweets classification on the base of sentiments for US airline companies. Entropy, 21(11), 1078. https://doi.org/10.3390/e21111078
  • Li, J., Zhang, H., & Wei, Z. (2020). The weighted Word2vec paragraph vectors for anomaly detection over HTTP traffic. IEEE Access, 8, 141787–141798. https://doi.org/10.1109/ACCESS.2020.301384
  • Umer, M., Ashraf, I., Mehmood, A., Kumari, S., Ullah, S., & Sang Choi, G. (2020). Sentiment analysis of tweets using a unified convolutional neural network‐long short‐term memory network model. Computational Intelligence. https://doi.org/10.1111/coin.12415
  • Zhao, H., Liu, Z., Yao, X., & Yang, Q. (2021). A machine learning-based sentiment analysis of online product reviews with a novel term weighting and feature selection approach. Information Processing & Management, 58(5), 102656. https://doi.org/10.1016/j.ipm.2021.102656
  • Gaye, B., Zhang, D., & Wulamu, A. (2021). A tweet sentiment classification approach using a hybrid stacked ensemble technique. Information, 12(9), 374. https://doi.org/10.3390/info12090374
  • Kamyab, M., Liu, G., & Adjeisah, M. (2021). Attention-based CNN and Bi-LSTM model based on TF-IDF and GloVe word embedding for sentiment analysis. Applied Sciences, 11(23), 11255. https://doi.org/10.3390/app112311255
  • Raj, C., Agarwal, A., Bharathy, G., Narayan, B., & Prasad, M. (2021). Cyberbullying detection: Hybrid models based on machine learning and natural language processing techniques. Electronics, 10(22), 2810. https://doi.org/10.3390/electronics10222810
  • Subba, B., & Kumari, S. (2021). A heterogeneous stacking ensemble based sentiment analysis framework using multiple word embeddings. Computational Intelligence. https://doi.org/10.1111/coin.12478
  • Tabinda Kokab, S., Asghar, S., & Naz, S. (2022). Transformer-based deep learning models for the sentiment analysis of social media data. Array, 100157. https://doi.org/10.1016/j.array.2022.100157
  • Jain, P. K., Pamula, R., & Srivastava, G. (2021). A systematic literature review on machine learning applications for consumer sentiment analysis using online reviews. Computer Science Review, 41, 100413. https://doi.org/10.1016/j.cosrev.2021.100413
  • Alamoudi, E. S., & Alghamdi, N. S. (2021). Sentiment classification and aspect-based sentiment analysis on Yelp reviews using deep learning and word embeddings. Journal of Decision Systems, 1–23. https://doi.org/10.1080/12460125.2020.1864106
  • Ankit, & Saleena, N. (2018). An ensemble classification system for Twitter sentiment analysis. Procedia Computer Science, 132, 937–946. https://doi.org/10.1016/j.procs.2018.05.109
  • Mujahid, M., Lee, E., Rustam, F., Washington, P. B., Ullah, S., Reshi, A. A., & Ashraf, I. (2021). Sentiment analysis and topic modeling on tweets about online education during COVID-19. Applied Sciences, 11(18), 8438. https://doi.org/10.3390/app11188438
  • Thakkar, A., & Chaudhari, K. (2020). Predicting stock trend using an integrated term frequency-inverse document frequency-based feature weight matrix with neural networks. Applied Soft Computing, 96, 106684. https://doi.org/10.1016/j.asoc.2020.106684
  • Pathak, A. R., Pandey, M., & Rautaray, S. (2021). Topic-level sentiment analysis of social media data using deep learning. Applied Soft Computing, 107440. https://doi.org/10.1016/j.asoc.2021.107440
  • Alhumoud, S. O., & Al Wazrah, A. A. (2021). Arabic sentiment analysis using recurrent neural networks: A review. Artificial Intelligence Review. https://doi.org/10.1007/s10462-021-09989-9
  • Kalyan, K. S., & Sangeetha, S. (2020). SECNLP: A survey of embeddings in clinical natural language processing. Journal of Biomedical Informatics, 101, 103323. https://doi.org/10.1016/j.jbi.2019.103323
  • Alorini, G., Rawat, D. B., & Alorini, D. (2021). LSTM-RNN based sentiment analysis to monitor COVID-19 opinions using social media data. IEEE Xplore. https://doi.org/10.1109/ICC42927.2021.9500897
  • Ji, S., Satish, N., Li, S., & Dubey, P. K. (2019). Parallelizing Word2Vec in shared and distributed memory. IEEE Transactions on Parallel and Distributed Systems, 30(9), 2090–2100. https://doi.org/10.1109/tpds.2019.2904058
  • Jang, B., Kim, I., & Kim, J. W. (2019). Word2Vec convolutional neural networks for classification of news articles and tweets. PLOS ONE, 14(8), e0220976. https://doi.org/10.1371/journal.pone.0220976
  • Borg, A., Boldt, M., Rosander, O., & Ahlstrand, J. (2020). E-mail classification with machine learning and word embeddings for improved customer support. Neural Computing and Applications. https://doi.org/10.1007/s00521-020-05058-4
  • Tripathy, A., Agrawal, A., & Rath, S. K. (2016). Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications, 57, 117–126. https://doi.org/10.1016/j.eswa.2016.03.028
  • Elhag, M., Idris, N., Mahmud, R., Qazi, A., Ibrahim, Jaafar Zubairu Maitama, Naseem, U., Shah Alam Khan, & Yang, S. (2021). A multi-criteria approach for Arabic dialect sentiment analysis for online reviews: Exploiting optimal machine learning algorithm selection. Sustainability, 13(18), 10018. https://doi.org/10.3390/su131810018
  • Birjali, M., Kasri, M., & Beni-Hssane, A. (2021). A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowledge-Based Systems, 226, 107134. https://doi.org/10.1016/j.knosys.2021.107134
  • Sohrabi, M. K., & Hemmatian, F. (2019). An efficient preprocessing method for supervised sentiment analysis by converting sentences to numerical vectors: A Twitter case study. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-019-7586-4
  • Chen, J., Chen, Y., He, Y., Xu, Y., Zhao, S., & Zhang, Y. (2021). A classified feature representation three-way decision model for sentiment analysis. Applied Intelligence, 52(7), 7995–8007. https://doi.org/10.1007/s10489-021-02809-1
  • Araque, O., Corcuera-Platas, I., Sánchez-Rada, J. F., & Iglesias, C. A. (2017). Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Expert Systems with Applications, 77, 236–246. https://doi.org/10.1016/j.eswa.2017.02.002
  • Alsayat, A. (2021). Improving sentiment analysis for social media applications using an ensemble deep learning language model. Arabian Journal for Science and Engineering. https://doi.org/10.1007/s13369-021-06227-w
There are 33 citations in total.

Details

Primary Language English
Subjects Automated Software Engineering, Reinforcement Learning
Journal Section Articles
Authors

Satyendra Singh 0009-0009-7907-0063

Krishan Kumar

Brajesh Kumar

Early Pub Date October 28, 2024
Publication Date October 31, 2024
Submission Date May 2, 2024
Acceptance Date June 6, 2024
Published in Issue Year 2024 Volume: 8 Issue: 4

Cite

APA Singh, S., Kumar, K., & Kumar, B. (2024). Analysis of feature extraction techniques for sentiment analysis of tweets. Turkish Journal of Engineering, 8(4), 741-753. https://doi.org/10.31127/tuje.1477502
AMA Singh S, Kumar K, Kumar B. Analysis of feature extraction techniques for sentiment analysis of tweets. TUJE. October 2024;8(4):741-753. doi:10.31127/tuje.1477502
Chicago Singh, Satyendra, Krishan Kumar, and Brajesh Kumar. “Analysis of Feature Extraction Techniques for Sentiment Analysis of Tweets”. Turkish Journal of Engineering 8, no. 4 (October 2024): 741-53. https://doi.org/10.31127/tuje.1477502.
EndNote Singh S, Kumar K, Kumar B (October 1, 2024) Analysis of feature extraction techniques for sentiment analysis of tweets. Turkish Journal of Engineering 8 4 741–753.
IEEE S. Singh, K. Kumar, and B. Kumar, “Analysis of feature extraction techniques for sentiment analysis of tweets”, TUJE, vol. 8, no. 4, pp. 741–753, 2024, doi: 10.31127/tuje.1477502.
ISNAD Singh, Satyendra et al. “Analysis of Feature Extraction Techniques for Sentiment Analysis of Tweets”. Turkish Journal of Engineering 8/4 (October 2024), 741-753. https://doi.org/10.31127/tuje.1477502.
JAMA Singh S, Kumar K, Kumar B. Analysis of feature extraction techniques for sentiment analysis of tweets. TUJE. 2024;8:741–753.
MLA Singh, Satyendra et al. “Analysis of Feature Extraction Techniques for Sentiment Analysis of Tweets”. Turkish Journal of Engineering, vol. 8, no. 4, 2024, pp. 741-53, doi:10.31127/tuje.1477502.
Vancouver Singh S, Kumar K, Kumar B. Analysis of feature extraction techniques for sentiment analysis of tweets. TUJE. 2024;8(4):741-53.