Analysis of feature extraction techniques for sentiment analysis of tweets

Satyendra Sıngh; Krishan Kumar; Brajesh Kumar

doi:10.31127/tuje.1477502

Research Article

Year 2024, Volume: 8 Issue: 4, 741 - 753, 31.10.2024

Cited By: 2

Abstract

Over the past few years, sentiment analysis has been widely applied to social networking services such as LinkedIn, Facebook, YouTube, and Twitter, and to online product reviews, to determine public opinion or emotion from social media text. The methodology includes data selection, text pre-processing, feature extraction, classification, and result analysis. Text pre-processing is an important stage that structures the data and improves the performance of the methodology. Feature extraction is a crucial step in sentiment analysis because it is difficult to obtain effective and useful information from highly unstructured social media data, and a number of feature extraction techniques (FETs) are available for this purpose. In this work, popular feature extraction techniques, including bag of words (BOW), term frequency-inverse document frequency (TF-IDF), and Word2Vec, are compared and analyzed for the sentiment analysis of social media content. A method is proposed for processing text data from social media networks for sentiment analysis that uses a support vector machine (SVM) as the classifier. Experiments are carried out on three datasets from different contexts, namely US Airline, Movie Review, and News from Twitter. The results show that TF-IDF consistently outperformed the other techniques, with best accuracies of 82.33%, 92.31%, and 99.10% for the Airline, Movie Review, and News datasets, respectively. It is also found that the proposed method performed better than some existing methods.
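
To make the difference between two of the compared representations concrete, the following is a minimal sketch of BOW and TF-IDF vectors built from a few tweets. It is an illustration only, not the authors' implementation: the abstract does not specify the pre-processing rules or the TF-IDF weighting variant, so the cleaning steps in `preprocess`, the formula tf = count / document length with idf = log(N / df), and the three sample tweets are all assumptions. The Word2Vec and SVM stages are omitted.

```python
import math
import re
from collections import Counter


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and split into word tokens.

    These cleaning rules are assumed for illustration; the paper's own
    pre-processing steps are not listed in the abstract.
    """
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)
    return re.findall(r"[a-z']+", tweet)


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def bow_vector(tokens, vocab):
    """Bag of words: raw count of each vocabulary term in one document."""
    counts = Counter(tokens)
    return [counts.get(term, 0) for term in vocab]


def tfidf_vectors(token_lists, vocab):
    """TF-IDF with tf = count / document length and idf = log(N / df).

    This is one common textbook variant, chosen here as an assumption.
    """
    n_docs = len(token_lists)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for tokens in token_lists for tok in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        vectors.append([
            (counts.get(term, 0) / length) * math.log(n_docs / doc_freq[term])
            for term in vocab
        ])
    return vectors


# Hypothetical sample tweets, not taken from the paper's datasets.
tweets = [
    "@airline great flight, great crew!",
    "Flight delayed again... terrible service http://example.com",
    "The crew was helpful",
]
docs = [preprocess(t) for t in tweets]
vocab = build_vocabulary(docs)
bow = [bow_vector(d, vocab) for d in docs]
tfidf = tfidf_vectors(docs, vocab)
```

In this sketch, BOW records only how often a term occurs in a tweet, whereas TF-IDF also down-weights terms that appear in many tweets: in the first tweet, "great" (found in one tweet) receives a higher weight than "crew" or "flight" (each found in two). Either set of vectors could then be passed to a classifier such as an SVM.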

Keywords

Sentiment Analysis, Twitter, BOW, TF-IDF, Word2Vec, SVM

Ethical Statement

I confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere.

References

Lauriola, I., Lavelli, A., & Aiolli, F. (2021). An introduction to deep learning in natural language processing: Models, techniques, and tools. Neurocomputing, 470, 443–456. https://doi.org/10.1016/j.neucom.2021.05.103
Shayaa, S., Jaafar, N. I., Bahri, S., Sulaiman, A., Seuk Wai, P., Wai Chung, Y., Piprani, A. Z., & Al-Garadi, M. A. (2018). Sentiment analysis of big data: Methods, applications, and open challenges. IEEE Access, 6, 37807–37827. https://doi.org/10.1109/access.2018.2851311
Naseem, U., Razzak, I., & Eklund, P. W. (2020). A survey of pre-processing techniques to improve short-text quality: A case study on hate speech detection on Twitter. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-020-10082-6
Chen, X., Xue, Y., Zhao, H., Lu, X., Hu, X., & Ma, Z. (2018). A novel feature extraction methodology for sentiment analysis of product reviews. Neural Computing and Applications, 31(10), 6625–6642. https://doi.org/10.1007/s00521-018-3477-2
Sohrabi, M. K., & Hemmatian, F. (2019). An efficient preprocessing method for supervised sentiment analysis by converting sentences to numerical vectors: A Twitter case study. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-019-7586-4
Rustam, F., Ashraf, I., Mehmood, A., Ullah, S., & Choi, G. (2019). Tweets classification on the base of sentiments for US airline companies. Entropy, 21(11), 1078. https://doi.org/10.3390/e21111078
Li, J., Zhang, H., & Wei, Z. (2020). The weighted Word2vec paragraph vectors for anomaly detection over HTTP traffic. IEEE Access, 8, 141787–141798. https://doi.org/10.1109/ACCESS.2020.301384
Umer, M., Ashraf, I., Mehmood, A., Kumari, S., Ullah, S., & Sang Choi, G. (2020). Sentiment analysis of tweets using a unified convolutional neural network‐long short‐term memory network model. Computational Intelligence. https://doi.org/10.1111/coin.12415
Zhao, H., Liu, Z., Yao, X., & Yang, Q. (2021). A machine learning-based sentiment analysis of online product reviews with a novel term weighting and feature selection approach. Information Processing & Management, 58(5), 102656. https://doi.org/10.1016/j.ipm.2021.102656
Gaye, B., Zhang, D., & Wulamu, A. (2021). A tweet sentiment classification approach using a hybrid stacked ensemble technique. Information, 12(9), 374. https://doi.org/10.3390/info12090374
Kamyab, M., Liu, G., & Adjeisah, M. (2021). Attention-based CNN and Bi-LSTM model based on TF-IDF and GloVe word embedding for sentiment analysis. Applied Sciences, 11(23), 11255. https://doi.org/10.3390/app112311255
Raj, C., Agarwal, A., Bharathy, G., Narayan, B., & Prasad, M. (2021). Cyberbullying detection: Hybrid models based on machine learning and natural language processing techniques. Electronics, 10(22), 2810. https://doi.org/10.3390/electronics10222810
Subba, B., & Kumari, S. (2021). A heterogeneous stacking ensemble based sentiment analysis framework using multiple word embeddings. Computational Intelligence. https://doi.org/10.1111/coin.12478
Tabinda Kokab, S., Asghar, S., & Naz, S. (2022). Transformer-based deep learning models for the sentiment analysis of social media data. Array, 100157. https://doi.org/10.1016/j.array.2022.100157
Jain, P. K., Pamula, R., & Srivastava, G. (2021). A systematic literature review on machine learning applications for consumer sentiment analysis using online reviews. Computer Science Review, 41, 100413. https://doi.org/10.1016/j.cosrev.2021.100413
Alamoudi, E. S., & Alghamdi, N. S. (2021). Sentiment classification and aspect-based sentiment analysis on Yelp reviews using deep learning and word embeddings. Journal of Decision Systems, 1–23. https://doi.org/10.1080/12460125.2020.1864106
Ankit, & Saleena, N. (2018). An ensemble classification system for Twitter sentiment analysis. Procedia Computer Science, 132, 937–946. https://doi.org/10.1016/j.procs.2018.05.109
Mujahid, M., Lee, E., Rustam, F., Washington, P. B., Ullah, S., Reshi, A. A., & Ashraf, I. (2021). Sentiment analysis and topic modeling on tweets about online education during COVID-19. Applied Sciences, 11(18), 8438. https://doi.org/10.3390/app11188438
Thakkar, A., & Chaudhari, K. (2020). Predicting stock trend using an integrated term frequency-inverse document frequency-based feature weight matrix with neural networks. Applied Soft Computing, 96, 106684. https://doi.org/10.1016/j.asoc.2020.106684
Pathak, A. R., Pandey, M., & Rautaray, S. (2021). Topic-level sentiment analysis of social media data using deep learning. Applied Soft Computing, 107440. https://doi.org/10.1016/j.asoc.2021.107440
Alhumoud, S. O., & Al Wazrah, A. A. (2021). Arabic sentiment analysis using recurrent neural networks: A review. Artificial Intelligence Review. https://doi.org/10.1007/s10462-021-09989-9
Kalyan, K. S., & Sangeetha, S. (2020). SECNLP: A survey of embeddings in clinical natural language processing. Journal of Biomedical Informatics, 101, 103323. https://doi.org/10.1016/j.jbi.2019.103323
Alorini, G., Rawat, D. B., & Alorini, D. (2021). LSTM-RNN based sentiment analysis to monitor COVID-19 opinions using social media data. IEEE Xplore. https://doi.org/10.1109/ICC42927.2021.9500897
Ji, S., Satish, N., Li, S., & Dubey, P. K. (2019). Parallelizing Word2Vec in shared and distributed memory. IEEE Transactions on Parallel and Distributed Systems, 30(9), 2090–2100. https://doi.org/10.1109/tpds.2019.2904058
Jang, B., Kim, I., & Kim, J. W. (2019). Word2Vec convolutional neural networks for classification of news articles and tweets. PLOS ONE, 14(8), e0220976. https://doi.org/10.1371/journal.pone.0220976
Borg, A., Boldt, M., Rosander, O., & Ahlstrand, J. (2020). E-mail classification with machine learning and word embeddings for improved customer support. Neural Computing and Applications. https://doi.org/10.1007/s00521-020-05058-4
Tripathy, A., Agrawal, A., & Rath, S. K. (2016). Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications, 57, 117–126. https://doi.org/10.1016/j.eswa.2016.03.028
Elhag, M., Idris, N., Mahmud, R., Qazi, A., Ibrahim, Jaafar Zubairu Maitama, Naseem, U., Shah Alam Khan, & Yang, S. (2021). A multi-criteria approach for Arabic dialect sentiment analysis for online reviews: Exploiting optimal machine learning algorithm selection. Sustainability, 13(18), 10018. https://doi.org/10.3390/su131810018
Birjali, M., Kasri, M., & Beni-Hssane, A. (2021). A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowledge-Based Systems, 226, 107134. https://doi.org/10.1016/j.knosys.2021.107134
Sohrabi, M. K., & Hemmatian, F. (2019). An efficient preprocessing method for supervised sentiment analysis by converting sentences to numerical vectors: A Twitter case study. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-019-7586-4
Chen, J., Chen, Y., He, Y., Xu, Y., Zhao, S., & Zhang, Y. (2021). A classified feature representation three-way decision model for sentiment analysis. Applied Intelligence, 52(7), 7995–8007. https://doi.org/10.1007/s10489-021-02809-1
Araque, O., Corcuera-Platas, I., Sánchez-Rada, J. F., & Iglesias, C. A. (2017). Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Expert Systems with Applications, 77, 236–246. https://doi.org/10.1016/j.eswa.2017.02.002
Alsayat, A. (2021). Improving sentiment analysis for social media applications using an ensemble deep learning language model. Arabian Journal for Science and Engineering. https://doi.org/10.1007/s13369-021-06227-w

There are 33 citations in total.

Details

Primary Language	English
Subjects	Automated Software Engineering, Reinforcement Learning
Journal Section	Articles
Authors	Satyendra Sıngh (ORCID: 0009-0009-7907-0063), Krishan Kumar, Brajesh Kumar
Early Pub Date	October 28, 2024
Publication Date	October 31, 2024
Submission Date	May 2, 2024
Acceptance Date	June 6, 2024
Published in Issue	Year 2024 Volume: 8 Issue: 4

Cite

APA	Sıngh, S., Kumar, K., & Kumar, B. (2024). Analysis of feature extraction techniques for sentiment analysis of tweets. Turkish Journal of Engineering, 8(4), 741-753. https://doi.org/10.31127/tuje.1477502

Cited By

A novel hyperparameter tuning method for enhanced intrusion detection in network security

Turkish Journal of Engineering

https://doi.org/10.31127/tuje.1624366

Exploring Bengali Image Descriptions through the combination of diverse CNN Architectures and Transformer Decoders

Turkish Journal of Engineering

https://doi.org/10.31127/tuje.1507442
