Investigation of Word Vector Models Trained with Turkish Hotel Comments by Sentiment Analysis
Year 2020, Volume: 24 Issue: 2, 455 - 463, 26.08.2020
Hüseyin Ahmetoğlu, Resul Daş
Abstract
Sentiment analysis is one of the important research areas of natural language processing (NLP) and text classification. Work in this area is growing rapidly, and the technique appears in all kinds of applications in digital life. Many techniques have been developed for sentiment analysis, but recently word embedding methods from NLP have become widely used for the task. Word2Vec is one of the most useful word embedding methods for converting words into meaningful vectors. Creating word vectors with this method requires large corpora. Pre-trained models make it possible to achieve more accurate results in sentiment analysis. In this study, Turkish hotel reviews written by verified users were collected by web scraping for use in sentiment analysis. Word vectors were created by training Word2Vec on this original data. With these vectors, a classification model was developed using the Gated Recurrent Unit (GRU), a type of Recurrent Neural Network (RNN). Word vectors trained on larger corpora, and vectors formed by assigning random values, were then examined with the same deep learning method, and the resulting classification performance was compared. The results show that a broader corpus, independent of the specific domain, increased classification success.
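The comparison described in the abstract depends on feeding the same classifier three different sets of word vectors: domain-trained, broad-corpus, and random. The abstract does not say how the embedding layer was initialised, so the following is only a minimal sketch of one common approach, written in plain Python. The function name `build_embedding_matrix`, the toy vocabulary, the three-dimensional vectors, and the `scale` parameter are all hypothetical and are not taken from the paper.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0, scale=0.05):
    """Return (matrix, hits): one row of `dim` floats per word in `vocab`.

    Words found in `pretrained` reuse their stored vector; every other word
    gets small uniform random values. Passing an empty dict therefore gives
    a fully random baseline.
    """
    rng = random.Random(seed)
    matrix = []
    hits = 0
    for word in vocab:
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            matrix.append(list(vec))  # copy so the lookup table is not mutated
            hits += 1
        else:
            matrix.append([rng.uniform(-scale, scale) for _ in range(dim)])
    return matrix, hits


# Hypothetical toy data. In practice the vectors would come from a trained
# Word2Vec model (domain corpus or a broader corpus), not be written by hand.
vocab = ["otel", "temiz", "kötü", "personel"]
domain_vectors = {"otel": [0.1, 0.2, 0.3], "temiz": [0.4, 0.1, 0.0]}

random_matrix, random_hits = build_embedding_matrix(vocab, {}, dim=3)
domain_matrix, domain_hits = build_embedding_matrix(vocab, domain_vectors, dim=3)
```

Each resulting matrix could then initialise the embedding layer of the same GRU classifier, so that any difference in accuracy is attributable to the vectors rather than to the network. The `hits` count gives a rough indication of how much of the vocabulary a given vector source actually covers.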
Turkish title: Türkçe Otel Yorumlarıyla Eğitilen Kelime Vektörü Modellerinin Duygu Analizi ile İncelenmesi
Abstract (translated from Turkish)
Sentiment analysis is one of the important research areas of natural language processing (NLP) and text classification. Work in this field is growing rapidly. The technique appears in every kind of application area of digital life. Many techniques have been developed for sentiment analysis, but recently the word vector model methods of natural language processing have come into wide use in sentiment analysis. Word2Vec is among the most useful word vector model methods that can convert words into meaningful vectors. Creating word vectors with this method requires large word pools. Pre-trained models make it possible to reach more accurate results in sentiment analysis. In this study, Turkish hotel reviews by verified users were collected with data scraping methods for examination in sentiment analysis. Word vectors were created by training Word2Vec on this original data. With these vectors, a classification model was developed using gated recurrent units (GRU), a type of recurrent neural network (RNN). Word vectors trained with larger corpora, and vectors created by assigning random values, were re-examined with the same deep learning method, and the classification results obtained were compared. According to the results, broader corpora independent of the specific domain were observed to increase classification success.