An Experimental Investigation of Document Vector Computation Methods for Sentiment Analysis of Turkish and English Reviews
Abstract
Sentiment analysis is the task of identifying the overall attitude expressed in a text document by using text analysis and natural language processing techniques. In this study, we present experimental results of sentiment analysis on Turkish and English movie and product review datasets using a Support Vector Machine (SVM) classifier. We also compare different document vector computation techniques and show their effects on sentiment analysis. We empirically evaluate SVM types, kernel types, weighting schemes such as TF and TF*IDF, TF variants, IDF variants, tokenization methods, feature selection methods, text preprocessing techniques, and vector normalizations. The best accuracy we obtained is 91.33%, on our collected Turkish product reviews dataset, using the C-SVC SVM type with a linear kernel, a log-normalized TF * probabilistic IDF weighting scheme, L2 vector normalization, Chi-square feature selection, and unigram word tokenization. A detailed comparison of the document vector computation methods over the Turkish and English datasets is also presented.
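The best-performing weighting scheme named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the sketch assumes the common textbook definitions: log-normalized TF as 1 + ln(tf), probabilistic IDF as ln((N - df) / df) clamped at zero, and L2 normalization as division by the Euclidean length of the vector. The function names `tokenize` and `document_vectors` are hypothetical, whitespace splitting stands in for unigram word tokenization, and Chi-square feature selection and SVM training are left out.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for unigram tokenization)."""
    return text.lower().split()


def document_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting (not confirmed by the source):
    weight = (1 + ln tf) * max(0, ln((N - df) / df)), then L2-normalized.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    def prob_idf(term):
        d = df[term]
        if d >= n_docs:
            # Term occurs in every document: the log argument would be 0.
            return 0.0
        # Clamp at zero so terms in more than half the documents get no weight.
        return max(0.0, math.log((n_docs - d) / d))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: (1.0 + math.log(c)) * prob_idf(t) for t, c in tf.items()}
        # L2 normalization; an all-zero vector is left unchanged.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors


if __name__ == "__main__":
    reviews = ["good good movie", "bad movie", "bad plot", "great film"]
    for review, vec in zip(reviews, document_vectors(reviews)):
        print(review, "->", vec)
```

In a full pipeline of the kind the abstract describes, vectors like these would be reduced by feature selection and then passed to an SVM trainer. Under this definition, terms that appear in half or more of the documents receive zero weight.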
Details
Primary Language
Turkish
Subjects
-
Section
Research Article
Publication Date
December 15, 2016
Submission Date
May 3, 2017
Acceptance Date
November 23, 2016
Published in Issue
Year 2016, Volume: 31, Issue: 2
Cited By
Kelime Temsil Yöntemleri ile Kelime Benzerliklerinin İncelenmesi
Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi
https://doi.org/10.21605/cukurovaummfd.609119
Determining the orientation in choosing furniture based on social media based on data mining algorithms: Twitter example
Turkish Journal of Forestry | Türkiye Ormancılık Dergisi
https://doi.org/10.18182/tjf.609967
Scalable Gender Profiling from Turkish Texts Using Deep Embeddings and Meta-Heuristic Feature Selection
Journal of Theoretical and Applied Electronic Commerce Research
https://doi.org/10.3390/jtaer20040253