Research Article

An Experimental Investigation of Document Vector Computation Methods for Sentiment Analysis of Turkish and English Reviews

Year 2016, Volume: 31 Issue: 2, 464 - 482, 15.12.2016
https://doi.org/10.21605/cukurovaummfd.310341

Abstract

Sentiment analysis is the task of identifying the overall attitude of a given text document by using text analysis and natural language processing techniques. In this study, we present experimental results of sentiment analysis on movie and product review datasets in Turkish and English, using a Support Vector Machine (SVM) classifier. We also compare different document vector computation techniques and show their effects on sentiment analysis. We empirically evaluate SVM types, kernel types, weighting schemes such as TF or TF*IDF, TF variants, IDF variants, tokenization methods, feature selection systems, text preprocessing techniques, and vector normalizations. The best accuracy we obtained, 91.33%, was on our collected Turkish product reviews dataset, using the C-SVC SVM type with a linear kernel, a log-normalized TF * probabilistic IDF weighting scheme, L2 vector normalization, Chi-square feature selection, and unigram word tokenization. A detailed comparison of the document vector computation methods over the Turkish and English datasets is also presented.
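
As an illustration of what a document vector computation of this kind can look like, the following minimal Python sketch combines a log-normalized TF, a probabilistic IDF, and L2 normalization over unigram tokens. The abstract does not give the formulas, so the forms used here (1 + ln tf for the TF part, and max(0, ln((N - df) / df)) for the IDF part) are common textbook definitions taken as assumptions, not necessarily the ones used in the study. The function name `document_vectors` and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def document_vectors(docs):
    """Return one L2-normalized TF*IDF weight dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))

    vectors = []
    for doc in docs:
        weights = {}
        for term, tf in Counter(doc).items():
            # Log-normalized term frequency (assumed form: 1 + ln tf).
            log_tf = 1.0 + math.log(tf)
            # Probabilistic IDF (assumed form), clipped at zero so that terms
            # found in at least half of the documents receive no weight.
            if df[term] < n_docs:
                prob_idf = max(0.0, math.log((n_docs - df[term]) / df[term]))
            else:
                prob_idf = 0.0
            weights[term] = log_tf * prob_idf
        # L2 normalization: scale the vector to unit Euclidean length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors


# Toy unigram-tokenized reviews (hypothetical data, for illustration only).
reviews = [
    "great phone great battery".split(),
    "bad battery".split(),
    "great screen".split(),
    "bad screen bad sound".split(),
    "okay sound".split(),
]
vectors = document_vectors(reviews)
```

Vectors of this shape could then be passed to a feature selection step and a linear SVM. The clipping at zero is a design choice of this sketch: without it, terms that occur in more than half of the documents would receive negative weights.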


Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme

Abstract

Sentiment analysis is the process of determining the overall judgment of a given text document using text analysis and natural language processing techniques. This study reports the results of sentiment analysis experiments, carried out with a Support Vector Machine (SVM) classifier, on movie and product reviews written in English and Turkish. In addition, different document vector computation methods are compared and their effects on sentiment analysis are shown. SVM types, kernel types, weighting methods such as TF or TF*IDF, TF types, IDF types, feature generation methods, feature selection systems, text preprocessing techniques, and vector normalization techniques are analyzed experimentally. On the Turkish product reviews dataset that we built, the best result, 91.33% accuracy, was obtained using the C-SVC SVM type with a linear kernel, the log normalization TF * probabilistic IDF weighting method, L2 vector normalization, Chi-square feature selection, and unigram word features. Detailed comparisons of the document vector computation methods on the Turkish and English datasets are also included in the study.


Details

Journal Section: Articles
Authors: Furkan Gözükara, Selma Ayşe Özel
Publication Date: December 15, 2016
Published in Issue: Year 2016, Volume: 31, Issue: 2

Cite

APA Gözükara, F., & Özel, S. A. (2016). Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi, 31(2), 464-482. https://doi.org/10.21605/cukurovaummfd.310341
AMA Gözükara F, Özel SA. Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. cukurovaummfd. December 2016;31(2):464-482. doi:10.21605/cukurovaummfd.310341
Chicago Gözükara, Furkan, and Selma Ayşe Özel. “Türkçe Ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme”. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi 31, no. 2 (December 2016): 464-82. https://doi.org/10.21605/cukurovaummfd.310341.
EndNote Gözükara F, Özel SA (December 1, 2016) Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi 31 2 464–482.
IEEE F. Gözükara and S. A. Özel, “Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme”, cukurovaummfd, vol. 31, no. 2, pp. 464–482, 2016, doi: 10.21605/cukurovaummfd.310341.
ISNAD Gözükara, Furkan - Özel, Selma Ayşe. “Türkçe Ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme”. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi 31/2 (December 2016), 464-482. https://doi.org/10.21605/cukurovaummfd.310341.
JAMA Gözükara F, Özel SA. Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. cukurovaummfd. 2016;31:464–482.
MLA Gözükara, Furkan and Selma Ayşe Özel. “Türkçe Ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme”. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi, vol. 31, no. 2, 2016, pp. 464-82, doi:10.21605/cukurovaummfd.310341.
Vancouver Gözükara F, Özel SA. Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. cukurovaummfd. 2016;31(2):464-82.