Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme

Furkan Gözükara; Selma Ayşe Özel

doi:10.21605/cukurovaummfd.310341

An Experimental Investigation of Document Vector Computation Methods for Sentiment Analysis of Turkish and English Reviews

Abstract

Sentiment analysis is the task of identifying the overall attitude of a given text document by using text analysis and natural language processing techniques. In this study, we present experimental results of sentiment analysis on movie and product review datasets in Turkish and English, using a Support Vector Machine (SVM) classifier. Moreover, we compare different document vector computation techniques and show their effects on sentiment analysis. We empirically evaluate SVM types, kernel types, weighting schemes such as TF or TF*IDF, TF variants, IDF variants, tokenization methods, feature selection methods, text preprocessing techniques, and vector normalizations. The best accuracy, 91.33%, was obtained on the Turkish product reviews dataset that we collected, using the C-SVC SVM type with a linear kernel, the log-normalized TF * probabilistic IDF weighting scheme, L2 vector normalization, Chi-square feature selection, and unigram word tokenization. A detailed comparison of the document vector computation methods over the Turkish and English datasets is also presented.
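The best-performing weighting scheme named in the abstract (log-normalized TF times probabilistic IDF, followed by L2 normalization over unigram tokens) can be illustrated with a minimal Python sketch. This page does not give the paper's exact formulas, so the sketch assumes the commonly used definitions: TF = log(1 + f) for a raw term count f, and IDF = log((N - n_t) / n_t) for N documents of which n_t contain the term. Clamping non-positive IDF values to zero, the whitespace tokenizer, and the toy reviews are further assumptions made only for illustration; they are not taken from the study.

```python
import math
from collections import Counter


def tokenize(text):
    # Unigram word tokenization (assumed here: lowercase, split on whitespace).
    return text.lower().split()


def probabilistic_idf(docs_tokens):
    # Assumed definition: idf(t) = log((N - n_t) / n_t),
    # where N is the number of documents and n_t the document frequency of t.
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    idf = {}
    for term, n_t in doc_freq.items():
        if n_t >= n_docs:
            # The ratio would be zero (log undefined); assumption: weight 0.
            idf[term] = 0.0
        else:
            # Assumption: clamp negative values (terms in over half the docs) to 0.
            idf[term] = max(0.0, math.log((n_docs - n_t) / n_t))
    return idf


def document_vector(tokens, idf):
    # Log-normalized TF (assumed: log(1 + count)) times probabilistic IDF.
    counts = Counter(tokens)
    weights = {t: math.log(1 + c) * idf.get(t, 0.0) for t, c in counts.items()}
    # L2 normalization: scale the vector to unit Euclidean length.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return weights
    return {t: w / norm for t, w in weights.items()}


# Hypothetical toy reviews, invented for this example.
reviews = [
    "great phone great battery",
    "terrible battery",
    "great screen",
    "terrible screen terrible sound",
    "fast delivery",
]
docs = [tokenize(r) for r in reviews]
idf = probabilistic_idf(docs)
vectors = [document_vector(d, idf) for d in docs]

for review, vec in zip(reviews, vectors):
    print(review, {t: round(w, 3) for t, w in vec.items()})
```

In a full pipeline, vectors like these would next pass through feature selection (Chi-square in the best configuration reported above) before being handed to a linear-kernel C-SVC classifier; those steps are omitted here to keep the sketch short.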

Keywords

Sentiment analysis, Classification, Data mining, Product reviews, Support vector machines

Details

Primary Language

Turkish

Subjects

-

Section

Research Article

Authors

Furkan Gözükara
Türkiye

Selma Ayşe Özel
Türkiye

Publication Date

December 15, 2016

Submission Date

May 3, 2017

Acceptance Date

November 23, 2016

Published in Issue

Year 2016 Volume: 31 Issue: 2

DOI

https://doi.org/10.21605/cukurovaummfd.310341

IZ

https://izlik.org/JA55ZR83PE

Cite

APA

Gözükara, F., & Özel, S. A. (2016). Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi, 31(2), 464-482. https://doi.org/10.21605/cukurovaummfd.310341

AMA

1. Gözükara F, Özel SA. Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. cukurovaummfd. 2016;31(2):464-482. doi:10.21605/cukurovaummfd.310341

Chicago

Gözükara, Furkan, and Selma Ayşe Özel. 2016. “Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme”. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi 31 (2): 464-82. https://doi.org/10.21605/cukurovaummfd.310341.

EndNote

Gözükara F, Özel SA (December 1, 2016) Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi 31 2 464–482.

IEEE

[1] F. Gözükara and S. A. Özel, “Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme”, cukurovaummfd, vol. 31, no. 2, pp. 464–482, Dec. 2016, doi: 10.21605/cukurovaummfd.310341.

ISNAD

Gözükara, Furkan - Özel, Selma Ayşe. “Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme”. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi 31/2 (December 1, 2016): 464-482. https://doi.org/10.21605/cukurovaummfd.310341.

JAMA

1. Gözükara F, Özel SA. Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. cukurovaummfd. 2016;31:464–482.

MLA

Gözükara, Furkan, and Selma Ayşe Özel. “Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme”. Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi, vol. 31, no. 2, December 2016, pp. 464-82, doi:10.21605/cukurovaummfd.310341.

Vancouver

1. Furkan Gözükara, Selma Ayşe Özel. Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme. cukurovaummfd. December 1, 2016;31(2):464-82. doi:10.21605/cukurovaummfd.310341

Cited By

Kelime Temsil Yöntemleri ile Kelime Benzerliklerinin İncelenmesi

Determining the orientation in choosing furniture based on social media based on data mining algorithms: Twitter example

Scalable Gender Profiling from Turkish Texts Using Deep Embeddings and Meta-Heuristic Feature Selection