Sentiment analysis has been an active research area within text classification
since the early 2000s. Most studies in this field focus on English text, while
Turkish and other languages have fallen behind. The purpose of this research is
to contribute to Turkish text analysis using content collected from websites.
In particular, we infer the sentiment behind noisy product reviews and comments
on a highly popular commercial website. To this end, we build a unique dataset
of 9,100 product review samples for training our classification model. Various
word representation methods are used in sentiment analysis, such as bag-of-words
and n-gram models. In this work, we generate our word models with the word2vec
algorithm, which represents each word in the vocabulary as a 300-dimensional
vector. We use 70% of the dataset to train a Random Forest model that performs
binary sentiment classification (positive or negative), using the users' product
ratings as class labels. On these highly noisy, unfiltered comments, the model
achieves an accuracy of 84.23%.
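The abstract states that each word becomes a 300-dimensional word2vec vector, but not how a whole review is turned into a single input for the Random Forest. One common option, assumed here rather than taken from the paper, is to average the vectors of the review's in-vocabulary tokens. The standard-library sketch below takes `embeddings` as a stand-in for a trained word2vec lookup table; the helper names (`turkish_lower`, `tokenize`, `review_vector`) are hypothetical. Turkish text needs explicit lowercasing because Python's `str.lower()` maps `I` to `i` rather than to the dotless `ı`.

```python
import re
from typing import Dict, List, Sequence

DIM = 300  # word2vec vector size reported in the abstract


def turkish_lower(text: str) -> str:
    # str.lower() is locale-independent: it turns "I" into "i" and "İ" into
    # "i" plus a combining dot. Map the two Turkish capitals explicitly first.
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text: str) -> List[str]:
    # \w matches Unicode letters, so ç, ğ, ı, ö, ş and ü stay inside tokens.
    return re.findall(r"\w+", turkish_lower(text))


def review_vector(
    text: str, embeddings: Dict[str, Sequence[float]], dim: int = DIM
) -> List[float]:
    """Average the word vectors of all in-vocabulary tokens in a review."""
    total = [0.0] * dim
    count = 0
    for token in tokenize(text):
        vec = embeddings.get(token)
        if vec is None:  # skip out-of-vocabulary tokens (typos, slang)
            continue
        for i, x in enumerate(vec):
            total[i] += x
        count += 1
    if count == 0:
        return total  # no known words: fall back to the zero vector
    return [x / count for x in total]
```

For example, with a toy 3-dimensional table `{"güzel": [1.0, 0.0, 0.0], "ürün": [0.0, 1.0, 0.0]}`, `review_vector("Güzel ürün!", emb, dim=3)` returns `[0.5, 0.5, 0.0]`. Skipping out-of-vocabulary tokens is one way to cope with misspellings in noisy comments; it is a choice made for this sketch, not something the abstract specifies.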
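The abstract says that user ratings serve as labels and that 70% of the data is used for training, but it leaves the rating cut-off, the split procedure, and the evaluation set unstated. The standard-library sketch below fills these in with labelled assumptions: a 1-5 star scale, hypothetical thresholds (4-5 positive, 1-2 negative, 3 dropped), a seeded random 70/30 split, and accuracy measured on the remaining 30%. All function names are illustrative.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rating_to_label(rating: int) -> Optional[int]:
    """Map a 1-5 star rating to a binary label (hypothetical thresholds)."""
    if rating >= 4:
        return 1  # positive
    if rating <= 2:
        return 0  # negative
    return None  # neutral 3-star reviews are dropped in this sketch


def label_reviews(reviews: Sequence[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Turn (text, rating) pairs into (text, label) pairs, skipping neutrals."""
    labeled = []
    for text, rating in reviews:
        label = rating_to_label(rating)
        if label is not None:
            labeled.append((text, label))
    return labeled


def train_test_split(
    items: Sequence[T], train_frac: float = 0.7, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then cut into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))  # round() guards against 0.7 * n float error
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

In a full pipeline, the training reviews would be converted to feature vectors and passed to a Random Forest implementation, for instance scikit-learn's `RandomForestClassifier` via `fit(X_train, y_train)`, and its `predict(X_test)` output would then be scored with `accuracy`. The abstract does not say whether neutral ratings were dropped or a different cut-off was used, so the thresholds here should be treated as placeholders.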
Primary Language | English |
---|---|
Journal Section | Review Articles |
Authors | |
Publication Date | December 21, 2017 |
Submission Date | November 11, 2017 |
Acceptance Date | December 21, 2017 |
Published in Issue | Year 2017 Volume: 59 Issue: 2 |

Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.