Research Article

SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS

Year 2017, Volume: 59 Issue: 2, 69 - 79, 21.12.2017

Abstract

Sentiment analysis has been an active research area within text classification
since the early 2000s. Most studies in this field analyze English text, while
Turkish and other languages have received far less attention. The purpose of
this research is to contribute to text analysis in Turkish using content
collected from websites. In particular, we infer the sentiment behind noisy
product reviews and comments on a highly popular commercial website. To this
end, we built a new dataset of 9100 product reviews for training our
classification model. Several word representation methods are used in sentiment
analysis, such as bag-of-words and n-gram models. In this work, we generate our
word representations with the word2vec algorithm, in which each word in the
vocabulary is represented as a 300-dimensional vector. We use 70% of our dataset
to train a Random Forest model and classify each review's sentiment as positive
or negative, using the users' product ratings as classification labels. On these
highly noisy and unfiltered comments, we achieve an accuracy of 84.23%.
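
The data preparation described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' code. The abstract does not say how ratings are mapped to labels or how word vectors are combined into one vector per review, so the rating threshold and the averaging step below are assumptions, and the random toy embeddings merely stand in for a trained word2vec model. All function names and sample reviews are hypothetical.

```python
import random

DIM = 300  # vector size stated in the abstract


def label_from_rating(rating, threshold=3):
    """Assumed rule: ratings above `threshold` are positive (1), others negative (0)."""
    return 1 if rating > threshold else 0


def review_vector(tokens, embeddings, dim=DIM):
    """Assumed aggregation: average the vectors of known tokens.

    Returns an all-zero vector when no token is in the vocabulary.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into training and test parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for a trained word2vec model: random vectors for a tiny vocabulary.
rng = random.Random(42)
vocab = ["harika", "kötü", "ürün", "hızlı", "bozuk"]
embeddings = {word: [rng.uniform(-1, 1) for _ in range(DIM)] for word in vocab}

# (tokens, star rating) pairs invented for illustration.
reviews = [
    (["harika", "ürün"], 5),
    (["kötü", "ürün"], 1),
    (["hızlı", "harika"], 4),
    (["bozuk", "ürün"], 2),
    (["harika"], 5),
    (["kötü", "bozuk"], 1),
    (["hızlı", "ürün"], 4),
    (["bozuk"], 1),
    (["harika", "hızlı", "ürün"], 5),
    (["kötü"], 2),
]

# One (feature vector, label) pair per review, then a 70/30 split.
dataset = [
    (review_vector(tokens, embeddings), label_from_rating(rating))
    for tokens, rating in reviews
]
train, test = train_test_split(dataset)
```

The resulting feature vectors and labels could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, fitted on `train` and scored on `test`.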


Details

Primary Language English
Journal Section Review Articles
Authors

Nergis Pervan 0000-0003-3241-6812

Hacer Yalım Keleş 0000-0002-1671-4126

Publication Date December 21, 2017
Submission Date November 11, 2017
Acceptance Date December 21, 2017
Published in Issue Year 2017 Volume: 59 Issue: 2

Cite

APA Pervan, N., & Yalım Keleş, H. (2017). SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, 59(2), 69-79.
AMA Pervan N, Yalım Keleş H. SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. December 2017;59(2):69-79.
Chicago Pervan, Nergis, and Hacer Yalım Keleş. “SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 59, no. 2 (December 2017): 69-79.
EndNote Pervan N, Yalım Keleş H (December 21, 2017) SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 59 2 69–79.
IEEE N. Pervan and H. Yalım Keleş, “SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS”, Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng., vol. 59, no. 2, pp. 69–79, 2017.
ISNAD Pervan, Nergis - Yalım Keleş, Hacer. “SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 59/2 (December 2017), 69-79.
JAMA Pervan N, Yalım Keleş H. SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. 2017;59:69–79.
MLA Pervan, Nergis and Hacer Yalım Keleş. “SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, vol. 59, no. 2, 2017, pp. 69-79.
Vancouver Pervan N, Yalım Keleş H. SENTIMENT ANALYSIS USING A RANDOM FOREST CLASSIFIER ON TURKISH WEB COMMENTS. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. 2017;59(2):69-79.

Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.