Sentiment classification has become one of the most popular text classification domains, especially in recent years. As in all text classification problems, the high dimensionality of the feature space is a major concern for sentiment classification because it affects accuracy. This study analyses the performance of six recent text feature selection methods for document-level sentiment classification using two widely known classifiers, Support Vector Machines (SVM) and naïve Bayes (NB). Three datasets containing different types of sentiment data were used in the experiments: Cornell movie review, Sentiment140, and Nine public sentiment. Two success measures, Micro-F1 and Macro-F1, were used for evaluation, and 3-fold cross-validation was applied for a fair assessment of system performance. The experiments indicated that the distinguishing feature selector (DFS) and discriminative features selection (DFSS) methods are superior to the other four feature selection methods for sentiment classification. In general, the SVM classifier achieved its highest classification performance when combined with DFSS, whereas the NB classifier achieved its highest classification performance when combined with DFS.
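
The two success measures named above differ in how they aggregate per-class results: Micro-F1 pools the true positives, false positives, and false negatives of all classes before computing a single F1, while Macro-F1 averages the per-class F1 values with equal weight. The following is a minimal sketch of that distinction, assuming single-label predictions; the function name `f1_scores` and the toy labels are illustrative only and are not taken from the study.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label class predictions.

    Illustrative helper; not the evaluation code used in the study.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    # Per-class counts of true positives, false positives, false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_count, fp_count, fn_count):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp_count + fp_count + fn_count
        return 2 * tp_count / denom if denom else 0.0

    # Micro-F1: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    # Hypothetical toy labels, chosen only to show that the two measures differ.
    truth = ["pos", "pos", "pos", "neg"]
    guess = ["pos", "pos", "neg", "neg"]
    print(f1_scores(truth, guess))
```

On imbalanced data the two values can diverge noticeably, because Macro-F1 gives a rare class the same weight as a frequent one, which is presumably why both are reported.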
Journal Section | Articles |
---|---|
Authors | |
Publication Date | September 1, 2018 |
Published in Issue | Year 2018 Volume: 19 Issue: 3 |