COMPARATIVE ANALYSIS OF RECENT FEATURE SELECTION METHODS FOR SENTIMENT CLASSIFICATION

Alper Kürşat Uysal

Year 2018, Volume: 19 Issue: 3, 645 - 659, 01.09.2018

Abstract

Sentiment classification has become one of the most popular text classification domains in recent years. As in all text classification problems, the high dimensionality of the feature space is a major concern for sentiment classification because of its effect on accuracy. This study analyses the performance of six recent text feature selection methods for document-level sentiment classification using two widely known classifiers, Support Vector Machines (SVM) and naïve Bayes (NB). Three datasets containing different types of sentiment data were used in the experiments: Cornell movie review, Sentiment140, and Nine public sentiment. Two success measures, Micro-F1 and Macro-F1, were used for evaluation, and 3-fold cross-validation was applied for a fair assessment of system performance. The experiments indicated that the distinguishing feature selector (DFS) and discriminative features selection (DFSS) methods are superior to the other four feature selection methods for sentiment classification. In general, the SVM classifier achieved its highest classification performance when combined with the DFSS feature selection method, whereas the NB classifier achieved its highest performance when combined with the DFS feature selection method.
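The two success measures named in the abstract can be illustrated with a short sketch. It assumes the standard definitions: Micro-F1 pools true positives, false positives, and false negatives over all classes before computing F1, while Macro-F1 averages the per-class F1 scores. The function names and the toy labels below are hypothetical and are not taken from the study.

```python
from collections import Counter

def f1(tp, fp, fn):
    # F1 written directly in terms of counts: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    # Per-class counts of true positives, false positives, false negatives.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p receives a false positive
            fn[t] += 1  # true class t receives a false negative
    # Micro-F1: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Toy, made-up labels for a two-class sentiment task (illustrative only).
y_true = ["pos", "pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"Micro-F1 = {micro:.3f}, Macro-F1 = {macro:.3f}")
```

In a single-label setting such as this one, Micro-F1 coincides with accuracy, whereas Macro-F1 gives each class equal weight regardless of its size, so the two can diverge when classes are imbalanced.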

Keywords

Pattern recognition, sentiment classification, feature selection

Details

Journal Section	Articles
Authors	Alper Kürşat Uysal
Publication Date	September 1, 2018
Published in Issue	Year 2018 Volume: 19 Issue: 3

Cite

AMA	Uysal AK. COMPARATIVE ANALYSIS OF RECENT FEATURE SELECTION METHODS FOR SENTIMENT CLASSIFICATION. Estuscience - Se. September 2018;19(3):645-659.