Performance Evaluation of Feature Subset Selection Approaches on Rule-Based Learning Algorithms

Ali Öztürk

Research Article

Year 2018, Volume: 1 Issue: 1, 16 - 20, 26.12.2018

Abstract

There are two main approaches to feature subset selection: wrapper-based and filter-based. In the wrapper-based approach, which is a supervised method, the feature subset selection algorithm acts as a wrapper around an induction algorithm. The feature subset selection algorithm treats the induction algorithm as a black box, and the induction algorithm is usually the classifier itself. The filter approach is an unsupervised method that attempts to assess the merits of features from the data alone, ignoring the performance of the induction algorithm. In this study, the effects of these feature subset selection approaches on the classification performance of the rule-based learning algorithms C4.5, RIPPER, PART, and BFTree were investigated. These algorithms run quickly when used within the wrapper-based approach. For various datasets, significant accuracy improvements were achieved with the wrapper-based feature subset selection method. Other algorithms, such as Multilayer Perceptron (MLP) and Random Forests (RF), were also applied to the same datasets for accuracy comparison. These two algorithms were very slow when used in the wrapper approach.
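
The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the greedy forward search, the leave-one-out 1-nearest-neighbour inducer (standing in for a rule learner such as C4.5), the variance-based filter, the toy data, and all function names are assumptions chosen only to keep the example self-contained.

```python
from statistics import pvariance

# Toy data (hypothetical): feature 0 separates the classes,
# feature 1 is high-variance noise, feature 2 is constant.
ROWS = [
    [0.0, 5.0, 1.0], [0.1, -4.0, 1.0], [0.2, 3.0, 1.0],
    [1.0, -5.0, 1.0], [1.1, 4.0, 1.0], [0.9, -3.0, 1.0],
]
LABELS = [0, 0, 0, 1, 1, 1]


def loo_accuracy(rows, labels, features):
    """Black-box inducer: leave-one-out accuracy of a 1-nearest-neighbour
    classifier that sees only the given feature indices."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def wrapper_forward_selection(rows, labels, evaluate):
    """Wrapper: greedily add the feature that most improves the inducer's
    estimated accuracy, and stop when no remaining feature helps."""
    remaining = set(range(len(rows[0])))
    selected, best_score = [], 0.0
    while remaining:
        scored = [(evaluate(rows, labels, selected + [f]), f)
                  for f in sorted(remaining)]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


def variance_filter(rows, threshold):
    """Filter: keep features whose variance exceeds the threshold.
    Neither the labels nor the inducer are consulted."""
    columns = list(zip(*rows))
    return [f for f, col in enumerate(columns) if pvariance(col) > threshold]


if __name__ == "__main__":
    print(wrapper_forward_selection(ROWS, LABELS, loo_accuracy))
    print(variance_filter(ROWS, 0.1))
```

On this toy data the wrapper keeps only the informative feature, while the variance filter also keeps the noisy one because it never consults the classifier. The wrapper calls the inducer once per candidate feature in every round, which is why inducers that are expensive to train make the wrapper approach slow.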

Keywords

Rule-based learning, feature extraction, wrapper, filtering

Details

Primary Language	English
Subjects	Data Mining and Knowledge Discovery
Journal Section	Research Article
Authors	Ali Öztürk
Publication Date	December 26, 2018
Published in Issue	Year 2018 Volume: 1 Issue: 1

Cite

IEEE	A. Öztürk, “Performance Evaluation of Feature Subset Selection Approaches on Rule-Based Learning Algorithms”, International Journal of Data Science and Applications, vol. 1, no. 1, pp. 16–20, 2018.

AI Research and Application Center, Sakarya University of Applied Sciences, Sakarya, Türkiye.