Performance Evaluation of Feature Subset Selection Approaches on Rule-Based Learning Algorithms

Ali Öztürk

Abstract

There are two main approaches to feature subset selection: wrapper-based and filter-based. In the wrapper-based approach, which is a supervised method, the feature subset selection algorithm acts as a wrapper around an induction algorithm. The induction algorithm, which is usually the classifier itself, is treated as a black box by the feature subset selection algorithm. The filter approach is an unsupervised method that attempts to assess the merits of features from the data alone, ignoring the performance of the induction algorithm. This study investigated the effects of these feature subset selection approaches on the classification performance of rule-based learning algorithms, namely C4.5, RIPPER, PART, and BFTree. These algorithms run quickly when used within the wrapper-based approach. For various datasets, the wrapper-based feature subset selection method yielded significant accuracy improvements. Other algorithms, such as the Multilayer Perceptron (MLP) and Random Forests (RF), were also applied to the same datasets for accuracy comparison. These two algorithms were very time-inefficient when used in the wrapper approach.
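The contrast between the two approaches can be illustrated with a minimal sketch. It is not the study's implementation: the toy data, the 1-nearest-neighbour stand-in for the induction algorithm, the greedy forward search, and the variance-based filter score are all assumptions chosen to keep the example self-contained. The wrapper calls the classifier as a black box to score each candidate subset, whereas the filter ranks features from the data alone.

```python
from statistics import pvariance

# Toy data (hypothetical): feature 0 separates the classes,
# feature 1 is high-variance noise, feature 2 is nearly constant.
X = [
    (0.0,  5.0, 0.5), (0.1, -5.0, 0.5), (0.2,  5.0, 0.6), (0.3, -5.0, 0.5),
    (1.0, -5.0, 0.5), (1.1,  5.0, 0.6), (1.2, -5.0, 0.5), (1.3,  5.0, 0.5),
]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`.

    Stand-in for the induction algorithm; the wrapper only sees its score.
    """
    correct = 0
    for i, row in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(X):
            if j == i:
                continue
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def wrapper_forward_selection(X, y, evaluate):
    """Greedy forward search: add the feature that most improves the score."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        candidate, candidate_score = None, best_score
        for f in remaining:
            score = evaluate(X, y, selected + [f])
            if score > candidate_score:  # require a strict improvement
                candidate, candidate_score = f, score
        if candidate is None:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


def filter_variance_ranking(X):
    """Rank features by variance, ignoring labels and the classifier."""
    variances = [pvariance(column) for column in zip(*X)]
    return sorted(range(len(variances)), key=lambda f: variances[f], reverse=True)


selected, accuracy = wrapper_forward_selection(X, y, loo_accuracy)
ranking = filter_variance_ranking(X)
print("wrapper selected:", selected, "accuracy:", accuracy)
print("filter ranking:", ranking)
```

On this toy data the wrapper keeps only the class-separating feature, while the variance filter ranks the noisy feature first because it never consults the classifier. The wrapper's cost is one classifier evaluation per candidate subset, which is why a slow induction algorithm makes the wrapper approach expensive.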

Keywords

Rule-based learning, feature extraction, wrapper, filtering

Details

Primary Language

English

Subjects

Data Mining and Knowledge Discovery

Section

Research Article

Authors

Ali Öztürk
Türkiye

Publication Date

December 27, 2018

Submission Date

November 27, 2018

Acceptance Date

December 21, 2018

Published in Issue

Year 2018, Volume: 1, Issue: 1

IZ

https://izlik.org/JA82UP86DR

APA

Öztürk, A. (2018). Performance Evaluation of Feature Subset Selection Approaches on Rule-Based Learning Algorithms. Data Science and Applications, 1(1), 16-20. https://izlik.org/JA82UP86DR
