An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification
Feature selection is a pivotal process in machine learning: by reducing dimensionality it improves generalization and mitigates overfitting, and by eliminating irrelevant or redundant features it yields simpler, more interpretable models that generally perform better. In this study, we introduce a hybrid method that combines ensemble feature selection with regularization techniques, designed to optimize model accuracy while substantially reducing the number of features required. We evaluated the method on a customer satisfaction dataset. As a baseline without feature selection, the model achieved a ROC AUC of 0.946 on the test set using all 369 features. After applying the proposed feature selection method, the model achieved a higher ROC AUC of 0.954 using only 12 key features, completing the task in approximately 43% less time. These findings demonstrate that the approach produces a more efficient and better-performing model.
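The abstract describes the pipeline only at a high level. A minimal sketch of one such hybrid is shown below, assuming random-forest importances for the ensemble stage and an L1-penalized logistic model for the regularization stage; the dataset, thresholds, and two-stage ordering are illustrative assumptions, not the authors' exact algorithm.

```python
# Hypothetical two-stage hybrid feature selection (scikit-learn):
# stage 1 ranks features by ensemble importance, stage 2 prunes the
# candidates with L1 regularization, which zeroes weak coefficients.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a high-dimensional tabular dataset.
X, y = make_classification(n_samples=500, n_features=40,
                           n_informative=8, random_state=0)

# Stage 1: rank features by random-forest (Gini) importance, keep top 20.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:20]

# Stage 2: L1-penalized logistic regression on the candidate subset;
# features whose coefficients shrink to zero are discarded.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                           random_state=0).fit(X[:, top], y)
selected = top[np.abs(lasso.coef_.ravel()) > 1e-8]
print(len(selected))  # number of retained features
```

In this sketch the ensemble stage discards clearly uninformative features cheaply, and the regularization stage resolves redundancy among the survivors; a final classifier would then be trained on `X[:, selected]`.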
Details
Primary Language
English
Subjects
Decision Support and Group Support Systems, Information Systems (Other), Applied Mathematics (Other)
Yousefi, T., Varlıklar, Ö., & Odabas, M. S. (2024). An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification. Black Sea Journal of Engineering and Science, 7(6), 1224-1231. https://doi.org/10.34248/bsengineering.1541950