Research Article

An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification

Year 2024, pp. 1224-1231, 15.11.2024
https://doi.org/10.34248/bsengineering.1541950

Abstract

Feature selection is a pivotal process in machine learning: by reducing dimensionality, it improves generalization and mitigates overfitting, which in turn enhances model performance. Eliminating irrelevant or redundant features yields simpler, more interpretable models that generally perform better. In this study, we introduce a hybrid method that combines ensemble feature selection with regularization techniques, designed to optimize model accuracy while substantially reducing the number of features required. We applied it to a customer satisfaction dataset. As a baseline without feature selection, the model achieved a ROC AUC of 0.946 on the test set using all 369 features. After applying the proposed feature selection method, the model achieved a higher ROC AUC of 0.954 using only 12 key features, with approximately 43% less run time. These findings demonstrate that the approach produces a more efficient and better-performing model.
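The abstract does not specify how the ensemble and regularization stages are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes one plausible hybrid design: each feature is scored by a filter (absolute Pearson correlation with the label) and by the coefficient magnitude of an L1-regularized (LASSO-style) logistic regression, the two rankings are averaged as a simple ensemble vote, and the top `k` features are kept. ROC AUC is computed with the rank (Mann-Whitney) formulation. All function names, the choice of scorers, the average-rank vote, and the values of `k`, `lam`, `lr`, and `epochs` are hypothetical.

```python
import math
import random

# Hypothetical sketch of an "ensemble + regularization" feature selector.
# The scorers, the average-rank vote, and all parameter values are assumptions,
# not the method published in the article.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def abs_correlation(xs, ys):
    # Filter score: absolute Pearson correlation between one feature and the 0/1 label.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (t - my) for x, t in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((t - my) ** 2 for t in ys)
    return 0.0 if vx == 0 or vy == 0 else abs(cov) / math.sqrt(vx * vy)

def l1_logistic(X, y, lam=0.01, lr=0.1, epochs=300):
    # L1-regularized logistic regression fitted by proximal gradient descent (ISTA).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        for j in range(d):
            # Gradient step, then soft-thresholding (proximal operator of lam * |w_j|).
            v = w[j] - lr * grad_w[j]
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b

def ranks(scores):
    # 0 = highest score; ties keep the original feature order (sorted is stable).
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order):
        r[j] = pos
    return r

def ensemble_select(X, y, k, lam=0.01):
    # Average the rank each feature gets from the filter and from the L1 model; keep the top k.
    d = len(X[0])
    filter_scores = [abs_correlation([row[j] for row in X], y) for j in range(d)]
    w, _ = l1_logistic(X, y, lam=lam)
    r_filter, r_l1 = ranks(filter_scores), ranks([abs(wj) for wj in w])
    avg = [(a + c) / 2 for a, c in zip(r_filter, r_l1)]
    return sorted(range(d), key=lambda j: (avg[j], j))[:k]

def roc_auc(y_true, scores):
    # Probability that a random positive scores above a random negative (ties count 1/2).
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_demo_data(n=200, d=6, seed=0):
    # Synthetic data: only features 0 and 1 carry signal; the rest are noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(d)]
        X.append(row)
        y.append(1 if row[0] + 0.8 * row[1] + rng.gauss(0, 0.5) > 0 else 0)
    return X, y

X, y = make_demo_data()
selected = ensemble_select(X, y, k=2)
X_sel = [[row[j] for j in selected] for row in X]
w, b = l1_logistic(X_sel, y)
probs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X_sel]
print("selected features:", selected)
print("in-sample ROC AUC:", round(roc_auc(y, probs), 3))
```

For brevity the demo reports an in-sample AUC. In a real pipeline the selector would be fitted on the training split only, with ROC AUC reported on a held-out test set as in the abstract, and library routines such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` and `sklearn.metrics.roc_auc_score` would replace the hand-written helpers.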

References

  • Azhagusundari B, Thanamani AS. 2013. Feature selection based on information gain. Inter J Innov Technol Explor Engin (IJITEE), 2(2): 18-21.
  • Biau G, Scornet E. 2016. A random forest guided tour. Test, 25: 197-227.
  • Chandrashekar G, Sahin F. 2014. A survey on feature selection methods. Comput Electr Engin, 40(1): 16-28.
  • Freeman C, Kulić D, Basir O. 2013. Feature-selected tree-based classification. IEEE Transact Cybernet, 43(6): 1990-2004.
  • Hasan MAM, Nasser M, Ahmad S, Molla KI. 2016. Feature selection for intrusion detection using random forest. J Inform Sec, 7(3): 129-140.
  • Hoerl AE, Kennard RW. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1): 55-67.
  • Hosmer Jr DW, Lemeshow S, Sturdivant RX. 2013. Applied logistic regression, John Wiley & Sons, London, UK, pp: 254.
  • Hossin M, Sulaiman MN. 2015. A review on evaluation metrics for data classification evaluations. Inter J Data Mining Knowledge Manage Process, 5(2): 1-8.
  • Jimenez-del-Toro O, Otálora S, Andersson M, Eurén K, Hedlund M, Rousson M, Müller H, Atzori M. 2017. Analysis of histopathology images: From traditional machine learning to deep learning, Elsevier, New York, USA, pp: 135.
  • Kabir MM, Islam MM, Murase K. 2010. A new wrapper feature selection approach using neural network. Neurocomputing, 73(16-18): 3273-3283.
  • Kalousis A, Prados J, Hilario M. 2007. Stability of feature selection algorithms: a study on high-dimensional spaces. Knowledge Inform Systems, 12: 95-116.
  • Kleinbaum DG, Dietz K, Gail M, Klein M, Klein M. 2002. Logistic regression, Springer, USA, pp: 142.
  • Kohavi R, John GH. 1997. Wrappers for feature subset selection. Artificial Intel, 97(1-2): 273-324.
  • Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H. 2017. Feature selection: A data perspective. ACM Comput Surveys, 50(6):1-45.
  • Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News, 2(3): 18-22.
  • Luftensteiner S, Mayr M, Chasparis G. 2021. Filter-based feature selection methods for industrial sensor data: a review. International Conference on Big Data Analytics and Knowledge Discovery, Virtual Event, September 27–30, pp: 242-249.
  • McDonald GC. 2009. Ridge regression. Comput Stat, 1(1): 93-100.
  • Menze BH, Kelm BM, Masuch R, Himmelreich U, Bachert P, Petrich W, Hamprecht FA. 2009. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinform, 10: 1-16.
  • Miao J, Niu L. 2016. A survey on feature selection. Procedia Comput Sci, 91: 919-926.
  • Moldovan D, Cioara T, Anghel I, Salomie I. 2017. Machine learning for sensor-based manufacturing processes. 13th IEEE international conference on intelligent computer communication and processing (ICCP), September 7-9, Cluj-Napoca, Romania, pp: 147-154.
  • Opitz D, Maclin R. 1999. Popular ensemble methods: An empirical study. J Artificial Intel Res, 11: 169-198.
  • Ramchandran A, Sangaiah AK. 2018. Unsupervised anomaly detection for high dimensional data—An exploratory analysis, Elsevier, New York, USA, pp: 254.
  • Ranstam J, Cook JA. 2018. LASSO regression. J British Surg, 105(10): 1348-1358.
  • Remeseiro B, Bolon-Canedo V. 2019. A review of feature selection methods in medical applications. Comput Biol Med, 112: 103375.
  • Salehi F, Abbasi E, Hassibi B. 2019. The impact of regularization on high-dimensional logistic regression. Adv Neural Inform Process Systems, 32: 1-11.
  • Shardlow M. 2016. An analysis of feature selection techniques. Univ Manchester, 1: 1-7.
  • Sugiyama M. 2015. Introduction to statistical machine learning. Morgan Kaufmann, New York, USA, pp: 425.
  • Tian Y, Zhang Y. 2022. A comprehensive survey on regularization strategies in machine learning. Inform Fusion, 80: 146-166.
  • Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J Royal Stat Soc Series B: Stat Method, 58(1): 267-288.
  • Yousefi T, Varlıklar Ö. 2024. Breast Cancer Prediction with Hybrid Filter-Wrapper Feature Selection. Inter J Adv Nat Sci Engin Res, 8: 411-419.
  • Yousefi T, Varlıklar Aktaş Ö. 2024. Predicting customer satisfaction with hybrid basic filter-based feature selection method. 4th International Artificial Intelligence and Data Science Congress, 14-15 March, Izmir, Türkiye, pp: 1-10.
  • Zheng A, Casari A. 2018. Feature engineering for machine learning: principles and techniques for data scientists. O'Reilly Media, Inc., London, UK, pp: 358.
  • Zhou H, Zhang J, Zhou Y, Guo X, Ma Y. 2021. A feature selection algorithm of decision tree based on feature weight. Expert Syst Applicat, 164: 113842.

There are 33 references in total.

Details

Primary Language: English
Subjects: Decision Support and Group Support Systems, Information Systems (Other), Applied Mathematics (Other)
Section: Research Articles
Authors:

Tohid Yousefi (ORCID 0000-0003-4288-8194)

Özlem Varlıklar (ORCID 0000-0001-6415-0698)

Mehmet Serhat Odabas (ORCID 0000-0002-1863-7566)

Publication Date: 15 November 2024
Submission Date: 1 September 2024
Acceptance Date: 16 October 2024
Published in Issue: Year 2024

Cite As

APA Yousefi, T., Varlıklar, Ö., & Odabas, M. S. (2024). An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification. Black Sea Journal of Engineering and Science, 7(6), 1224-1231. https://doi.org/10.34248/bsengineering.1541950
AMA Yousefi T, Varlıklar Ö, Odabas MS. An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification. BSJ Eng. Sci. November 2024;7(6):1224-1231. doi:10.34248/bsengineering.1541950
Chicago Yousefi, Tohid, Özlem Varlıklar, and Mehmet Serhat Odabas. “An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification”. Black Sea Journal of Engineering and Science 7, no. 6 (November 2024): 1224-31. https://doi.org/10.34248/bsengineering.1541950.
EndNote Yousefi T, Varlıklar Ö, Odabas MS (01 November 2024) An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification. Black Sea Journal of Engineering and Science 7 6 1224–1231.
IEEE T. Yousefi, Ö. Varlıklar, and M. S. Odabas, “An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification”, BSJ Eng. Sci., vol. 7, no. 6, pp. 1224–1231, 2024, doi: 10.34248/bsengineering.1541950.
ISNAD Yousefi, Tohid et al. “An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification”. Black Sea Journal of Engineering and Science 7/6 (November 2024), 1224-1231. https://doi.org/10.34248/bsengineering.1541950.
JAMA Yousefi T, Varlıklar Ö, Odabas MS. An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification. BSJ Eng. Sci. 2024;7:1224–1231.
MLA Yousefi, Tohid et al. “An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification”. Black Sea Journal of Engineering and Science, vol. 7, no. 6, 2024, pp. 1224-31, doi:10.34248/bsengineering.1541950.
Vancouver Yousefi T, Varlıklar Ö, Odabas MS. An Improved Hybrid Model Based on Ensemble Features and Regularization Selection for Classification. BSJ Eng. Sci. 2024;7(6):1224-31.
