Research Article

The Effect of Feature Selection Based Approaches on Multivariate Regression Models

Year 2025, Volume: 37 Issue: 4, 422 - 429, 23.12.2025
https://doi.org/10.7240/jeps.1785840

Abstract

This study proposes a K-Means clustering-based feature selection method that aims to reduce the dimensionality of high-dimensional data sets and to improve the prediction performance of the reduced models. In the proposed method, each independent variable is treated as a feature. These features are clustered with the K-Means algorithm, and from each cluster the feature that best represents the cluster is selected and retained. Regression models are then built on these retained, cluster-representative features using multivariate linear regression, Ridge regression, and LASSO regression. The dimension reduction step also mitigates the multicollinearity problem. The proposed reduced multivariate linear regression, reduced Ridge regression, and reduced LASSO regression models were compared against the full multivariate regression method. In a comparison on real data, the reduced models improved on the unreduced model by 10% to 38% in terms of MAPE (mean absolute percentage error) and by 8% to 50% in terms of RMSE (root mean square error). The findings demonstrate that the proposed dimension reduction models perform remarkably well, in terms of both effectiveness and efficiency, in high-dimensional data environments.
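The abstract describes a three-step pipeline: cluster the features with K-Means, keep one representative feature per cluster, and fit linear, Ridge, and LASSO regressions on the reduced set. The Python sketch below illustrates one plausible reading of that pipeline; it is not the authors' exact procedure. The standardization step, the centroid-proximity proxy for "the feature that best represents the cluster", the choice of k, and the toy data are all illustrative assumptions. The RMSE/MAPE printout mirrors the two criteria reported in the abstract.

```python
# A minimal sketch of the pipeline outlined in the abstract. Assumptions
# (not specified there): features are standardized before clustering, the
# number of clusters k is user-chosen, and "best represents the cluster"
# is approximated by proximity to the cluster centroid.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def select_representative_features(X, k, random_state=0):
    """Cluster the columns (features) of X with K-Means and return,
    for each cluster, the index of the feature nearest its centroid."""
    Xz = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    labels = km.fit_predict(Xz.T)  # each feature is one "point" to cluster
    selected = []
    for c in range(k):
        members = np.where(labels == c)[0]
        dists = np.linalg.norm(Xz.T[members] - km.cluster_centers_[c], axis=1)
        selected.append(members[np.argmin(dists)])  # most central feature
    return sorted(selected)

# Toy data: 200 observations, 40 predictors in 5 correlated blocks.
rng = np.random.default_rng(42)
Z = rng.normal(size=(200, 5))                       # 5 latent factors
X = np.repeat(Z, 8, axis=1) + 0.3 * rng.normal(size=(200, 40))
y = Z @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

idx = select_representative_features(X_tr, k=5)     # reduced feature set
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.01)):
    pred = model.fit(X_tr[:, idx], y_tr).predict(X_te[:, idx])
    rmse = np.sqrt(mean_squared_error(y_te, pred))          # RMSE criterion
    mape = np.mean(np.abs((y_te - pred) / y_te)) * 100      # MAPE criterion
    print(f"{type(model).__name__}: RMSE={rmse:.3f}  MAPE={mape:.1f}%")
```

Because the 40 toy predictors are noisy copies of 5 latent factors, clustering the feature columns recovers one representative per factor, which is exactly the setting in which such a reduction should both shrink the design matrix and relieve multicollinearity.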

Supporting Institution

TÜBİTAK

Project Number

123F266

Thanks

This study was supported by the Scientific and Technological Research Council of Türkiye (TÜBİTAK) under Grant Number 123F266. The authors thank TÜBİTAK for its support.

References

  • Farrar, D.E. and Glauber, R.R. (1967). Multicollinearity in regression analysis: The problem revisited. The Review of Economics and Statistics, 49(1), 92-107.
  • Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar), 1157-1182.
  • Dash, M. and Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis, 1(1-4), 131-156.
  • Liu, H. and Motoda, H. (2012). Feature Selection for Knowledge Discovery and Data Mining (Vol. 454). Springer Science & Business Media.
  • Saeys, Y., Inza, I. and Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19), 2507-2517.
  • Chandrashekar, G. and Sahin, F. (2014). A survey on feature selection methods. Computers and Electrical Engineering, 40(1), 16-28.
  • Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J. and Liu, H. (2017). Feature selection: A data perspective. ACM Computing Surveys (CSUR), 50(6), 1-45.
  • Khaire, U.M. and Dhanalakshmi, R. (2022). Stability investigation of improved whale optimization algorithm in the process of feature selection. IETE Technical Review, 39(2), 286-300.
  • Yang, P., Huang, H. and Liu, C. (2021). Feature selection revisited in the single-cell era. Genome Biology, 22(1), 321.
  • Kohavi, R. and John, G.H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-324.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1), 267-288.
  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2), 301-320.
  • Ng, A.Y. (2004, July). Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning (p. 78).
  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
  • Friedman, J.H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232.
  • Bolón-Canedo, V., Sánchez-Maroño, N. and Alonso-Betanzos, A. (2013). A review of feature selection methods on synthetic data. Knowledge and Information Systems, 34(3), 483-519.
  • Peng, H., Long, F. and Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226-1238.
  • Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
  • Hall, M.A. (2000). Correlation-based feature selection for discrete and numeric class machine learning. In Proceedings of the Seventeenth International Conference on Machine Learning.
  • Jain, A.K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651-666.
  • Marill, K.A. (2004). Advanced statistics: Linear regression, part II: Multiple linear regression. Academic Emergency Medicine, 11(1), 94-102.
  • Hoerl, A.E. and Kennard, R.W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.
  • Faraway, J.J. (2002). Practical Regression and ANOVA Using R (Vol. 168). Bath: University of Bath.
  • Chai, T. and Draxler, R.R. (2014). Root mean square error (RMSE) or mean absolute error (MAE)? Geoscientific Model Development Discussions, 7(1), 1525-1534.
  • Hyndman, R.J. and Koehler, A.B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679-688.
  • Makridakis, S. (1993). Accuracy measures: Theoretical and practical concerns. International Journal of Forecasting, 9(4), 527-529.
  • Tak, N. and İnan, D. (2022). Type-1 fuzzy forecasting functions with elastic net regularization. Expert Systems with Applications, 199, 116916.
  • Belsley, D.A. (1991). A guide to using the collinearity diagnostics. Computer Science in Economics and Management, 4(1), 33-50.
  • Adkins, L.C. (2022). Weak identification in nonlinear econometric models. Southern University College of Business E-Journal, 17(3), 2.
  • Williams, G.J. (2009). Rattle: A data mining GUI for R. The R Journal, 1(2), 45-55.

Details

Primary Language Turkish
Subjects Soft Computing
Journal Section Research Article
Authors

Ramazan Akman 0009-0006-7021-3185

Nihat Tak 0000-0001-8796-5101

Project Number 123F266
Submission Date September 17, 2025
Acceptance Date November 29, 2025
Publication Date December 23, 2025
Published in Issue Year 2025 Volume: 37 Issue: 4

Cite

APA Akman, R., & Tak, N. (2025). Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi. International Journal of Advances in Engineering and Pure Sciences, 37(4), 422-429. https://doi.org/10.7240/jeps.1785840