Research Article

The Effect of Feature Selection Based Approaches on Multivariate Regression Models (Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi)

Year 2025, Volume: 37, Issue: 4, 422-429, 23.12.2025
https://doi.org/10.7240/jeps.1785840

Abstract

This study proposes a K-Means clustering based feature selection method that aims to reduce the dimensionality of high-dimensional data sets and to improve the prediction performance of the reduced models. In the proposed method, each independent variable is treated as a feature. These features are clustered with the K-Means algorithm, and from each cluster the feature that best represents the cluster is selected and retained. In the next step, regression models are built from these retained, cluster-representative features using multivariate linear regression, Ridge regression, and LASSO regression. The dimension reduction step also mitigates the multicollinearity problem. The proposed reduced multivariate linear regression, reduced Ridge regression, and reduced LASSO regression models were then compared against the unreduced multivariate regression method. In comparisons on real data, the reduced models improved on the unreduced model by 10% to 38% in mean absolute percentage error (MAPE) and by 8% to 50% in root mean square error (RMSE). These findings demonstrate that the proposed dimension reduction models perform strongly, in terms of both effectiveness and efficiency, in high-dimensional data environments.
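
The article itself includes no code, but the pipeline described above can be sketched compactly. The Python snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it standardizes the predictors, interprets "the feature that best represents the cluster" as the column closest to its K-Means cluster centroid, and uses placeholder values for the number of clusters and the Ridge/LASSO penalties.

    # Minimal sketch of the described pipeline (illustrative; not the authors' code).
    # Assumed interpretation: each cluster's representative is the feature whose
    # standardized column lies closest to the cluster centroid; k and the penalty
    # strengths below are arbitrary placeholders.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import Lasso, LinearRegression, Ridge
    from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Synthetic high-dimensional data with built-in multicollinearity.
    n, p = 200, 60
    X = rng.normal(size=(n, p))
    X[:, 1::3] += X[:, ::3]                      # correlate neighbouring columns
    y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Standardize, then cluster the FEATURES: each column is one point in R^n_train.
    scaler = StandardScaler().fit(X_train)
    Z = scaler.transform(X_train)
    k = 10
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Z.T)

    # Keep, from each cluster, the single feature nearest to the cluster centroid.
    selected = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(Z.T[members] - km.cluster_centers_[c], axis=1)
        selected.append(int(members[np.argmin(dists)]))
    selected = sorted(selected)

    # Fit the reduced models and the unreduced baseline, then score with MAPE/RMSE.
    # (MAPE is unstable when y is near zero; a real study would use a suitable response.)
    models = {
        "full OLS":      (LinearRegression(), list(range(p))),
        "reduced OLS":   (LinearRegression(), selected),
        "reduced Ridge": (Ridge(alpha=1.0),   selected),
        "reduced LASSO": (Lasso(alpha=0.01),  selected),
    }
    for name, (model, cols) in models.items():
        model.fit(scaler.transform(X_train)[:, cols], y_train)
        pred = model.predict(scaler.transform(X_test)[:, cols])
        rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
        mape = mean_absolute_percentage_error(y_test, pred)
        print(f"{name:13s}  MAPE={mape:.3f}  RMSE={rmse:.3f}")

In practice the number of clusters k and the penalty strengths would be tuned, for example by cross-validation, and the reduced models would then be compared against the full model with MAPE and RMSE, in the spirit of the comparison reported in the abstract.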

Supporting Institution

TÜBİTAK

Project Number

123F266

Acknowledgments

This study was supported by the Scientific and Technological Research Council of Türkiye (TÜBİTAK) under grant number 123F266. The authors thank TÜBİTAK for its support.

Details

Primary Language Turkish
Subjects Soft Computing
Section Research Article
Authors

Ramazan Akman 0009-0006-7021-3185

Nihat Tak 0000-0001-8796-5101

Project Number 123F266
Submission Date September 17, 2025
Acceptance Date November 29, 2025
Publication Date December 23, 2025
Published in Issue Year 2025, Volume: 37, Issue: 4

Cite

APA Akman, R., & Tak, N. (2025). Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi. International Journal of Advances in Engineering and Pure Sciences, 37(4), 422-429. https://doi.org/10.7240/jeps.1785840
AMA Akman R, Tak N. Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi. JEPS. December 2025;37(4):422-429. doi:10.7240/jeps.1785840
Chicago Akman, Ramazan, and Nihat Tak. “Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi”. International Journal of Advances in Engineering and Pure Sciences 37, no. 4 (December 2025): 422-29. https://doi.org/10.7240/jeps.1785840.
EndNote Akman R, Tak N (01 December 2025) Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi. International Journal of Advances in Engineering and Pure Sciences 37 4 422–429.
IEEE R. Akman and N. Tak, “Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi”, JEPS, vol. 37, no. 4, pp. 422–429, 2025, doi: 10.7240/jeps.1785840.
ISNAD Akman, Ramazan - Tak, Nihat. “Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi”. International Journal of Advances in Engineering and Pure Sciences 37/4 (December 2025), 422-429. https://doi.org/10.7240/jeps.1785840.
JAMA Akman R, Tak N. Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi. JEPS. 2025;37:422–429.
MLA Akman, Ramazan, and Nihat Tak. “Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi”. International Journal of Advances in Engineering and Pure Sciences, vol. 37, no. 4, 2025, pp. 422-9, doi:10.7240/jeps.1785840.
Vancouver Akman R, Tak N. Özellik Seçimi Temelli Yaklaşımların Çok Değişkenli Regresyon Modellerine Etkisi. JEPS. 2025;37(4):422-9.