Research Article

Ridge-Robust-Boosting Ensemble Regression Approach

Year 2024, Volume: 17 Issue: 2, 30-45

Abstract

This study examines two important problems frequently encountered in regression analysis: multicollinearity and the effects of outliers. Multicollinearity refers to a high degree of correlation among the independent variables in a regression model; it can make the coefficients difficult to estimate and reduce the reliability of the estimates. Outliers, in turn, can distort the general trend and mislead the results. The aim of this study is to develop a regression model that addresses both problems simultaneously. The proposed Ridge-Robust-Boosting Ensemble Regression model is designed for this purpose. It reduces multicollinearity through Ridge regression, thereby stabilizing the correlation structure among the independent variables; it gains resistance to outliers through robust regression, reducing the influence of rare but highly influential observations on the estimates; and it improves predictive performance through boosting. In the simulation study, different levels of multicollinearity and different proportions of outliers were injected into a randomly generated data set of 1000 observations, and the performances of the models were compared. The results show that the proposed Ridge-Robust-Boosting Ensemble Regression model outperforms the other regression models across data sets with varying multicollinearity levels and outlier rates, indicating that it is, in general, a more reliable and flexible solution.
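The abstract names the three ingredients (Ridge regression, a robust estimator, boosting) and the simulation design (1000 observations, injected collinearity and outliers) but does not spell out how the components are combined. As an illustrative sketch only, not the authors' algorithm, the idea can be mocked up with scikit-learn, averaging a Ridge estimator, a Huber M-estimator, and a gradient boosting machine over data generated in the spirit of the simulation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.default_rng(42)
n = 1000

# Collinear predictors: x2 is almost a copy of x1 (multicollinearity)
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 1.5 * x2 - 1.0 * x3 + rng.normal(scale=0.5, size=n)

# Contaminate 5% of the responses with gross outliers
out = rng.choice(n, size=n // 20, replace=False)
y[out] += rng.normal(loc=20.0, scale=5.0, size=out.size)

# Simple prediction-averaging ensemble of the three named components
# (the paper's actual combination scheme may differ)
ensemble = VotingRegressor([
    ("ridge", Ridge(alpha=1.0)),           # shrinkage against multicollinearity
    ("huber", HuberRegressor()),           # M-estimator, downweights outliers
    ("boost", GradientBoostingRegressor(random_state=0)),  # boosting stage
])
ensemble.fit(X, y)
pred = ensemble.predict(X)
print(pred.shape)  # (1000,)
```

The `VotingRegressor` here is merely a stand-in for the ensemble step; the paper's evaluation additionally varies the collinearity level (the 0.95 coefficient above) and the outlier fraction across scenarios.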

References

  • [1] P. J. Huber, 1981, Robust Statistics. Wiley, New York.
  • [2] D. N. Gujarati, 2004, Basic Econometrics (4th ed.). McGraw-Hill.
  • [3] A. E. Hoerl, R. W. Kennard, 1970, Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67. https://doi.org/10.2307/1271436.
  • [4] M. J. Silvapulle, 1991, Robust ridge regression based on an M-estimator. Australian Journal of Statistics, 33(3), 319-333. https://doi.org/10.1111/j.1467-842X.1991.tb00438.x.
  • [5] K. Liu, 1993, A new class of biased estimate in linear regression. Communications in Statistics-Theory and Methods, 22, 393-402. https://doi.org/10.1080/03610929308831027.
  • [6] A. Arslan, N. Billor, 2000, Robust Liu estimator for regression based on an M-estimator. Journal of Applied Statistics, 27(1), 39-47. https://doi.org/10.1080/02664760021817.
  • [7] M. R. Özkale, S. Kaçıranlar, 2007, The restricted and unrestricted two-parameter estimators. Communications in Statistics-Theory and Methods, 36(15), 2707-2725. https://doi.org/10.1080/03610920701386877.
  • [8] G. Khalaf, G. Shukur, 2005, Choosing ridge parameter for regression problems. Communications in Statistics-Theory and Methods, 34(5), 1177-1182. https://doi.org/10.1081/STA-200056836.
  • [9] M. A. Alkhamisi, G. Khalaf, G. Shukur, 2006, Some modifications for choosing ridge parameters. Communications in Statistics-Theory and Methods, 35(11), 2005-2020. https://doi.org/10.1080/03610920600762905.
  • [10] Y. M. Al-Hassan, 2010, Performance of a new ridge regression estimator. Journal of the Association of Arab Universities for Basic and Applied Sciences, 9(1), 23-26. https://doi.org/10.1016/j.jaubas.2010.12.006.
  • [11] R. R. Hocking, F. M. Speed, M. J. Lynn, 1976, A class of biased estimators in linear regression. Technometrics, 18(4), 425-437. https://doi.org/10.1080/00401706.1976.10489474.
  • [12] M. A. Alkhamisi, G. Shukur, 2007, A Monte Carlo study of recent ridge parameters. Communications in Statistics-Simulation and Computation, 36(3), 535-547. https://doi.org/10.1080/03610910701208619.
  • [13] G. Muniz, B. M. G. Kibria, K. Mansson, G. Shukur, 2012, On developing ridge regression parameters: A graphical investigation. Statistics and Operations Research Transactions, 36(2), 115-138.
  • [14] Y. Asar, A. Karaibrahimoğlu, A. Genç, 2014, Modified ridge regression parameters: A comparative Monte Carlo study. Hacettepe Journal of Mathematics and Statistics, 43(5), 827-841.
  • [15] A. V. Dorugade, 2016, Improved ridge estimator in linear regression with multicollinearity, heteroscedastic errors and outliers. Journal of Modern Applied Statistical Methods, 15(2), 362-381. https://doi.org/10.56801/v15.i.856.
  • [16] A. F. Lukman, A. Olatunji, 2018, Newly proposed estimator for ridge parameter: An application to the Nigerian economy. Pakistan Journal of Statistics, 34(2), 91-98.
  • [17] S. Bhat, 2019, Performance of a weighted ridge estimator. International Journal of Agricultural and Statistical Sciences, 15(1), 347-354.
  • [18] M. N. Lattef, M. I. Alheety, 2020, Study of some kinds of ridge regression estimators in linear regression model. Tikrit Journal of Pure Science, 25(5), 130-142. https://doi.org/10.25130/tjps.v25i5.301.
  • [19] M. Qasim, K. Mansson, B. M. G. Kibria, 2021, On some beta ridge regression estimators: Method, simulation and application. Journal of Statistical Computation and Simulation, 91(9), 1699-1712. https://doi.org/10.1080/00949655.2020.1867549.
  • [20] A. Irandoukht, 2021, Optimum ridge regression parameter using R-squared of prediction as a criterion for regression analysis. Journal of Statistical Theory and Applications, 20(2), 242-250. https://doi.org/10.2991/jsta.d.210322.001.
  • [21] G. Khalaf, 2022, Improving the ordinary least squares estimator by ridge regression. Open Access Library Journal, 9(5), 1-8.
  • [22] M. Shabbir, S. Chand, F. Iqbal, 2023, A new ridge estimator for linear regression model with some challenging behavior of error term. Communications in Statistics-Simulation and Computation, 1-11. https://doi.org/10.1080/03610918.2023.2186874.
  • [23] N. Shaheen, I. Shah, A. Almohaimeed, S. Ali, H. N. Alqifari, 2023, Some modified ridge estimators for handling the multicollinearity problem. Mathematics, 11(11), 2522. https://doi.org/10.3390/math11112522.
  • [24] A. Han, 2023, Çoklu doğrusal bağlantı ve aykırı değer sorunu için Ridge-Robust-Boosting Topluluk Regresyon yaklaşımı [Ridge-Robust-Boosting Ensemble Regression approach for the multicollinearity and outlier problem]. Unpublished doctoral dissertation, İnönü Üniversitesi Sosyal Bilimler Enstitüsü, Malatya, Türkiye.
  • [25] H. Zou, 2020, Comment: Ridge regression - still inspiring after 50 years. Technometrics, 62(4), 456-458. https://doi.org/10.1080/00401706.2020.180125.
  • [26] C. Aktaş, V. Yılmaz, 2003, Çoklu bağıntılı modellerde Liu ve Ridge regresyon kestiricilerinin karşılaştırılması [Comparison of Liu and Ridge regression estimators in multicollinear models]. Anadolu Üniversitesi Bilim ve Teknoloji Dergisi, 4(2), 189-194.
  • [27] Y. Li, G. R. Arce, 2004, A maximum likelihood approach to least absolute deviation regression. EURASIP Journal on Advances in Signal Processing, 1-8. https://doi.org/10.1155/S1110865704401139.
  • [28] H. Theil, 1950, A rank-invariant method of linear and polynomial regression analysis. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, 53, 386-392, 521-525, 1397-1412.
  • [29] P. K. Sen, 1968, Estimates of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association, 63(324), 1379-1389. https://doi.org/10.2307/2285891.
  • [30] P. J. Rousseeuw, 1984, Least median of squares regression. Journal of the American Statistical Association, 79, 871-880. https://doi.org/10.1080/01621459.1984.10477105.
  • [31] L. Öztürk, 2003, Doğrusal regresyonda sağlam kestirim yöntemleri ve karşılaştırılmaları [Robust estimation methods in linear regression and their comparison]. Unpublished doctoral dissertation, Mimar Sinan Üniversitesi, İstanbul.
  • [32] H. Türkay, 2004, Doğrusal regresyon modellerinin robust (dayanıklı) yöntemlerle tahmini ve karşılaştırmalı uygulamaları [Estimation of linear regression models with robust methods and comparative applications]. Unpublished doctoral dissertation, İstanbul Üniversitesi, İstanbul.
  • [33] P. J. Huber, 1973, Robust regression: Asymptotics, conjectures and Monte Carlo. Annals of Statistics, 1, 799-821. https://doi.org/10.1214/aos/1176342503.
  • [34] R. R. Wilcox, 2017, Introduction to Robust Estimation and Hypothesis Testing (4th ed.). Academic Press.
  • [35] K. V. Mardia, J. T. Kent, J. M. Bibby, 1979, Multivariate Analysis. Academic Press, London.
  • [36] L. Breiman, 1996, Bagging predictors. Machine Learning, 24, 123-140. https://doi.org/10.1007/BF00058655.
  • [37] J. H. Friedman, 2002, Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378. https://doi.org/10.1016/S0167-9473(01)00065-2.
  • [38] A. Natekin, A. Knoll, 2013, Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, 7(21), 1-21. https://doi.org/10.3389/fnbot.2013.00021.
  • [39] R. Caruana, A. Niculescu-Mizil, G. Crew, A. Ksikes, 2004, Ensemble selection from libraries of models. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04). Association for Computing Machinery, New York, USA. https://doi.org/10.1145/1015330.1015432.
  • [40] J. H. Friedman, 2001, Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232. https://doi.org/10.1214/aos/1013203451.
  • [41] B. V. Dasarathy, B. V. Sheela, 1979, A composite classifier system design: Concepts and methodology. Proceedings of the IEEE, 67(5), 708-713. https://doi.org/10.1109/PROC.1979.11321.
  • [42] L. K. Hansen, P. Salamon, 1990, Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993-1001. https://doi.org/10.1109/34.58871.
  • [43] R. E. Schapire, 1990, The strength of weak learnability. Machine Learning, 5(2), 197-227. https://doi.org/10.1007/BF00116037.
  • [44] H. Liu, A. Gegov, M. Cocea, 2016, Ensemble learning approaches. In Rule Based Systems for Big Data: A Machine Learning Approach. Springer, Switzerland, 63-73. https://doi.org/10.1007/978-3-319-23696-4_6.
  • [45] I. D. Mienye, Y. Sun, 2022, A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access, 10, 99129-99149. https://doi.org/10.1109/ACCESS.2022.3207287.
  • [46] S. Geman, D. Geman, 1984, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741. https://doi.org/10.1109/TPAMI.1984.4767596.


Details

Primary Language: Turkish
Subjects: Big and Complex Data Theory, Statistical Analysis, Statistical Theory, Statistical Data Science
Section: Articles
Authors

Ayşegül Han 0000-0002-3390-2129

Mehmet Güngör 0000-0001-6869-4043

Early View Date: December 25, 2024
Publication Date:
Submission Date: July 9, 2024
Acceptance Date: October 16, 2024
Published Issue: Year 2024, Volume: 17 Issue: 2

How to Cite

IEEE A. Han and M. Güngör, "Ridge-Robust-Boosting Topluluk Regresyon Yaklaşımı", JSSA, vol. 17, no. 2, pp. 30-45, 2024.