Research Article

A genetic algorithm for robust regression in linear models

Year 2025, Volume: 18, Issue: 1, pp. 1 - 15, 29.06.2025

Abstract

Outliers adversely affect parameter estimates. Observations can therefore be weighted to minimize the impact of outliers on the estimates. This study proposes a robust method in which the observations are weighted by a Genetic Algorithm (GA) and which can be used for both outlier detection and parameter estimation. The proposed Genetic Algorithm for Robust Regression (GA-RR) method and M-estimators were compared in a simulation study, using the root mean square error (RMSE) and the mean absolute error (MAE) as performance criteria. The performance of the methods was also evaluated on real data.
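
The general idea of weighting observations with a GA can be illustrated with a minimal Python sketch. It is not the GA-RR method itself: the abstract does not specify the fitness function, the genetic operators, or how RMSE and MAE are computed, so everything below is an assumption made for illustration. In particular, the fitness (median absolute residual), the weight bound `W_MIN`, tournament selection, uniform crossover, Gaussian mutation, and all function names are hypothetical choices, and the model is restricted to a single predictor.

```python
import math
import random
import statistics

W_MIN = 0.01  # assumed lower bound on weights; keeps the weighted fit well defined


def weighted_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def fitness(x, y, w):
    """Assumed criterion (lower is better): median absolute residual of the fit."""
    b0, b1 = weighted_fit(x, y, w)
    return statistics.median([abs(yi - b0 - b1 * xi) for xi, yi in zip(x, y)])


def clip(v):
    """Keep a weight inside [W_MIN, 1]."""
    return min(1.0, max(W_MIN, v))


def ga_weights(x, y, pop_size=40, generations=60, mut_rate=0.1, mut_sd=0.2, seed=0):
    """Search for observation weights with a simple elitist GA (illustrative only)."""
    rng = random.Random(seed)
    n = len(x)
    # Start from the ordinary least squares solution (all weights 1) plus random vectors.
    pop = [[1.0] * n]
    pop += [[rng.uniform(W_MIN, 1.0) for _ in range(n)] for _ in range(pop_size - 1)]

    def tournament(scored):
        # Binary tournament: the fitter of two randomly drawn individuals wins.
        return min(rng.sample(scored, 2), key=lambda s: s[0])[1]

    for _ in range(generations):
        scored = [(fitness(x, y, w), w) for w in pop]
        nxt = [min(scored, key=lambda s: s[0])[1]]  # elitism: carry over the best
        while len(nxt) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # Uniform crossover followed by clipped Gaussian mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [clip(g + rng.gauss(0.0, mut_sd)) if rng.random() < mut_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    return min(pop, key=lambda w: fitness(x, y, w))


def rmse(est, true):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / len(true))


def mae(est, true):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(e - t) for e, t in zip(est, true)) / len(true)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [float(i) for i in range(20)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
    for i in (3, 11, 17):  # inject three artificial outliers
        y[i] += 25.0
    w = ga_weights(x, y)
    true_beta = (1.0, 2.0)
    for name, beta in (("OLS", weighted_fit(x, y, [1.0] * 20)),
                       ("GA-weighted", weighted_fit(x, y, w))):
        print(name, beta, rmse(beta, true_beta), mae(beta, true_beta))
```

In this sketch, observations that end up with small weights are the natural outlier candidates, which is how a single weighting scheme can serve both detection and estimation. Here RMSE and MAE are applied to the coefficient estimates, which is one plausible reading of a simulation comparison; they could equally be computed on fitted values.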


Turkish title: Doğrusal Modellerde Robust Regresyon için Yeni Bir Genetik Algoritma ("A New Genetic Algorithm for Robust Regression in Linear Models")


Details

Primary Language: English
Subjects: Applied Statistics
Section: Articles
Authors

Ahmet Toy (ORCID: 0000-0002-2647-7259)

Erol Terzi (ORCID: 0000-0002-2309-827X)

Early View Date: 14 June 2025
Publication Date: 29 June 2025
Submission Date: 14 January 2025
Acceptance Date: 30 May 2025
Published in Issue: Year 2025, Volume: 18, Issue: 1

Cite

IEEE: A. Toy and E. Terzi, “A genetic algorithm for robust regression in linear models”, JSSA, vol. 18, no. 1, pp. 1–15, 2025.