Year 2025,
Volume: 54 Issue: 4, 1688 - 1707, 29.08.2025
Aiman Tahir, Maryam Ilyas
A robust probabilistic framework for principal component regression: optimizing parameter identification and outlier detection via approximate Bayesian computation
Abstract
Outliers and ill-conditioned predictors are major obstacles to reliable parameter estimation in regression models. This paper presents an approach that combines principal component regression with approximate Bayesian computation to address both problems. Principal component regression mitigates the effects of ill-conditioning by transforming highly correlated predictors into orthogonal components. Approximate Bayesian computation adds robustness by approximating the posterior distribution of the error variance ($\sigma^2$), which gives a flexible way to model uncertainty and noise. Integrating the two methods improves both parameter estimation and anomaly detection: by assigning probabilistic scores to potential outliers, the method identifies anomalies more accurately and with a measure of uncertainty attached. Extensive validation on simulated and real-world datasets shows that the proposed technique performs favorably compared with existing robust methods. These findings highlight the potential of approximate Bayesian computation as a tool for improving the robustness and precision of regression analyses in noisy and complex data environments.
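The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names (`pcr_fit`, `abc_sigma2`, `outlier_scores`) are hypothetical, and the summary statistic (median absolute residual), the uniform prior on $\sigma$, the acceptance fraction and the cutoff of 2.5 are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression on the first k components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                       # p x k loadings
    Z = Xc @ V                         # n x k mutually orthogonal scores
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = V @ gamma                   # coefficients on the original predictors
    return beta, y_mean - x_mean @ beta

def abc_sigma2(residuals, n_draws=5000, keep=0.01, seed=0):
    """ABC rejection sampler for the error variance (illustrative choices)."""
    rng = np.random.default_rng(seed)
    n = residuals.size
    # Assumed summary statistic: median absolute residual (robust to outliers).
    s_obs = np.median(np.abs(residuals))
    # Assumed prior: sigma uniform on (0, 10 x a robust scale estimate).
    sigma = rng.uniform(1e-6, 10 * 1.4826 * s_obs, size=n_draws)
    # Simulate Gaussian errors for every prior draw and summarise them.
    sim = rng.normal(0.0, sigma[:, None], size=(n_draws, n))
    dist = np.abs(np.median(np.abs(sim), axis=1) - s_obs)
    # Keep the draws whose simulated summary is closest to the observed one.
    n_keep = max(1, round(keep * n_draws))
    return sigma[np.argsort(dist)[:n_keep]] ** 2

def outlier_scores(residuals, sigma2_draws, cutoff=2.5):
    """Share of accepted draws under which |residual| exceeds cutoff * sigma."""
    sigma = np.sqrt(sigma2_draws)
    return (np.abs(residuals)[:, None] > cutoff * sigma[None, :]).mean(axis=1)

if __name__ == "__main__":
    # Simulated data: two nearly collinear predictors, five shifted responses.
    rng = np.random.default_rng(1)
    z = rng.normal(size=100)
    X = np.column_stack([z + 0.01 * rng.normal(size=100),
                         z + 0.01 * rng.normal(size=100),
                         rng.normal(size=100)])
    y = 1.0 + X @ np.array([1.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=100)
    y[:5] += 15.0
    beta, intercept = pcr_fit(X, y, k=2)
    res = y - (intercept + X @ beta)
    scores = outlier_scores(res, abc_sigma2(res))
    print("flagged observations:", np.flatnonzero(scores > 0.5))
```

Dropping the smallest-variance component (`k=2` of 3) removes the direction along which the two collinear predictors differ, which is what stabilises the coefficients. The scores are fractions of accepted $\sigma^2$ draws, so they read as approximate posterior probabilities under the assumptions stated above.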