High leverage points and vertical outliers resistant model selection in regression

K. Shende; Dattatraya Kashid

doi:10.15672/hujms.842589

Research Article

Year 2021, Volume: 50 Issue: 6, 1773 - 1792, 14.12.2021

Abstract

For prediction, a regression model should include only relevant predictor variables, because irrelevant predictors tend to produce misleading inference. Many model selection methods are available in the literature; some of them are resistant to vertical outliers, but the joint presence of vertical outliers and high leverage points has not been well studied. In this article, we modify the $S_p$ statistic using the generalized M-estimator (GM-estimator) to obtain robust model selection in the presence of vertical outliers and high leverage points. The proposed model selection criterion selects only the relevant predictor variables with probability one for large sample sizes. We establish the equivalence of this criterion with the existing $C_p$ and $S_p$ criteria. The superiority of the proposed criterion is demonstrated using simulated and real data.

Keywords

GM-estimator, adaptive $S_p$ statistic, $S_p$ statistic, $C_p$ statistic, consistency, robust model selection
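
For orientation, the classical (non-robust) Mallows $C_p$ statistic mentioned in the abstract can be computed as in the minimal sketch below. It assumes ordinary least squares and the standard definition $C_p = RSS_p/\hat{\sigma}^2 - n + 2p$, with $\hat{\sigma}^2$ estimated from the full model. The toy data and the function names `ols_rss` and `mallows_cp` are illustrative assumptions. This is not the robust GM-estimator-based criterion proposed in the article.

```python
from itertools import combinations

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit; X is a list of rows (intercept included)."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting (assumes X has full column rank).
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for s in range(col, p + 1):
                M[r][s] -= f * M[col][s]
    # Back substitution for the coefficient vector b.
    b = [0.0] * p
    for r in reversed(range(p)):
        b[r] = (M[r][p] - sum(M[r][s] * b[s] for s in range(r + 1, p))) / M[r][r]
    return sum((y[i] - sum(X[i][s] * b[s] for s in range(p))) ** 2 for i in range(n))

def mallows_cp(X_full, y):
    """Map each subset of predictor columns to its C_p; column 0 is the intercept."""
    n, k = len(X_full), len(X_full[0])
    sigma2 = ols_rss(X_full, y) / (n - k)  # error variance from the full model
    result = {}
    for size in range(k):
        for subset in combinations(range(1, k), size):
            cols = (0,) + subset
            X_sub = [[row[j] for j in cols] for row in X_full]
            result[subset] = ols_rss(X_sub, y) / sigma2 - n + 2 * len(cols)
    return result

# Made-up toy data: y depends on x1 only; x2 and x3 are irrelevant predictors.
n = 20
x1 = [float(i) for i in range(n)]
x2 = [float((7 * i) % 5) for i in range(n)]
x3 = [float((3 * i) % 4) for i in range(n)]
noise = [((37 * i) % 11 - 5) / 10 for i in range(n)]
y = [1.0 + 2.0 * x1[i] + noise[i] for i in range(n)]
X = [[1.0, x1[i], x2[i], x3[i]] for i in range(n)]

cp = mallows_cp(X, y)
for subset, value in sorted(cp.items(), key=lambda kv: kv[1]):
    print(subset, round(value, 2))
```

Subsets whose $C_p$ is small and close to the number of fitted parameters are the usual candidates. Because this sketch relies on least squares, a few vertical outliers or high leverage points can distort the ranking, which is the situation the article addresses.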

Details

Primary Language	English
Subjects	Statistics
Journal Section	Statistics
Authors	K. Shende (ORCID 0000-0001-8439-8041), Dattatraya Kashid (ORCID 0000-0002-4835-2713)
Publication Date	December 14, 2021
Published in Issue	Year 2021, Volume: 50 Issue: 6

Cite

APA	Shende, K., & Kashid, D. (2021). High leverage points and vertical outliers resistant model selection in regression. Hacettepe Journal of Mathematics and Statistics, 50(6), 1773-1792. https://doi.org/10.15672/hujms.842589
AMA	Shende K, Kashid D. High leverage points and vertical outliers resistant model selection in regression. Hacettepe Journal of Mathematics and Statistics. December 2021;50(6):1773-1792. doi:10.15672/hujms.842589
Chicago	Shende, K., and Dattatraya Kashid. “High Leverage Points and Vertical Outliers Resistant Model Selection in Regression”. Hacettepe Journal of Mathematics and Statistics 50, no. 6 (December 2021): 1773-92. https://doi.org/10.15672/hujms.842589.
EndNote	Shende K, Kashid D (December 1, 2021) High leverage points and vertical outliers resistant model selection in regression. Hacettepe Journal of Mathematics and Statistics 50 6 1773–1792.
IEEE	K. Shende and D. Kashid, “High leverage points and vertical outliers resistant model selection in regression”, Hacettepe Journal of Mathematics and Statistics, vol. 50, no. 6, pp. 1773–1792, 2021, doi: 10.15672/hujms.842589.
ISNAD	Shende, K. - Kashid, Dattatraya. “High Leverage Points and Vertical Outliers Resistant Model Selection in Regression”. Hacettepe Journal of Mathematics and Statistics 50/6 (December 2021), 1773-1792. https://doi.org/10.15672/hujms.842589.
JAMA	Shende K, Kashid D. High leverage points and vertical outliers resistant model selection in regression. Hacettepe Journal of Mathematics and Statistics. 2021;50:1773–1792.
MLA	Shende, K. and Dattatraya Kashid. “High Leverage Points and Vertical Outliers Resistant Model Selection in Regression”. Hacettepe Journal of Mathematics and Statistics, vol. 50, no. 6, 2021, pp. 1773-92, doi:10.15672/hujms.842589.
Vancouver	Shende K, Kashid D. High leverage points and vertical outliers resistant model selection in regression. Hacettepe Journal of Mathematics and Statistics. 2021;50(6):1773-92.

For more information about the journal, please visit: https://dergipark.org.tr/en/pub/hujms