Variance Estimates and Model Selection

Sıdıka Başçı; Asad Zaman; Arzdar Kiracı

Year 2010, Volume: 2 Issue: 2, 57 - 72, 01.09.2010

Abstract

The large majority of model selection criteria are functions of the usual variance estimate for a regression model. The validity of this estimate rests on several assumptions, most critically the validity of the model being estimated. This assumption is often violated in model selection contexts, where the search takes place over invalid models. A cross-validated variance estimate is more robust to specification errors (see, for example, Efron, 1983). We consider the effects of replacing the usual variance estimate with a cross-validated variance estimate, namely the Prediction Sum of Squares (PRESS), in the functions of several model selection criteria. Such replacements improve the probability of finding the true model, at least in large samples.
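The replacement described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes an AIC-type criterion of the form log(s2) + 2k/n, divides both the residual sum of squares and PRESS by n, and applies leave-one-out residuals e_i / (1 - h_ii) to a lagged regression as if the observations were independent. The function names and the simulated AR(2) coefficients are hypothetical.

```python
import numpy as np


def loo_residuals(X, y):
    """OLS residuals and leave-one-out (PRESS) residuals of y on X."""
    Q, R = np.linalg.qr(X)                 # thin QR of the design matrix
    beta = np.linalg.solve(R, Q.T @ y)     # OLS coefficients
    resid = y - X @ beta
    h = np.sum(Q ** 2, axis=1)             # hat-matrix diagonal h_ii
    return resid, resid / (1.0 - h)        # e_i and e_i / (1 - h_ii)


def variance_estimates(X, y):
    """Return (RSS / n, PRESS / n); dividing by n is an assumption here."""
    resid, loo = loo_residuals(X, y)
    n = len(y)
    return resid @ resid / n, loo @ loo / n


def lagged_design(series, p, max_lag):
    """Intercept plus p lags, on a common sample that starts at max_lag."""
    y = series[max_lag:]
    cols = [np.ones_like(y)]
    for j in range(1, p + 1):
        cols.append(series[max_lag - j:len(series) - j])
    return np.column_stack(cols), y


def select_ar_order(series, max_lag, use_press=True):
    """Pick the lag order minimising an AIC-type criterion log(s2) + 2k/n."""
    scores = {}
    for p in range(1, max_lag + 1):
        X, y = lagged_design(series, p, max_lag)
        s2_usual, s2_press = variance_estimates(X, y)
        s2 = s2_press if use_press else s2_usual
        n, k = X.shape
        scores[p] = np.log(s2) + 2.0 * k / n
    return min(scores, key=scores.get), scores


# Simulated AR(2) data (illustrative coefficients, not from the paper).
rng = np.random.default_rng(0)
series = np.zeros(300)
for t in range(2, 300):
    series[t] = 0.5 * series[t - 1] - 0.3 * series[t - 2] + rng.standard_normal()

order_usual, _ = select_ar_order(series, max_lag=6, use_press=False)
order_press, _ = select_ar_order(series, max_lag=6, use_press=True)
print("usual variance:", order_usual, "PRESS variance:", order_press)
```

Because every 1 / (1 - h_ii) is at least 1, PRESS / n is never smaller than RSS / n on the same sample, so the cross-validated version penalises overfitted models more heavily than the usual estimate does.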

Keywords

Autoregressive Process, Lag Order Determination, Model Selection Criteria, Cross Validation

References

Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle. In 2nd International Symposium on Information Theory, ed. B.N. Petrov and F. Csáki. Budapest: Akadémiai Kiadó, 267-281.
Akaike, H. (1974). A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, AC-19, 716-723.
Allen, D.M. (1974). The Relationship Between Variable Selection and Data Augmentation and a Method for Prediction. Technometrics, 16, 125-7.
Amemiya, T. (1980). Selection of Regressors. International Economic Review, 21, 331-345.
Arlot, S. and A. Celisse (2010). A survey of cross-validation procedures for model selection. Statistics Surveys, 4, 40-79.
Başçı, S. (1998). Computer Intensive Techniques for Model Selection. Ph. D. Dissertation, Bilkent University.
Başçı, S. and A. Zaman (1998). Effects of Skewness and Kurtosis on Model Selection Criteria. Economics Letters, 59, 17-22.
Başçı, S., M. Orhan, and A. Zaman (1998). Model Selection by Cross Validation: Computational Aspects. Working Paper, Bilkent University.
Bhansali, R.J. and D.Y. Downham (1977). Some properties of the order of an autoregressive model selected by a generalized Akaike's EPF criterion. Biometrika, 64, 547-551.
Billings, S.A. and H.L. Wei (2008). An adaptive orthogonal search algorithm for model subset selection and non-linear system identification. International Journal of Control, 81(5), 714-724.
Breiman, L. and D. Freedman (1983). How many variables should be entered in a regression equation? Journal of the American Statistical Association, 78, 131-136.
Chatterjee, S. and A.S. Hadi (1988). Sensitivity Analysis in Linear Regression. John Wiley & Sons, Inc., New York.
Christopher, T.B.S., A.M. Mokhtaruddin, M.H.A. Husni and M.Y. Abdullah (1998). A simple equation to determine the breakdown of individual aggregate size fractions in the wet-sieving method. Soil & Tillage Research, 45, 287-297.
Collett, D., K. Stepniewska (1999). Some practical issues in binary data analysis. Statistics in Medicine, 18(17-18), 2209-2221.
Davies, S.L., A.A. Neath and J.E. Cavanaugh (2005). Cross validation model selection criteria for linear regression based on the Kullback-Leibler discrepancy. Statistical Methodology, 2(4), 249–266.
Diebold, F.X. (1989). Forecast Combination and Encompassing: Reconciling Two Divergent Literatures. International Journal of Forecasting, 5(4), 589-92.
Efron, B. (1983). Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation. Journal of the American Statistical Association, 78(382), 316-331.
Geweke, J. and R. Meese (1981). Estimating regression models of finite but unknown order. International Economic Review, 22, 55-70.
Hannan, E.J. and B.G. Quinn (1979). The Determination of the Order of an Autoregression. Journal of the Royal Statistical Society B, 41, 190-195.
Hendry, David F. (1995). Dynamic Econometrics. Oxford: Oxford University Press.
Horn, S.D., R.A. Horn, and D.B. Duncan (1975). Estimating Heteroskedastic variances in linear models. Journal of the American Statistical Association, 70, 380-385.
Hurvich, C.M. and C.L. Tsai (1989). Regression and Time Series Model Selection in Small Samples. Biometrika, 76(2), 297-307.
Jabri, M. El, S. Abouelkaram, J.L. Damez and P. Berge (2010). Image analysis study of the perimysial connective network, and its relationship with tenderness and composition of bovine meat. Journal of Food Engineering, 96(2), 316-322.
Judge, G.G., W.E. Griffiths, R.C. Hill and T. Lee (1985). The Theory and Practice of Econometrics 2nd Edition. John Wiley & Sons, Inc., New York: Wiley Series in Probability and Mathematical Statistics.
Lang, L., S. Hui, G. Pennello, Z. Desta, S. Todd, A. Nguyen, D. Flockhart (2007). Estimating a Positive False Discovery Rate for Variable Selection in Pharmacogenetic Studies. Journal of Biopharmaceutical Statistics, 17(5), 883-902.
Li, K.-C. (1987). Asymptotic optimality for Cp, CL, cross-validation and generalized cross-validation: discrete index set. The Annals of Statistics, 15(3), 958-975.
Lin, L.I.-K. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45, 255-268.
Li, L. and S. Hui (2007). Positive False Discovery Rate Estimate in Step-Wise Variable Selection. Communications in Statistics - Simulation and Computation, 36(6), 1217-1231.
Liu, K. (1993). A new class of biased estimate in linear regression. Communications in Statistics - Simulation and Computation, 22(2), 393-402.
Linhart, H. and W. Zucchini (1986). Model Selection. Wiley, New York.
Lütkepohl, H. (1985). Comparison of Criteria for Estimating the Order of a Vector Autoregressive Process. Journal of Time Series Analysis, 6(1), 35-52.
Magee, L. and M.R. Veall (1991). Selecting Regressors for Prediction Using PRESS and White T Statistics. Journal of Business and Economic Statistics, 9, 91-96.
Mallows, C.L. (1973). Some comments on CP. Technometrics, 15, 661-675.
McQuarrie, A., R. Shumway, C.-L. Tsai (1997). The model selection criterion AICu. Statistics & Probability Letters, 34, 285-292.
McQuarrie, A.D.R. and C.-L. Tsai (1998). Regression and Time Series Model Selection. World Scientific Publishers, Singapore.
Molinaro, A.M., R. Simon and R.M. Pfeiffer (2005). Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15), 3301-3307.
Neri, P. (2009). Nonlinear characterization of a simple process in human vision. Journal of Vision, 9(12), 1, 1-29.
Nikolic, K. and D. Agababa (2009). Prediction of hepatic microsomal intrinsic clearance and human clearance values for drugs. Journal of Molecular Graphics and Modelling, 28(3), 245-252.
Özkale, M.R. and S. Kaçıranlar (2007). A Prediction-Oriented Criterion for Choosing the Biasing Parameter in Liu Estimation. Communications in Statistics - Theory and Methods, 36(10), 1889-1903.
Peng, X. and Y. Wang (2007). A normal least squares support vector machine (NLS-SVM) and its learning algorithm. Neurocomputing, 72(16-18), 3734-3741.
Piepho, H.-P. and H.G. Jr. Gauch (2001). Marker Pair Selection for Mapping Quantitative Trait Loci. Genetics, 157, 433-444.
Rao, C.R. and Y. Wu (2001). On model selection. With discussion by Sadanori Konishi and Rahul Mukerjee and a rejoinder by the authors. In Model Selection, IMS Lecture Notes - Monograph Series, 38, 1-64.
Quinn, B.G. (1980). Order Determination for a Multivariate Autoregression. Journal of the Royal Statistical Society B, 42, 182-185.
Rissanen, J. (1978). Modeling by Shortest Data Description. Automatica, 14, 465-471.
Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 6, 461-464.
Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88, 486-494.
Wang, J. and G.B. Schaalje (2009). Model Selection for Linear Mixed Models Using Predictive Criteria. Communications in Statistics - Simulation and Computation, 38(4), 788-801.
Xinjun, P. (2010). TSVR: An efficient Twin Support Vector Machine for regression. Neural Networks, 23, 365-372.
Xiongcai, C. and A. Sowmya (2009). Learning to tune level set methods. In Image and Vision Computing New Zealand, 2009. IVCNZ '09. 24th International Conference, 310-315, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5378391&isnumber=5378349 (accessed May 16, 2010).
Zaman, A. (1984). Avoiding Model Selection by the use of Shrinkage Techniques. Journal of Econometrics, 25, 239-246.

There are 50 citations in total.

Details

Subjects	Business Administration
Other ID	JA24KJ42HY
Journal Section	Articles
Authors	Sıdıka Başçı, Asad Zaman, Arzdar Kiracı
Publication Date	September 1, 2010
Submission Date	September 1, 2010
Published in Issue	Year 2010 Volume: 2 Issue: 2

Cite

APA	Başçı, S., Zaman, A., & Kiracı, A. (2010). Variance Estimates and Model Selection. International Econometric Review, 2(2), 57-72.
AMA	Başçı S, Zaman A, Kiracı A. Variance Estimates and Model Selection. IER. December 2010;2(2):57-72.
Chicago	Başçı, Sıdıka, Asad Zaman, and Arzdar Kiracı. “Variance Estimates and Model Selection”. International Econometric Review 2, no. 2 (December 2010): 57-72.
EndNote	Başçı S, Zaman A, Kiracı A (December 1, 2010) Variance Estimates and Model Selection. International Econometric Review 2 2 57–72.
IEEE	S. Başçı, A. Zaman, and A. Kiracı, “Variance Estimates and Model Selection”, IER, vol. 2, no. 2, pp. 57–72, 2010.
ISNAD	Başçı, Sıdıka et al. “Variance Estimates and Model Selection”. International Econometric Review 2/2 (December 2010), 57-72.
JAMA	Başçı S, Zaman A, Kiracı A. Variance Estimates and Model Selection. IER. 2010;2:57–72.
MLA	Başçı, Sıdıka et al. “Variance Estimates and Model Selection”. International Econometric Review, vol. 2, no. 2, 2010, pp. 57-72.
Vancouver	Başçı S, Zaman A, Kiracı A. Variance Estimates and Model Selection. IER. 2010;2(2):57-72.
