Research Article

Performance Comparisons of Model Selection Criteria: AIC, BIC, ICOMP and Wold’s for PLSR

Year 2013, Volume: 10 Issue: 3, 15 - 34, 13.12.2013

Abstract

Partial least squares regression (PLSR) is a statistical method for modeling the relationship between an N×M matrix of response variables Y and an N×K matrix of explanatory variables X. It is particularly well suited to cases in which the explanatory variables are highly correlated. In the partial least squares step, model selection criteria are used to choose the latent variables, that is, the variables most relevant for describing the responses. Typical approaches to selecting the number of latent variables are the Akaike information criterion (AIC) and Wold’s R criterion.
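Wold’s R criterion is commonly formulated in terms of cross-validated prediction error (PRESS): a further latent variable is kept only while it lowers PRESS. The following is a minimal pure-Python sketch of that idea for a single response (PLS1). It is not the paper's code. The NIPALS fitting routine, the leave-one-out PRESS, the stopping rule R = PRESS(a+1)/PRESS(a) < threshold with a default threshold of 1, the simulated data, and the function names `pls1_fit`, `pls1_predict`, `loo_press` and `wolds_r_select` are all assumptions made for illustration.

```python
import math
import random


def pls1_fit(X, y, n_comp):
    """Fit PLS1 (one response) by NIPALS; X is a list of rows, y a list."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []  # (weights w, X-loadings p, y-loading q) per latent variable
    for _ in range(n_comp):
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        p_load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the extracted latent variable.
        Xc = [[Xc[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        comps.append((w, p_load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x, n_comp):
    """Predict one row using the first n_comp latent variables."""
    x_mean, y_mean, comps = model
    xc = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p_load, q in comps[:n_comp]:
        t = sum(a * b for a, b in zip(xc, w))
        y_hat += q * t
        xc = [a - t * b for a, b in zip(xc, p_load)]  # same deflation as in fitting
    return y_hat


def loo_press(X, y, max_comp):
    """press[a]: leave-one-out prediction error sum of squares with a latent variables."""
    press = [0.0] * (max_comp + 1)
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], max_comp)
        for a in range(max_comp + 1):
            err = y[i] - pls1_predict(model, X[i], a)
            press[a] += err * err
    return press


def wolds_r_select(press, threshold=1.0):
    """Add latent variables while R = PRESS(a+1) / PRESS(a) < threshold."""
    a = 0
    while a + 1 < len(press) and press[a] > 0 and press[a + 1] / press[a] < threshold:
        a += 1
    return a


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 6 correlated predictors driven by 2 hidden factors.
    n, p = 40, 6
    F = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    X = [[f[j % 2] + 0.1 * rng.gauss(0, 1) for j in range(p)] for f in F]
    y = [2 * f[0] - f[1] + 0.3 * rng.gauss(0, 1) for f in F]
    press = loo_press(X, y, max_comp=4)
    print("PRESS:", [round(v, 2) for v in press])
    print("Wold's R keeps", wolds_r_select(press), "latent variables")
```

Passing a threshold below 1 makes the rule stricter, so a new latent variable must reduce PRESS by a clear margin before it is kept.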


In this study, we examine the performance of the Bayesian information criterion (BIC) and the information complexity criterion (ICOMP), alongside the traditional AIC and Wold’s R criterion, as model selection criteria for partial least squares regression when the number of observations is larger than the number of predictor variables. The performances of AIC, BIC, ICOMP and Wold’s R were compared using a real-life data set and a simulation study. The simulations covered different sample sizes, different numbers of predictor variables and different numbers of response variables. The simulation results demonstrate that BIC and ICOMP are more effective than AIC and Wold’s R at selecting the number of latent variables for known PLSR models.
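To make the comparison of AIC, BIC and ICOMP concrete, each candidate number of latent variables a can be scored by a Gaussian log-likelihood plus a penalty, and the a with the smallest score is selected. The sketch below is an illustration under stated assumptions, not the formulas used in the paper. It treats a PLS1 model as a regression of the centered response on mutually orthogonal score vectors t_j (intercept absorbed by centering, scores treated as fixed), counts k = a + 1 parameters (a score coefficients plus the error variance) for AIC and BIC, and uses Bozdoğan's C1 complexity of the inverse Fisher information for ICOMP. The input numbers and the names `pls_criteria`, `c1_complexity` and `select` are hypothetical.

```python
import math


def c1_complexity(diag):
    """Bozdogan's C1 complexity for a covariance matrix given by its diagonal.

    C1(S) = (s/2) * log(trace(S)/s) - (1/2) * log(det(S)), s = dimension.
    It is zero when all variances are equal and grows as they spread out.
    """
    s = len(diag)
    return 0.5 * s * math.log(sum(diag) / s) - 0.5 * sum(math.log(d) for d in diag)


def pls_criteria(rss, score_ss, n):
    """AIC, BIC and ICOMP for a = 0..A latent variables of a PLS1 model.

    rss[a]: residual sum of squares with a latent variables
    score_ss[j]: t_j't_j, squared norm of the (j+1)-th PLS score vector
    n: number of observations
    """
    out = {"AIC": [], "BIC": [], "ICOMP": []}
    for a, r in enumerate(rss):
        sigma2 = r / n  # maximum likelihood estimate of the error variance
        minus2ll = n * math.log(2 * math.pi) + n * math.log(sigma2) + n
        k = a + 1  # assumed parameter count: a score coefficients + sigma^2
        out["AIC"].append(minus2ll + 2 * k)
        out["BIC"].append(minus2ll + k * math.log(n))
        # Inverse Fisher information of (coefficients, sigma^2); it is diagonal
        # here because the PLS scores are mutually orthogonal.
        inv_fim = [sigma2 / tt for tt in score_ss[:a]] + [2 * sigma2 ** 2 / n]
        out["ICOMP"].append(minus2ll + 2 * c1_complexity(inv_fim))
    return out


def select(values):
    """Number of latent variables with the smallest criterion value."""
    return min(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Hypothetical fit statistics, not taken from the paper.
    rss = [100.0, 40.0, 20.0, 19.5, 19.2]
    score_ss = [30.0, 20.0, 10.0, 5.0]
    crit = pls_criteria(rss, score_ss, n=50)
    for name, values in crit.items():
        print(name, "selects", select(values), "latent variables")
```

AIC and BIC penalize only the parameter count, whereas the ICOMP penalty also depends on how unevenly the coefficient variances are spread, so ICOMP need not agree with the other two on the same inputs.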

References

  • Abdi, H., Salkind, N., 2007. Encyclopedia of Measurement and Statistics. Sage, Thousand Oaks, CA.
  • Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716-723.
  • Bailey, C., 1994. Smart Exercise: Burning Fat, Getting Fit. Houghton-Mifflin Co., Boston, pp. 179-186.
  • Bedrick, E. J., Tsai, C. L., 1994. Model selection for multivariate regression in small samples. Biometrics 50, 226-231.
  • Behnke, A. R., Wilmore, J. H., 1974. Evaluation and Regulation of Body Build and Composition. Prentice-Hall, Englewood Cliffs, NJ, 236 pp.
  • Boyer, K. L., Mirza, M. J., Ganguly, G., 1994. The Robust Sequential Estimator: A General Approach and its Application to Surface Organization in Range Data. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 987-1001.
  • Bozdoğan, H., 1987. Model Selection and Akaike's Information Criterion (AIC): The General Theory and Its Analytical Extensions. Psychometrika 52, 345-370.
  • Bozdoğan, H., 2000. Akaike's Information Criterion and Recent Developments in Information Complexity. Journal of Mathematical Psychology 44, 62-91.
  • Bozdoğan, H., 2004. Statistical Data Mining and Knowledge Discovery. Chapman and Hall/CRC, USA.
  • Bozdoğan, H., 2004. Intelligent statistical data mining with Information Complexity and Genetic Algorithms. In: Statistical Data Mining and Knowledge Discovery. Chapman and Hall/CRC, USA.
  • Clark, A. E., Troskie, C. G., 2006. Regression and ICOMP: A Simulation Study. Communications in Statistics - Simulation and Computation 35, 591-603.
  • Eastment, H. T., Krzanowski, W. J., 1982. Cross-validatory choice of the number of components from a principal component analysis. Technometrics 24, 73-77.
  • Garthwaite, P. H., 1994. An interpretation of partial least squares. Journal of the American Statistical Association 89, 122-127.
  • Geladi, P., Kowalski, B. R., 1986. Partial least-squares regression: a tutorial. Analytica Chimica Acta 185, 1-17.
  • Haber, R., Unbehauen, H., 1990. Structure identification of nonlinear dynamic systems: a survey on input/output approaches. Automatica 26(4), 651-677.
  • Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.
  • Helland, I. S., 1990. Partial Least Squares Regression and Statistical Models. Scandinavian Journal of Statistics 17(2), 97-114.
  • Henry de-Graft, A., 2010. Comparison of Akaike information criterion (AIC) and Bayesian information criterion (BIC) in selection of an asymmetric price relationship. Journal of Development and Agricultural Economics 2(1), 001-006.
  • Jouan-Rimbaud Bouveresse, D., Rutledge, D. N., 2009. Two new extensions of principal component transform to compute a PLS2 model between two wide matrices: PCT-PLS2 and segmented PCT-PLS2. Analytica Chimica Acta 642(1-2), 37-44.
  • Katch, F., McArdle, W., 1977. Nutrition, Weight Control and Exercise. Houghton-Mifflin Co., Boston.
  • Kanatani, K., 2002. Model Selection for Geometric Inference. The 5th Asian Conference on Computer Vision, Melbourne, Australia, January, pp. xxi-xxxii.
  • Kohavi, R., 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2. Morgan Kaufmann Publishers Inc., San Francisco, CA, pp. 1137-1143.
  • Kuha, J., 2004. AIC and BIC: Comparisons of assumptions and performance. Sociological Methods and Research 33(2), 188-229.
  • Kundu, D., Murali, G., 1996. Model selection in linear regression. Computational Statistics and Data Analysis 22(5), 461-469.
  • Li, B., Morris, J., Martin, E. B., 2002. Model Selection for Partial Least Squares Regression. Chemometrics and Intelligent Laboratory Systems 64, 79-89.
  • Myung, I. J., 2000. The Importance of Complexity in Model Selection. Journal of Mathematical Psychology 44, 190-204.
  • Naes, T., Martens, H., 1985. Comparison of prediction methods for collinear data. Communications in Statistics - Simulation and Computation 14, 545-576.
  • Niazi, A., Azizi, A., 2008. Orthogonal Signal Correction-Partial Least Squares Method for Simultaneous Spectrophotometric Determination of Nickel, Cobalt and Zinc. Turkish Journal of Chemistry 32, 217-228.
  • Saporta, G., 2008. Models for Understanding versus Models for Prediction. In: COMPSTAT 2008, Part IX, pp. 315-322.
  • Schwarz, G., 1978. Estimating the Dimension of a Model. The Annals of Statistics 6(2), 461-464.
  • Siri, W. E., 1956. Gross Composition of the Body. In: Lawrence, J. H., Tobias, C. A. (Eds.), Advances in Biological and Medical Physics. Academic Press, New York.
  • Torr, P. H. S., 1998. Model Selection for Two View Geometry: A Review. Microsoft Research, USA.
  • Wang, Y., Liu, Q., 2006. Comparison of Akaike information criteria (AIC) and Bayesian information criteria (BIC) in selection of stock recruitment relationships. Fisheries Research 77(2), 220-225.
  • Weakliem, L. D., 2004. Introduction to the Special Issue on Model Selection. Sociological Methods and Research 33(2), 167-186.
  • Wilmore, J., 1976. Athletic Training and Physical Fitness: Physiological Principles of the Conditioning Process. Allyn and Bacon, Inc., Boston.
  • Wold, H., 1966. Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P. R. (Ed.), Multivariate Analysis. Academic Press, New York, pp. 391-420.
  • Wold, H., 1982. Soft Modelling: The basic design and some extensions. In: Jöreskog, K.-G., Wold, H. (Eds.), Systems Under Indirect Observation, Vols. I and II. North-Holland, Amsterdam.
  • Wold, S., Sjöström, M., Eriksson, L., 2001. PLS regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 58, 109-130.

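Performance Comparisons of Model Selection Criteria for PLSR: AIC, BIC, ICOMP and Wold’s R (Turkish: KEKKR için Model Seçme Kriterlerinin Performans Karşılaştırmaları: AIC, BIC, ICOMP ve WOLD'S R)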


Abstract

Partial least squares regression (PLSR) is a statistical method that can model the relationship between an N×M matrix of response variables Y and an N×K matrix of explanatory variables X in the presence of multicollinearity. In the partial least squares step, model selection criteria are applied to obtain the latent variables that best explain the response variable. The common approaches used to select the latent variables are the Akaike information criterion (AIC) and Wold’s R criterion.



In this study, for cases in which the number of observations exceeds the number of explanatory variables, the Bayesian information criterion (BIC) and the information complexity criterion (ICOMP) are examined as model selection criteria for PLSR, in addition to the traditional methods AIC and Wold’s R. The performances of the AIC, BIC, ICOMP and Wold’s R model selection criteria are compared using a real data example and a simulation study. The simulation results were obtained for different sample sizes and different numbers of explanatory and response variables. They show that, for PLSR models, BIC and ICOMP are much more effective than the other model selection criteria (AIC and Wold’s R) at selecting latent variables and choose a more accurate number of latent variables.

There are 38 citations in total.

Details

Primary Language English
Subjects Statistics
Journal Section Research Articles
Authors

Özlem Gürünlü Alma

Publication Date December 13, 2013
Published in Issue Year 2013 Volume: 10 Issue: 3

Cite

APA Gürünlü Alma, Ö. (2013). Performance Comparisons of Model Selection Criteria: AIC, BIC, ICOMP and Wold’s for PLSR. İstatistik Araştırma Dergisi, 10(3), 15-34.