Determining The Number of Principal Components with Schur's Theorem in Principal Component Analysis
Year 2023, 299 - 306, 27.06.2023
Cihan Karakuzulu, İbrahim Halil Gümüş, Serkan Güldal, Mustafa Yavaş
Abstract
Principal Component Analysis (PCA) reduces the dimensionality of a dataset while limiting information loss. It does so by constructing uncorrelated variables that successively maximize variance. The accepted criterion for evaluating the performance of a principal component (PC) is λ_j/tr(S), where λ_j is the j-th eigenvalue of the covariance matrix S and tr(S) denotes its trace. The standard procedure is to retain as many PCs as are needed to reach a predetermined percentage of the total variance. In this study, the diagonal elements of the covariance matrix are used instead of the eigenvalues to determine how many PCs must be considered to reach the defined threshold of the total variance. To this end, an approach based on one of the important theorems of majorization theory is proposed. Based on the tests, this approach lowers the computational cost.
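One way to read this idea, sketched under assumptions: Schur's theorem states that the eigenvalues of a symmetric matrix majorize its diagonal entries, so the sum of the k largest diagonal entries of S never exceeds the sum of its k largest eigenvalues. If the k largest diagonal entries already reach the chosen share of tr(S), the first k PCs reach it too, which makes k an upper bound on the number of PCs required, obtained without an eigendecomposition. The Python sketch below illustrates this reasoning. The function names, the tolerance, and the example matrix are hypothetical and not taken from the paper, whose exact procedure is not given in the abstract.

```python
import numpy as np


def _count_to_threshold(values, threshold, tol=1e-12):
    """Smallest k such that the k largest values reach `threshold` of their sum."""
    ordered = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    share = np.cumsum(ordered) / ordered.sum()                # cumulative share of tr(S)
    # Index of the first position whose cumulative share reaches the threshold.
    return int(np.argmax(share >= threshold - tol)) + 1


def pcs_upper_bound_from_diagonal(S, threshold=0.9):
    """Upper bound on the number of PCs, using only the diagonal of S (assumed approach)."""
    return _count_to_threshold(np.diag(S), threshold)


def pcs_from_eigenvalues(S, threshold=0.9):
    """Reference count from the eigenvalues of the symmetric matrix S."""
    return _count_to_threshold(np.linalg.eigvalsh(S), threshold)


# Hypothetical 3 x 3 covariance matrix, used only for illustration.
S_example = np.array([[4.0, 1.0, 0.0],
                      [1.0, 3.0, 0.0],
                      [0.0, 0.0, 1.0]])

k_diag = pcs_upper_bound_from_diagonal(S_example, threshold=0.8)
k_eig = pcs_from_eigenvalues(S_example, threshold=0.8)
```

By majorization, `k_eig` can never exceed `k_diag`. The diagonal-based count only needs a sort of p numbers, whereas the eigenvalue-based count needs a full symmetric eigendecomposition, which is where a saving in computation would come from. The bound can be loose when the variables are strongly correlated.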