Research Article

Determining The Number of Principal Components with Schur's Theorem in Principal Component Analysis

Year 2023, Volume 12, Issue 2, 299 - 306, 27.06.2023
https://doi.org/10.17798/bitlisfen.1144360

Abstract

Principal Component Analysis (PCA) is a method for reducing the dimensionality of a dataset while limiting information loss. It does so by producing uncorrelated variables that successively maximize variance. The accepted criterion for evaluating the performance of a principal component (PC) is λ_j/tr(S), where λ_j is the j-th eigenvalue of the covariance matrix S and tr(S) is its trace. The standard procedure is to retain as many PCs as are needed to reach a predetermined percentage of the total variance. In this study, the diagonal elements of the covariance matrix are used instead of the eigenvalues to determine how many PCs must be considered to reach that threshold. To this end, an approach based on one of the important theorems of majorization theory is proposed. In the tests, this approach lowers the computational cost.
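The idea in the abstract can be illustrated with a short Python sketch. It relies on Schur's theorem, which states that the diagonal of a symmetric matrix is majorized by its eigenvalues: the sum of the k largest diagonal entries never exceeds the sum of the k largest eigenvalues, and both sum to tr(S). Consequently, the number of diagonal entries needed to reach a variance threshold is an upper bound on the number of PCs needed. The helper name `components_needed`, the synthetic data, and the 90% threshold are assumptions made for illustration; this is not necessarily the authors' exact procedure.

```python
import numpy as np


def components_needed(values, threshold):
    """Smallest k such that the k largest values reach `threshold` of their sum."""
    sorted_desc = np.sort(values)[::-1]
    cumulative_share = np.cumsum(sorted_desc) / sorted_desc.sum()
    # A small tolerance guards against floating-point round-off at the boundary.
    k = int(np.searchsorted(cumulative_share, threshold - 1e-12)) + 1
    return min(k, len(sorted_desc))


# Synthetic correlated data (illustrative only, not one of the paper's datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
S = np.cov(X, rowvar=False)          # sample covariance matrix

threshold = 0.90                     # assumed share of total variance to retain

# Standard route: needs the eigenvalues of S.
eigenvalues = np.linalg.eigvalsh(S)
k_eig = components_needed(eigenvalues, threshold)

# Diagonal route: needs only the variances on the diagonal of S.
diagonal = np.diag(S)
k_diag = components_needed(diagonal, threshold)

print(f"PCs needed according to the eigenvalues: {k_eig}")
print(f"Upper bound from the diagonal entries:   {k_diag}")
```

Reading the diagonal requires no eigendecomposition, which is where a saving in computation can come from. The price is that `k_diag` is only an upper bound: it can exceed `k_eig` when the variables are strongly correlated.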

References

  • [1] K. Pearson, "LIII. On lines and planes of closest fit to systems of points in space," The London, Edinburgh, and Dublin philosophical magazine and journal of science, vol. 2, no. 11, pp. 559-572, 1901.
  • [2] H. Hotelling, "Analysis of a complex of statistical variables into principal components," Journal of educational psychology, vol. 24, no. 6, p. 417, 1933.
  • [3] I. T. Jolliffe, "Graphical representation of data using principal components," Principal component analysis, pp. 78-110, 2002.
  • [4] T. Hastie, R. Tibshirani, and J. Friedman, "Unsupervised learning," in The elements of statistical learning: Springer, pp. 485-585, 2009.
  • [5] C. Hafemeister and R. Satija, "Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression," Genome biology, vol. 20, no. 1, pp. 1-15, 2019.
  • [6] L. McInnes, J. Healy, and J. Melville, "Umap: Uniform manifold approximation and projection for dimension reduction," arXiv preprint arXiv:1802.03426, 2018.
  • [7] M. P. Deisenroth, A. A. Faisal, and C. S. Ong, Mathematics for machine learning. Cambridge University Press, 2020.
  • [8] J. Wilson Black, J. Brand, J. Hay, and L. Clark, "Using principal component analysis to explore co-variation of vowels," Language and Linguistics Compass, vol. 17, no. 1, p. e12479, 2023.
  • [9] I. Świetlicka, W. Kuniszyk-Jóźkowiak, and M. Świetlicki, "Artificial Neural Networks Combined with the Principal Component Analysis for Non-Fluent Speech Recognition," Sensors, vol. 22, no. 1, p. 321, 2022.
  • [10] Y. Zhang and Y. Wang, "Forecasting crude oil futures market returns: A principal component analysis combination approach," International Journal of Forecasting, vol. 39, no. 2, pp. 659-673, 2023.
  • [11] F. Castells, P. Laguna, L. Sörnmo, A. Bollmann, and J. M. Roig, "Principal Component Analysis in ECG Signal Processing," EURASIP Journal on Advances in Signal Processing, vol. 2007, no. 1, p. 074580, 2007.
  • [12] D.-Y. Tzeng and R. S. Berns, "A review of principal component analysis and its applications to color technology," Color Research & Application, vol. 30, no. 2, pp. 84-98, 2005.
  • [13] O. H. J. Christie, "Introduction to multivariate methodology, an alternative way?," Chemometrics and Intelligent Laboratory Systems, vol. 29, no. 2, pp. 177-188, 1995.
  • [14] M. Ghil et al., "Advanced Spectral Methods for Climatic Time Series," Reviews of Geophysics, vol. 40, no. 1, pp. 3-1-3-41, 2002.
  • [15] J. Hwang et al., "Fast and sensitive recognition of various explosive compounds using Raman spectroscopy and principal component analysis," Journal of Molecular Structure, vol. 1039, pp. 130-136, 2013.
  • [16] P. Federolf, R. Reid, M. Gilgien, P. Haugen, and G. Smith, "The application of principal component analysis to quantify technique in sports," Scandinavian Journal of Medicine & Science in Sports, vol. 24, no. 3, pp. 491-499, 2014.
  • [17] L. Ferré, "Selection of components in principal component analysis: A comparison of methods," Computational Statistics & Data Analysis, vol. 19, no. 6, pp. 669-682, 1995.
  • [18] E. Saccenti and J. Camacho, "Determining the number of components in principal components analysis: A comparison of statistical, crossvalidation and approximated methods," Chemometrics and Intelligent Laboratory Systems, vol. 149, pp. 99-116, 2015.
  • [19] P. R. Peres-Neto, D. A. Jackson, and K. M. Somers, "How many principal components? stopping rules for determining the number of non-trivial axes revisited," Computational Statistics & Data Analysis, vol. 49, no. 4, pp. 974-997, 2005.
  • [20] D. A. Jackson, "Stopping Rules in Principal Components Analysis: A Comparison of Heuristical and Statistical Approaches," Ecology, vol. 74, no. 8, pp. 2204-2214, 1993.
  • [21] I. T. Jolliffe and J. Cadima, "Principal component analysis: a review and recent developments," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 374, no. 2065, p. 20150202, 2016.
  • [22] F. Zhang, Matrix theory: basic results and techniques. Springer, 2011.
  • [23] K. Nakai and M. Kanehisa, "Expert system for predicting protein localization sites in gram-negative bacteria," (in eng), Proteins, vol. 11, no. 2, pp. 95-110, 1991.
  • [24] K. Nakai and M. Kanehisa, "A knowledge base for predicting protein localization sites in eukaryotic cells," (in eng), Genomics, vol. 14, no. 4, pp. 897-911, Dec 1992.
  • [25] G. Scalabrini Sampaio, A. R. d. A. Vallim Filho, L. Santos da Silva, and L. Augusto da Silva, "Prediction of Motor Failure Time Using An Artificial Neural Network," Sensors, vol. 19, no. 19, p. 4342, 2019.
  • [26] M. Patrício et al., "Using Resistin, glucose, age and BMI to predict the presence of breast cancer," BMC Cancer, vol. 18, no. 1, p. 29, 2018.
  • [27] D. Ayres-de Campos, J. Bernardes, A. Garrido, J. Marques-de-Sá, and L. Pereira-Leite, "SisPorto 2.0: a program for automated analysis of cardiotocograms," (in eng), J Matern Fetal Med, vol. 9, no. 5, pp. 311-8, Sep-Oct 2000.
  • [28] P. Tüfekci, "Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods," International Journal of Electrical Power & Energy Systems, vol. 60, pp. 126-140, 2014.
  • [29] H. Kaya and P. Tufekci, Local and Global Learning Methods for Predicting Power of a Combined Gas & Steam Turbine. 2012.
There are 29 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Research Article
Authors

Cihan Karakuzulu 0000-0001-9306-6276

İbrahim Halil Gümüş 0000-0002-3071-1159

Serkan Güldal 0000-0002-4247-0786

Mustafa Yavaş 0000-0002-9111-9095

Early Pub Date June 27, 2023
Publication Date June 27, 2023
Submission Date July 17, 2022
Acceptance Date February 23, 2023
Published in Issue Year 2023

Cite

IEEE C. Karakuzulu, İ. H. Gümüş, S. Güldal, and M. Yavaş, “Determining The Number of Principal Components with Schur’s Theorem in Principal Component Analysis”, Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, vol. 12, no. 2, pp. 299–306, 2023, doi: 10.17798/bitlisfen.1144360.

Bitlis Eren University
Journal of Science Editor
Bitlis Eren University Graduate Institute
Bes Minare Mah. Ahmet Eren Bulvari, Merkez Kampus, 13000 BITLIS