Robust Principal Component Analysis Based on Modified Minimum Covariance Determinant in the Presence of Outliers

Year 2016, Volume 4, Issue 2, pp. 85-94, 08.12.2016
https://doi.org/10.17093/aj.2016.4.2.5000189525



Abstract

Classical principal component analysis (PCA) is not resistant to outliers in multivariate data sets. In the presence of outliers, the results obtained with classical PCA can be far from the true values, so robust versions of PCA are preferable. The easiest way to obtain robust principal components is to replace the classical estimates of the location and scale parameters with robust estimates. Robust estimates of location and scale can be obtained with the minimum covariance determinant (MCD) estimator, which has a high breakdown point. In this study, the MCD algorithm is modified using the jackknife resampling approach, and the effects of this modification on robust PCA are examined. The proposed robust principal component analysis (RPCA) based on the jackknife-modified MCD (MMCD) is evaluated on two real data sets with different outlier ratios. The results indicate that, in the presence of outliers, RPCA based on MMCD performs better than RPCA based on the classical MCD.
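The plug-in idea described above, replacing the sample mean and covariance with robust MCD estimates before the eigendecomposition, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's method: `mcd` is a hypothetical, simplified search in the spirit of the FAST-MCD algorithm (random starting subsets refined by concentration steps), without the consistency correction or reweighting step that production implementations such as `covMcd` in the R package robustbase apply. The jackknife modification (MMCD) is not reproduced here because the abstract does not specify it, and the data are synthetic.

```python
import numpy as np


def c_step(X, subset, h):
    """Concentration step: fit mean/covariance on `subset`, then return the
    indices of the h points with the smallest squared Mahalanobis distance."""
    mu = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    diff = X - mu
    # pinv guards against a (near-)singular covariance from a tiny start subset
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
    return np.argsort(d2)[:h]


def mcd(X, h=None, n_starts=50, n_iter=20, seed=0):
    """Hypothetical, simplified MCD search: random (p+1)-point starts, each
    refined by C-steps; keep the h-subset with the smallest covariance
    determinant. Returns the raw (uncorrected, unreweighted) location/scatter."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # default subset size with maximal breakdown point
    best_det, best_subset = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_iter):
            new_subset = c_step(X, subset, h)
            if set(new_subset) == set(subset):
                break  # converged: the subset no longer changes
            subset = new_subset
        det = np.linalg.det(np.cov(X[subset], rowvar=False))
        if det < best_det:
            best_det, best_subset = det, subset
    core = X[best_subset]
    return core.mean(axis=0), np.cov(core, rowvar=False)


def pca_from_scatter(cov):
    """Eigen-decompose a scatter matrix; eigenvalues in descending order,
    loadings (eigenvectors) as the matching columns."""
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


# Synthetic data: main cloud elongated along x, plus a tight outlier cluster at y = 15
rng = np.random.default_rng(42)
clean = rng.normal(size=(100, 2)) * np.array([3.0, 0.5])
outliers = np.array([0.0, 15.0]) + 0.5 * rng.normal(size=(20, 2))
X = np.vstack([clean, outliers])

_, load_classic = pca_from_scatter(np.cov(X, rowvar=False))  # classical PCA
_, cov_mcd = mcd(X)
_, load_robust = pca_from_scatter(cov_mcd)  # PCA on the MCD scatter

print("classical PC1:", np.round(load_classic[:, 0], 3))
print("robust PC1:   ", np.round(load_robust[:, 0], 3))
```

With 20 of 120 points placed far from the main cloud, the classical first principal component should swing toward the outliers (the y axis), while the MCD-based one stays aligned with the bulk of the data (the x axis). Because no consistency factor is applied, the eigenvalues of the raw MCD scatter understate the true variances; the loadings are unaffected by that scaling.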

The article has 26 references in total.

Details

Section: Articles
Authors

B. Barış Alkan

Publication Date: 8 December 2016
Submission Date: 16 May 2016
Published in Issue: Year 2016

Cite

APA Alkan, B. B. (2016). Robust Principal Component Analysis Based On Modified Minimum Covariance Determinant In The Presence Of Outliers. Alphanumeric Journal, 4(2), 85-94. https://doi.org/10.17093/aj.2016.4.2.5000189525

Cited By

Robust Principal Component Analysis based on Fuzzy Coded Data
ANADOLU UNIVERSITY JOURNAL OF SCIENCE AND TECHNOLOGY A - Applied Sciences and Engineering
Baris Alkan
https://doi.org/10.18038/aubtda.317765

Alphanumeric Journal is hosted on DergiPark, a web based online submission and peer review system powered by TUBİTAK ULAKBIM.

Alphanumeric Journal is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License