Research Article

Robust Principal Component Analysis based on Fuzzy Coded Data

Year 2017, Volume: 18 Issue: 3, 754 - 762, 30.09.2017
https://doi.org/10.18038/aubtda.317765

Abstract

Principal component analysis, like many classical statistical methods, is
severely affected by outliers in the dataset. For this reason, researchers
tend to use alternative methods when outliers are present.
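The sensitivity described above can be illustrated with a small sketch. It is not taken from the article: the simulated data, the outlier positions, and the helper name `first_pc_angle` are assumptions chosen only for illustration. The sketch computes the direction of the first principal component of two-dimensional data from the sample covariance matrix, before and after adding a few outliers.

```python
import math
import random


def first_pc_angle(points):
    """Angle in degrees, in (-90, 90], of the first principal axis of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance matrix.
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


random.seed(0)

# Hypothetical clean data: most of the variance lies along the x-axis.
clean = [(random.gauss(0, 3), random.gauss(0, 0.3)) for _ in range(100)]

# Five hypothetical outliers placed far away along the y-axis (about 5% contamination).
contaminated = clean + [(0.0, 40.0 + i) for i in range(5)]

print("clean angle:       ", round(first_pc_angle(clean), 1))
print("contaminated angle:", round(first_pc_angle(contaminated), 1))
```

With the clean data the first component stays close to the x-axis, while the handful of outliers pulls it towards the y-axis. This is the kind of distortion that motivates robust alternatives.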

Fuzzy and robust approaches are the leading choices among these alternatives.
In this study, a new approach to robust fuzzy principal component analysis is
proposed. This approach combines the strengths of robust and fuzzy methods
within the framework of principal component analysis. The performance of the
proposed approach, called robust principal component analysis based on fuzzy
coded data, is examined on a set of artificial datasets generated under three
different scenarios and on a real dataset, in order to observe how it is
affected by increasing sample size and by changes in the rate of outliers.
The findings show that the proposed approach gives better results than
classical and robust principal component analysis when outliers are present
in the dataset.
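The abstract does not state which fuzzy coding scheme is used. As a hedged illustration only, the sketch below assumes triangular membership functions with three hinge points, a common way to code a continuous variable into "low", "medium" and "high" categories. The function name `fuzzy_code` and the hinge values are hypothetical, and this is not presented as the authors' procedure.

```python
def fuzzy_code(x, lo, mid, hi):
    """Triangular fuzzy coding of x into (low, medium, high) memberships.

    Assumes lo < mid < hi. Values outside [lo, hi] are clamped to the
    nearest hinge, so the memberships always lie in [0, 1] and sum to 1.
    """
    x = min(max(x, lo), hi)
    if x <= mid:
        m = (x - lo) / (mid - lo)
        return (1.0 - m, m, 0.0)
    h = (x - mid) / (hi - mid)
    return (0.0, 1.0 - h, h)


# Hypothetical hinge points for one variable (for example minimum, median, maximum).
lo, mid, hi = 0.0, 10.0, 20.0

for value in (5.0, 15.0, 1000.0):
    print(value, fuzzy_code(value, lo, mid, hi))
```

Under these assumptions an extreme value such as 1000 is coded as full membership in "high", so its influence on any later analysis of the coded data is bounded. Each original variable becomes three coded columns, to which a classical or robust principal component analysis could then be applied.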

There are 43 citations in total.

Details

Subjects Engineering
Journal Section Articles
Authors

B. Barış Alkan

Sevgi Ganık

Publication Date September 30, 2017
Published in Issue Year 2017 Volume: 18 Issue: 3

Cite

APA Alkan, B. B., & Ganık, S. (2017). Robust Principal Component Analysis based on Fuzzy Coded Data. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering, 18(3), 754-762. https://doi.org/10.18038/aubtda.317765
AMA Alkan BB, Ganık S. Robust Principal Component Analysis based on Fuzzy Coded Data. AUJST-A. September 2017;18(3):754-762. doi:10.18038/aubtda.317765
Chicago Alkan, B. Barış, and Sevgi Ganık. “Robust Principal Component Analysis Based on Fuzzy Coded Data”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 18, no. 3 (September 2017): 754-62. https://doi.org/10.18038/aubtda.317765.
EndNote Alkan BB, Ganık S (September 1, 2017) Robust Principal Component Analysis based on Fuzzy Coded Data. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 18 3 754–762.
IEEE B. B. Alkan and S. Ganık, “Robust Principal Component Analysis based on Fuzzy Coded Data”, AUJST-A, vol. 18, no. 3, pp. 754–762, 2017, doi: 10.18038/aubtda.317765.
ISNAD Alkan, B. Barış - Ganık, Sevgi. “Robust Principal Component Analysis Based on Fuzzy Coded Data”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 18/3 (September 2017), 754-762. https://doi.org/10.18038/aubtda.317765.
JAMA Alkan BB, Ganık S. Robust Principal Component Analysis based on Fuzzy Coded Data. AUJST-A. 2017;18:754–762.
MLA Alkan, B. Barış and Sevgi Ganık. “Robust Principal Component Analysis Based on Fuzzy Coded Data”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering, vol. 18, no. 3, 2017, pp. 754-62, doi:10.18038/aubtda.317765.
Vancouver Alkan BB, Ganık S. Robust Principal Component Analysis based on Fuzzy Coded Data. AUJST-A. 2017;18(3):754-62.