Değişkenlerin Kategori Sayılarının ve Dağılımlarının Korelasyon Katsayılarına Etkisi

Abdullah Faruk Kılıç

doi:10.12984/egeefd.890104

Research Article

Year 2022, Volume: 23 Issue: 1, 50 - 80, 31.03.2022

Cited By: 6

Abstract

Korelasyon katsayıları birçok bilim alanında kullanılmaktadır. Bilim alanlarına göre kullanılan değişkenlerin tipleri de farklılaşabilmektedir. Bu araştırmada farklı örneklem büyüklüklerinde değişkenlerin kategori sayısı ve çarpıklığının korelasyon katsayılarına etkisinin incelenmesi amaçlanmıştır. Bu amaç doğrultusunda gerçekleştirilen Monte Carlo simülasyon çalışmasıyla polikorik / tetrakorik, Pearson momentler çarpımı (PMÇ), Spearman’ın sıra farkları (rho), Kendall’ın Tau, Goodman-Kruskal Gamma ve Lambda katsayıları karşılaştırılmıştır. Araştırma sonucunda polikorik / tetrakorik korelasyon katsayısının diğer yöntemlere göre daha yansız sonuçlar verdiği gözlenmiştir. Kategori sayısının artmasıyla normal dağılan veri setlerinde PMÇ de yansız kestirimler yapabilmiştir. Ancak çarpık dağılan veri setlerinde PMÇ’nin parametrik olmayan alternatifi olan Spearman’ın sıra farkları korelasyon katsayısı, yeterli performansı gösterememiştir. Polikorik korelasyon katsayısı, hem normal hem de çarpık dağılan veri setlerinde diğer yöntemlere nazaran daha yansız ve doğru sonuçlar vermiştir. Araştırma bulgularına göre kategorik verilerle gerçekleştirilen korelasyon analizinde polikorik / tetrakorik korelasyon katsayısının kullanılması önerilmektedir. Kategori sayısı arttıkça değişkenin sürekli kabul edilebileceği belirtilse de korelasyon analizi sonuçlarında PMÇ ve parametrik olmayan karşılığı olan Spearman’ın sıra farkları ile Kendall’ın Tau katsayısı yanlı sonuçlar vermiştir.

Keywords

Goodman Kruskal Gamma, Goodman Kruskal Lambda, Pearson korelasyonu, Kendall tau, Polikorik korelasyon

The Effect of Categories and Distribution of Variables on Correlation Coefficients

Abstract

Correlation coefficients are used in many scientific fields, and the types of variables analyzed vary across those fields. This study examined how the number of categories and the skewness of variables affect correlation coefficients at different sample sizes. In a Monte Carlo simulation study, the polychoric / tetrachoric, Pearson product-moment (PPM), Spearman’s rank-order (rho), Kendall’s Tau, and Goodman-Kruskal Gamma and Lambda coefficients were compared. The polychoric / tetrachoric correlation coefficient yielded less biased results than the other methods. As the number of categories increased, PPM also produced unbiased estimates in normally distributed data sets. In skewed data sets, however, Spearman’s rho, the non-parametric alternative to PPM, did not perform adequately. The polychoric correlation coefficient gave less biased and more accurate results than the other methods in both normal and skewed data. Based on these findings, the polychoric / tetrachoric correlation coefficient is recommended for correlation analyses of categorical data. Although it is often stated that a variable can be treated as continuous as the number of categories increases, PPM and its non-parametric counterparts, Spearman’s rho and Kendall’s Tau, gave biased results.

Keywords

Goodman Kruskal Gamma, Goodman Kruskal Lambda, Pearson correlation, Kendall’s Tau, Polychoric correlation
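
The bias that the abstract attributes to Pearson's coefficient on categorized variables can be illustrated with a minimal sketch. This is not the study's simulation design: the true correlation (0.5), the sample size (20,000), the cut points, the seed, and all function names are hypothetical choices made for illustration. The closed-form tetrachoric step, sin(pi * phi / 2), is a known identity that holds only when both variables are bivariate normal and split at their medians; a general polychoric estimate requires maximum likelihood and is not shown here.

```python
import bisect
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_bivariate_normal(rho, n, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return xs, ys


def categorize(values, cuts):
    """Map each continuous value to an ordinal category index via sorted cut points."""
    return [bisect.bisect_right(cuts, v) for v in values]


RHO = 0.5                      # assumed true latent correlation
rng = random.Random(2022)      # fixed seed so the run is reproducible
x, y = simulate_bivariate_normal(RHO, 20000, rng)

# Pearson on the continuous latent variables.
r_cont = pearson(x, y)

# Pearson on a two-category (median split) version of each variable.
r_2 = pearson(categorize(x, [0.0]), categorize(y, [0.0]))

# Pearson on a seven-category version with symmetric, equally spaced cuts.
cuts7 = [-1.5, -0.9, -0.3, 0.3, 0.9, 1.5]
r_7 = pearson(categorize(x, cuts7), categorize(y, cuts7))

# Tetrachoric estimate for the median-split case only (closed form).
rho_tet = math.sin(math.pi * r_2 / 2)

print(f"continuous Pearson:        {r_cont:.3f}")
print(f"2 categories Pearson:      {r_2:.3f}")
print(f"7 categories Pearson:      {r_7:.3f}")
print(f"tetrachoric (median split): {rho_tet:.3f}")
```

In this setup, Pearson's coefficient on two categories is expected to fall well below the latent correlation, the seven-category version comes closer, and the tetrachoric correction should land near the latent value, which is consistent with the pattern the abstract reports for normally distributed data. Skewed category distributions, which the study also examined, are not covered by this sketch.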

There are 36 citations in total.

Details

Primary Language	Turkish
Subjects	Other Fields of Education
Journal Section	Articles
Authors	Abdullah Faruk Kılıç 0000-0003-3129-1763
Publication Date	March 31, 2022
Published in Issue	Year 2022 Volume: 23 Issue: 1

Cite

APA	Kılıç, A. F. (2022). Değişkenlerin Kategori Sayılarının ve Dağılımlarının Korelasyon Katsayılarına Etkisi. Ege Eğitim Dergisi, 23(1), 50-80. https://doi.org/10.12984/egeefd.890104