Assessing Measurement Invariance: Multiple Group Confirmatory Factor Analysis for Differential Item Functioning Detection in Polytomous Measures of Turkish and American Students

Derya Evran

doi:10.22596/2019.0401.1.20

Research Article

Assessing Measurement Invariance: Using Multiple Group Confirmatory Factor Analysis to Detect Differential Item Functioning for Turkish and American Students

Year 2019, Volume: 4, Issue: 1, 1 - 20, 30.06.2019

Abstract

International large-scale assessments are usually developed in one country
and administered in others. Assessing measurement invariance across countries
is an important step toward valid results and cross-country comparisons. This
article investigated, in two countries, the measurement invariance of questions
selected from the Programme for International Student Assessment (PISA) 2009
student questionnaire. Turkey and the United States were compared using
multiple group confirmatory factor analysis to detect differential item
functioning (DIF). The results were evaluated on the basis of the chi-square
goodness-of-fit test, RMSEA, the comparative fit index, and the Tucker-Lewis
index. According to the results, the items exhibiting DIF within the scope of
learning strategies were evaluated and interpreted with Item Response Theory.

Keywords

DIF, PISA, MG-CFA, IRT, learning strategies, measurement invariance

Abstract

International
assessments are often developed in one country and administered in others.
Assessing measurement invariance across countries is an important step toward
valid conclusions and cross-country comparisons. This paper investigated the
measurement invariance, across two countries, of selected questions from the
Programme for International Student Assessment (PISA) 2009 student
questionnaire. Turkey and the United States were compared using multiple group
confirmatory factor analysis (MG-CFA) of scores on polytomous items to detect
differential item functioning (DIF). The results were evaluated using the
chi-square goodness-of-fit test, the root mean squared error of approximation
(RMSEA), the comparative fit index (CFI), and the Tucker-Lewis index (TLI). The
learning strategies items that exhibited DIF were then investigated with Item
Response Theory (IRT), based on the chi-square goodness-of-fit test and the
t-test.
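
As a rough illustration of the fit indices named in the abstract, the following Python sketch computes RMSEA, CFI, and TLI from chi-square statistics using their standard textbook formulas. It is not the author's analysis: the function names and all numeric inputs are hypothetical, the RMSEA is shown in its single-group form (multiple-group software may apply an additional correction for the number of groups), and estimators for ordered-categorical items may report scaled chi-square values that this sketch does not model.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA in its single-group form: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of a tested model (m) against a baseline model (b)."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    # If neither model shows misfit beyond its degrees of freedom, report 1.
    return 1.0 if den == 0 else 1.0 - num / den


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index; assumes the baseline chi2/df ratio differs from 1."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the paper.
    chi2_model, df_model = 150.0, 50
    chi2_base, df_base = 2100.0, 100
    sample_size = 1001

    print("RMSEA:", rmsea(chi2_model, df_model, sample_size))
    print("CFI:  ", cfi(chi2_model, df_model, chi2_base, df_base))
    print("TLI:  ", tli(chi2_model, df_model, chi2_base, df_base))
```

In an invariance study these indices would typically be compared across increasingly constrained models (for example, equal loadings and then equal thresholds across groups); which cutoffs count as acceptable fit is a judgment the sketch leaves to the analyst.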

Keywords

DIF, PISA, MG-CFA, IRT, learning strategies, measurement invariance


Details

Primary Language	English
Section	Articles
Authors	Derya Evran
Publication Date	June 30, 2019
Published in Issue	Year 2019, Volume: 4, Issue: 1

Cite

APA	Evran, D. (2019). Assessing Measurement Invariance: Multiple Group Confirmatory Factor Analysis for Differential Item Functioning Detection in Polytomous Measures of Turkish and American Students. Harran Maarif Dergisi, 4(1), 1-20. https://doi.org/10.22596/2019.0401.1.20
