Linking Scales in Item Response Theory with Covariates

Valentina Sansivieri; Marie Wiberg

Research Article

Year 2018, Volume: 3 Issue: 2, 12 - 32, 05.09.2019

Abstract

When test forms are administered to non-equivalent groups of examinees and scored with item response theory (IRT), the item parameters estimated separately in each group must be placed on a common scale. In IRT models that include covariates about the examinees, there are two parameters, modeling uniform and non-uniform differential item functioning (DIF), that also have to be put on the same scale. The aim of this study is to propose conversion equations for putting the uniform and non-uniform DIF parameters on the same scale. To estimate the coefficients of the conversion equations, we use four methods: mean/mean, mean/sigma, Haebara, and Stocking-Lord. We present a simulation study and an empirical example. The simulation results show that the coefficients of the conversion equations are substantially equal for the Haebara and Stocking-Lord methods, while they differ for the other methods. The empirical example shows that IRT with covariates produces a more informative test than IRT without covariates at high ability values, and that the mean/mean and mean/sigma methods yield more informative tests than concurrent calibration.
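
As a rough illustration of what the mean/mean and mean/sigma methods estimate, the sketch below computes linking coefficients for the standard two-parameter logistic case without covariates, following the textbook relations theta_to = A * theta_from + B, a_to = a_from / A, and b_to = A * b_from + B. It does not reproduce the conversion equations for the DIF parameters proposed in the article; the function names and the anchor-item values are hypothetical and chosen only for the example.

```python
from statistics import mean, pstdev


def mean_sigma(b_from, b_to):
    """Mean/sigma coefficients (A, B) from anchor-item difficulties."""
    # The SD ratio is the same for population or sample SDs.
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def mean_mean(a_from, b_from, a_to, b_to):
    """Mean/mean coefficients (A, B) from anchor-item parameters."""
    A = mean(a_from) / mean(a_to)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def rescale(a, b, A, B):
    """Move discriminations and difficulties onto the target scale."""
    return [ai / A for ai in a], [A * bi + B for bi in b]


# Hypothetical anchor-item parameters on the target scale.
a_to = [1.0, 1.5, 0.8, 1.2]
b_to = [-1.0, 0.0, 0.5, 1.5]

# The same items expressed on the source scale, built from known
# coefficients so that the recovery can be checked.
A_true, B_true = 1.25, -0.4
a_from = [A_true * a for a in a_to]
b_from = [(b - B_true) / A_true for b in b_to]

A_ms, B_ms = mean_sigma(b_from, b_to)
A_mm, B_mm = mean_mean(a_from, b_from, a_to, b_to)
a_linked, b_linked = rescale(a_from, b_from, A_ms, B_ms)

print("mean/sigma:", A_ms, B_ms)
print("mean/mean: ", A_mm, B_mm)
```

With error-free parameters both methods recover the same coefficients; with estimated parameters they generally differ, which is the kind of disagreement between methods that the simulation study examines. The Haebara and Stocking-Lord methods instead minimize a criterion based on item or test characteristic curves and need a numerical optimizer, so they are not sketched here.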

Keywords

IRT, Covariates, Linking, DIF

References

Andersson, B. (2018). Asymptotic variance of linking coefficient estimators for polytomous IRT models. Applied Psychological Measurement, 42(3), 192-205.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord and M. R. Novick (Eds.), Statistical theories of mental test scores (chaps. 17-20). Reading, MA: Addison-Wesley.
Borchers, H. W. (2017). pracma: Practical numerical math functions. R package.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. New Jersey: Lawrence Erlbaum.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278-295.
González, J., & Wiberg, M. (2017). Applying test equating methods – using R. Cham, Switzerland: Springer.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144-149.
Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26, 3-24.
Huggins-Manley, A. C. (2014). The effect of differential item functioning in anchor items on population invariance of equating. Educational and Psychological Measurement, 74(4), 627-658.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329-349.
Kim, S. H., & Cohen, A. S. (1998). A comparison of linking and concurrent calibration under item response theory. Applied Psychological Measurement, 22, 131-143.
Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32, 371-397.
Kolen, M., & Brennan, R. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). New York, NY: Springer-Verlag.
Kopf, J., Zeileis, A., & Strobl, C. (2015). Anchor selection strategies for DIF analysis: Review, assessment, and new approaches. Educational and Psychological Measurement, 75(1), 22-56.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Loyd, B. H., & Hoover, H. D. (1980). Vertical equating using the Rasch Model. Journal of Educational Measurement, 17, 179-193.
Lyrén, P.-E., & Hambleton, R. K. (2011). Consequences of violated equating assumptions under the equivalent groups design. International Journal of Testing, 36(5), 308-323.
Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862.
Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139-160.
Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78(3), 691-692.
Ogasawara, H. (2001). Standard errors of item response theory equating/linking by response function methods. Applied Psychological Measurement, 25, 53-67.
Ogasawara, H. (2011). Applications of asymptotic expansion in item response theory linking. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 261-280). New York, NY: Springer.
R Core Development Team (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/
Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1-25.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292-1306.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.
Tay, L., Newman, D., & Vermunt, J. K. (2011). Using mixed-measurement item response theory with covariates (MM-IRT-C) to ascertain observed and unobserved measurement equivalence. Organizational Research Methods, 14(1), 147-176.
Tay, L., Newman, D., & Vermunt, J. K. (2016). Item response theory with covariates (IRT-C): Assessing item recovery and differential item functioning for the three-parameter logistic model. Educational and Psychological Measurement, 76(1), 22-42.
Wang, W.-C. (2004). Effects of anchor item methods on the detection of differential item functioning within the family of Rasch models. Journal of Experimental Education, 72, 221-261.
Wang, W.-C., & Yeh, Y.-L. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27, 479-498.
Wang, W.-C., Shih, C.-L., & Sun, G.-W. (2012). The DIF-free-then-DIF strategy for the assessment of differential item functioning. Educational and Psychological Measurement, 72, 687-708.
Wedman, J. (2018). Reasons for gender-related differential item functioning in a college admissions test. Journal of Educational Research, 62(6), 959-970.
Wiberg, M., & Bränberg, K. (2015). Kernel equating under the non-equivalent groups with covariates design. Applied Psychological Measurement, 39(5), 1-13.
Zumbo, B. D., & Thomas, D. R. (1997). A measure of effect size for a model-based approach for studying DIF. Working paper of the Edgeworth Laboratory for Quantitative Behavioral Sciences. Prince George, Canada: University of British Columbia.

There are 35 citations in total.

Details

Primary Language	English
Subjects	Studies on Education
Journal Section	Volume 3
Authors	Valentina Sansivieri, Marie Wiberg
Publication Date	September 5, 2019
Published in Issue	Year 2018 Volume: 3 Issue: 2

Cite

APA	Sansivieri, V., & Wiberg, M. (n.d.). Linking Scales in Item Response Theory with Covariates. Eğitim Bilim Ve Teknoloji Araştırmaları Dergisi, 3(2), 12-32.
AMA	Sansivieri V, Wiberg M. Linking Scales in Item Response Theory with Covariates. EBTAD (JREST). 3(2):12-32.
Chicago	Sansivieri, Valentina, and Marie Wiberg. “Linking Scales in Item Response Theory With Covariates”. Eğitim Bilim Ve Teknoloji Araştırmaları Dergisi 3, no. 2 n.d.: 12-32.
EndNote	Sansivieri V, Wiberg M Linking Scales in Item Response Theory with Covariates. Eğitim Bilim ve Teknoloji Araştırmaları Dergisi 3 2 12–32.
IEEE	V. Sansivieri and M. Wiberg, “Linking Scales in Item Response Theory with Covariates”, EBTAD (JREST), vol. 3, no. 2, pp. 12–32.
ISNAD	Sansivieri, Valentina - Wiberg, Marie. “Linking Scales in Item Response Theory With Covariates”. Eğitim Bilim ve Teknoloji Araştırmaları Dergisi 3/2 (n.d.), 12-32.
JAMA	Sansivieri V, Wiberg M. Linking Scales in Item Response Theory with Covariates. EBTAD (JREST). 3:12–32.
MLA	Sansivieri, Valentina and Marie Wiberg. “Linking Scales in Item Response Theory With Covariates”. Eğitim Bilim Ve Teknoloji Araştırmaları Dergisi, vol. 3, no. 2, pp. 12-32.
Vancouver	Sansivieri V, Wiberg M. Linking Scales in Item Response Theory with Covariates. EBTAD (JREST). 3(2):12-32.
