Investigation of Multidimensional Scale Transformation Methods Applied to Multidimensional Tests According to Various Conditions
Year 2023, pp. 41-53, 28.06.2023
Yaşar Mehmet Zor
Abstract
The purpose of this study was to compare the equating errors of the item and ability parameters obtained by applying scale transformation methods to two multidimensional test forms under various conditions. Sample size (1000 and 2000), common-item ratio (20% and 40%), correlation between dimensions (0.1, 0.5, and 0.9), and parameter estimation model (two-parameter logistic model, 2PLM, and three-parameter logistic model, 3PLM) were varied as the research conditions. Equating error, expressed as the root mean squared error (RMSE), was used to examine the accuracy of the scale transformation results. The RMSE generally decreased as the sample size and common-item ratio increased and as the correlation between dimensions decreased. The mean-sigma method produced higher equating errors than the other methods. For the discrimination parameter, all methods yielded lower RMSE values under the 2PLM. For the difficulty and ability parameters, the Stocking-Lord method yielded lower RMSE values under the 2PLM, whereas the mean-mean and mean-sigma methods yielded lower RMSE values under the 3PLM.
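As a rough illustration of the moment-based methods compared above, the sketch below computes the mean-mean and mean-sigma linking constants A and B from common-item estimates, applies the transformation theta* = A * theta + B (with a* = a / A and b* = A * b + B; a 3PLM guessing parameter c is unchanged), and scores the result with RMSE against generating values. It assumes a simple-structure test in which each common item measures a single dimension, so the unidimensional formulas are applied dimension by dimension. That assumption, the data layout, the function names (including the hypothetical `link_by_dimension` helper), and all parameter values are illustrative and are not the study's documented procedure. The Stocking-Lord method, which instead minimizes the difference between test characteristic curves, is not shown.

```python
import math
from statistics import mean, pstdev

# Sketch only: unidimensional moment methods applied to the common items of ONE
# dimension, assuming a simple-structure test (each item loads on one dimension).
# Scale convention: theta_old = A * theta_new + B.


def mean_sigma(b_new, b_old):
    """Mean-sigma constants from common-item difficulties."""
    A = pstdev(b_old) / pstdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B


def mean_mean(a_new, a_old, b_new, b_old):
    """Mean-mean constants: slope from discriminations, intercept from difficulties."""
    A = mean(a_new) / mean(a_old)
    B = mean(b_old) - A * mean(b_new)
    return A, B


def transform(a, b, theta, A, B):
    """Put new-form estimates on the old scale (a 3PLM c parameter would be unchanged)."""
    a_star = [x / A for x in a]
    b_star = [A * x + B for x in b]
    theta_star = [A * t + B for t in theta]
    return a_star, b_star, theta_star


def rmse(estimates, true_values):
    """Root mean squared error of transformed estimates against generating values."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(mean(squared))


def link_by_dimension(common, method="mean-mean"):
    """Hypothetical helper: common maps dimension -> (a_new, a_old, b_new, b_old)."""
    constants = {}
    for dim, (a_new, a_old, b_new, b_old) in common.items():
        if method == "mean-sigma":
            constants[dim] = mean_sigma(b_new, b_old)
        else:
            constants[dim] = mean_mean(a_new, a_old, b_new, b_old)
    return constants


# Made-up common items for one dimension, placed on a new scale with A = 1.25, B = 0.5.
a_old = [1.0, 1.2, 0.8, 1.5]
b_old = [-1.0, 0.0, 0.5, 1.5]
a_new = [x * 1.25 for x in a_old]
b_new = [(x - 0.5) / 1.25 for x in b_old]

A, B = mean_mean(a_new, a_old, b_new, b_old)
a_star, b_star, theta_star = transform(a_new, b_new, [0.0], A, B)
print(round(A, 6), round(B, 6))       # 1.25 0.5
print(round(rmse(b_star, b_old), 6))  # 0.0
```

Because these made-up values contain no estimation error, `mean_sigma` recovers the same constants here; with sampled estimates the two methods generally give different A and B, and the transformed parameters would then be compared with the generating values through `rmse`.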