Investigation of Models Used in Equating Testlet-Based Tests
Year 2024, Volume 43, Issue 2, pp. 951-970, 30.12.2024
Ertunç Ukşul, Hülya Kelecioğlu
Abstract
This study examines the effects of testlets on test equating. For this purpose, unidimensional item response theory (IRT), two-factor IRT, and testlet response theory models were applied to testlet-based tests to estimate item and ability parameters. To equate the tests, the parameters were placed on a common scale using the mean-mean, mean-sigma, and Stocking-Lord scale transformation methods under the common-item nonequivalent groups design. The equating errors of the models were then calculated and compared as a function of the scale transformation method and the number of testlets, using the root mean squared error (RMSE) as the criterion. The science test of the Trends in International Mathematics and Science Study (TIMSS) administered in 2019 was used as the data collection tool. The results showed that, as the number of testlets in the test increased, the equating error of the unidimensional IRT model increased, whereas that of the two-factor and testlet response theory models decreased. To compare the models, the correlations between the parameters obtained from each model after scale transformation were examined; the item parameters were found to be more affected by model choice than the ability parameter. In addition, the mean-mean and Stocking-Lord scale transformation methods yielded lower equating errors than the mean-sigma method.
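The scale transformation step summarized above can be illustrated with a small sketch. It is not the authors' procedure: it assumes unidimensional 2PL common items on the logistic metric (no D = 1.7 constant), whereas the study also fitted two-factor and testlet response theory models whose linking details are not given here. The function names (`icc_2pl`, `mean_mean`, `mean_sigma`, `stocking_lord`, `rmse`), the θ grid, the search bounds, and the grid-zoom search are hypothetical choices made for illustration; only the slope/intercept formulas follow the standard mean-mean, mean-sigma, and Stocking-Lord definitions, with new-form parameters mapped to the old-form scale as a* = a / A and b* = A·b + B.

```python
import math

# Minimal sketch of IRT scale linking under a common-item nonequivalent groups design.
# Assumptions (not from the study): unidimensional 2PL items, logistic metric (no D = 1.7),
# and a coarse-to-fine grid search standing in for a proper numerical optimizer.

def icc_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def transform(a_new, b_new, A, B):
    """Place new-form item parameters on the old-form scale: a* = a / A, b* = A * b + B."""
    return [a / A for a in a_new], [A * b + B for b in b_new]

def mean_sigma(b_new, b_old):
    """Mean-sigma: slope from the SDs of the common-item difficulties."""
    A = sd(b_old) / sd(b_new)
    return A, mean(b_old) - A * mean(b_new)

def mean_mean(a_new, b_new, a_old, b_old):
    """Mean-mean: slope from the means of the common-item discriminations."""
    A = mean(a_new) / mean(a_old)
    return A, mean(b_old) - A * mean(b_new)

def sl_loss(A, B, a_new, b_new, a_old, b_old, grid):
    """Stocking-Lord criterion: squared test characteristic curve differences over a theta grid."""
    a_t, b_t = transform(a_new, b_new, A, B)
    loss = 0.0
    for th in grid:
        tcc_old = sum(icc_2pl(th, a, b) for a, b in zip(a_old, b_old))
        tcc_new = sum(icc_2pl(th, a, b) for a, b in zip(a_t, b_t))
        loss += (tcc_old - tcc_new) ** 2
    return loss

def stocking_lord(a_new, b_new, a_old, b_old, rounds=8):
    """Minimise the SL criterion by repeatedly zooming a 21 x 21 grid of (A, B) candidates."""
    grid = [-3.0 + 0.2 * k for k in range(31)]     # theta points: an assumed choice
    A_lo, A_hi, B_lo, B_hi = 0.25, 4.0, -3.0, 3.0  # search bounds: an assumed choice
    best = (1.0, 0.0)
    for _ in range(rounds):
        cands = [(A_lo + (A_hi - A_lo) * i / 20, B_lo + (B_hi - B_lo) * j / 20)
                 for i in range(21) for j in range(21)]
        best = min(cands, key=lambda ab: sl_loss(ab[0], ab[1], a_new, b_new, a_old, b_old, grid))
        # Shrink the window to +/- 3 grid steps around the current best point.
        dA, dB = 0.15 * (A_hi - A_lo), 0.15 * (B_hi - B_lo)
        A_lo, A_hi = max(best[0] - dA, 1e-3), best[0] + dA
        B_lo, B_hi = best[1] - dB, best[1] + dB
    return best

def rmse(estimates, targets):
    """Root mean squared error between estimated and target values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(targets))

if __name__ == "__main__":
    # Hypothetical common-item parameters; the old-form values are an exact linear
    # transformation (A = 1.2, B = 0.3) of the new-form values.
    a_new = [0.8, 1.0, 1.2, 1.5, 0.9, 1.1]
    b_new = [-1.5, -0.5, 0.0, 0.4, 1.0, 1.8]
    a_old, b_old = transform(a_new, b_new, 1.2, 0.3)
    for name, (A, B) in [("mean-mean", mean_mean(a_new, b_new, a_old, b_old)),
                         ("mean-sigma", mean_sigma(b_new, b_old)),
                         ("Stocking-Lord", stocking_lord(a_new, b_new, a_old, b_old))]:
        _, b_star = transform(a_new, b_new, A, B)
        print(f"{name}: A={A:.3f}, B={B:.3f}, RMSE(b)={rmse(b_star, b_old):.4f}")
```

Because the hypothetical old-form parameters are an exact linear transformation of the new-form ones, all three methods recover A = 1.2 and B = 0.3 here (Stocking-Lord only up to the grid resolution). With estimated parameters the methods give different constants, and an RMSE-based comparison summarizes how far the resulting transformed values fall from their targets. A production implementation would replace the grid search with a numerical optimizer such as `scipy.optimize.minimize`.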
Ethics Statement
Ethics committee approval for the research was obtained from the Scientific Research and Publication Ethics Committee of Hacettepe University (decision number 35853172-300, dated 21.04.2020).