Research Article

A practical guide to item bank calibration with multiple matrix sampling

Year 2024, Volume: 11, Issue: 4, 647-659, 15.11.2024
https://doi.org/10.21449/ijate.1440316

Abstract

When the item parameters of a large item bank must be estimated, a Multiple Matrix Sampling (MMS) design offers an efficient solution while minimizing the testing burden on students. This study explains and illustrates how to use an MMS design to calibrate a large item pool for purposes such as developing a computerized adaptive testing (CAT) administration. Using real data, two functions of the mirt package, mirt() and multipleGroup(), were compared. The results showed that the standard mirt() function was more practical and produced more precise estimates than the multipleGroup() function.
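
The core idea of an MMS design is that each student answers only a few blocks of items, yet every item in the bank collects responses and the booklets overlap enough to tie all items together. The sketch below illustrates this under stated assumptions: the bank size (60 items), block size (10), cyclic two-block booklet rotation, sample size, and 2PL generating parameters are all hypothetical and are not the design used in the study. It only builds the sparse response matrix; it does not perform the calibration itself.

```python
import math
import random

# Hypothetical MMS design (illustrative assumptions, not the study's design).
N_ITEMS = 60            # size of the item bank
BLOCK_SIZE = 10         # items per block -> 6 blocks
BLOCKS_PER_BOOKLET = 2  # each booklet = 2 blocks = 20 items
N_STUDENTS = 600


def make_blocks(n_items, block_size):
    """Split item indices 0..n_items-1 into consecutive, non-overlapping blocks."""
    return [list(range(start, start + block_size))
            for start in range(0, n_items, block_size)]


def make_booklets(n_blocks, blocks_per_booklet):
    """Cyclic rotation: booklet b holds blocks b, b+1, ... (mod n_blocks),
    so neighbouring booklets share a block."""
    return [[(b + k) % n_blocks for k in range(blocks_per_booklet)]
            for b in range(n_blocks)]


def is_linked(booklets):
    """True if every booklet is reachable from booklet 0 through shared blocks,
    i.e. the booklets are connected by common items."""
    seen, stack = {0}, [0]
    while stack:
        current = set(booklets[stack.pop()])
        for other, blocks in enumerate(booklets):
            if other not in seen and current & set(blocks):
                seen.add(other)
                stack.append(other)
    return len(seen) == len(booklets)


def p_correct(theta, a, b):
    """2PL probability of a correct response (logistic metric)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_mms(seed=2024):
    """Spiral booklets across students and simulate 2PL responses.
    Items not in a student's booklet stay None (NA when exported to R)."""
    rng = random.Random(seed)
    blocks = make_blocks(N_ITEMS, BLOCK_SIZE)
    booklets = make_booklets(len(blocks), BLOCKS_PER_BOOKLET)
    a = [rng.uniform(0.8, 2.0) for _ in range(N_ITEMS)]  # hypothetical slopes
    b = [rng.gauss(0.0, 1.0) for _ in range(N_ITEMS)]    # hypothetical difficulties
    data = []
    for s in range(N_STUDENTS):
        booklet = booklets[s % len(booklets)]  # rotate booklets across students
        theta = rng.gauss(0.0, 1.0)
        row = [None] * N_ITEMS
        for blk in booklet:
            for i in blocks[blk]:
                row[i] = int(rng.random() < p_correct(theta, a[i], b[i]))
        data.append(row)
    return data, booklets


data, booklets = simulate_mms()
items_per_student = [sum(x is not None for x in row) for row in data]
responses_per_item = [sum(row[i] is not None for row in data) for i in range(N_ITEMS)]

print(booklets)                 # [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0]]
print(set(items_per_student))   # {20}: each student sees one third of the bank
print(set(responses_per_item))  # {200}: every item still receives 200 responses
print(is_linked(booklets))      # True
```

Each student answers 20 of the 60 items, while every item is answered by 200 students. For calibration, the matrix could be exported with None written as NA and read into R; mirt treats NA as a missing response, so all items can be estimated in one concurrent run (for example, a unidimensional 2PL fit with mirt()). The linkage check matters because booklets that share no items can only be placed on a common scale through an additional assumption, such as randomly equivalent groups.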

Supporting Institution

Bogazici University

Project Number

BAP-SUP 17002

References

  • Blötner, C. (2024). Package ‘diffcor’. https://cran.r-project.org/web/packages/diffcor/diffcor.pdf
  • Bock, R.D., & Zimowski, M.F. (1997). Multiple group IRT. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of Modern Item Response Theory (pp. 433–448). Springer. https://doi.org/10.1007/978-1-4757-2691-6_25
  • Chalmers, R.P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
  • Chalmers, R.P. (2023). Package ‘mirt’. https://cran.r-project.org/web/packages/mirt/mirt.pdf
  • Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265–289. https://doi.org/10.2307/1165285
  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Routledge. https://doi.org/10.4324/9780203771587
  • Gonzalez, E., & Rutkowski, L. (2010). Principles of Multiple Matrix Booklet Designs and Parameter Recovery in Large-Scale Assessments (pp. 125–156). IERI.
  • Gressard, R.P., & Loyd, B.H. (1991). A comparison of item sampling plans in the application of multiple matrix sampling. Journal of Educational Measurement, 28(2), 119–130.
  • Kaplan, D., & Su, D. (2016). On matrix sampling and imputation of context questionnaires with implications for the generation of plausible values in large-scale assessments. Journal of Educational and Behavioral Statistics, 41(1), 57–80.
  • Lord, F.M. (1962). Estimating norms by item-sampling. Educational and Psychological Measurement, 22(2), 259–267. https://doi.org/10.1177/001316446202200202
  • Lord, F.M. (1965). Item sampling in test theory and in research design. ETS Research Bulletin Series, 1965(2), i–39. https://doi.org/10.1002/j.2333-8504.1965.tb00968.x
  • Macdonald, P., & Paunonen, S.V. (2002). A Monte Carlo comparison of item and person statistics based on item response theory versus classical test theory. Educational and Psychological Measurement, 62(6), 921–943. https://doi.org/10.1177/0013164402238082
  • Munger, G.F., & Loyd, B.H. (1988). The use of multiple matrix sampling for survey research. The Journal of Experimental Education, 56(4), 187–191.
  • OECD. (2020). PISA 2018 Technical Report-PISA. OECD Publishing, Paris. Retrieved from https://www.oecd.org/pisa/data/pisa2018technicalreport/
  • OECD. (2023). PISA 2022 Technical Report-PISA. OECD Publishing, Paris. Retrieved from https://www.oecd.org/pisa/data/pisa2022technicalreport/
  • O’Neill, T.R., Gregg, J.L., & Peabody, M.R. (2020). Effect of sample size on common item equating using the dichotomous Rasch model. Applied Measurement in Education, 33(1), 10–23. https://doi.org/10.1080/08957347.2019.1674309
  • Rubin, D.B. (2009). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.
  • R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Rutkowski, L. (2014). Sensitivity of achievement estimation to conditioning model misclassification. Applied Measurement in Education, 27(2), 115–132. https://doi.org/10.1080/08957347.2014.880440
  • Rutkowski, L., Gonzalez, E., von Davier, M., & Zhou, Y. (2013). Assessment design for international large-scale assessments. In Handbook of International Large-Scale Assessment. Chapman and Hall/CRC.
  • Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in Secondary Analysis and Reporting. Educational Researcher, 39(2), 142–151. https://doi.org/10.3102/0013189X10363170
  • Shoemaker, D.M. (1973). Principles and Procedures of Multiple Matrix Sampling. Ballinger Publishing Company.
  • Thissen, D., & Wainer, H. (1982). Some standard errors in item response theory. Psychometrika, 47(4).
  • Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science. O’Reilly Media, Inc.
  • Yin, L., & Foy, P. (2023). TIMSS 2023 Assessment Design. In I.V.S. Mullis, M.O. Martin, & M. von Davier (Eds.), TIMSS 2023 Assessment Frameworks. Boston College, TIMSS & PIRLS International Study Center.
  • Zhou, Y. (2021). Improving Multiple Matrix Sampling Design for Questionnaires. Indiana University.

There are 26 references in total.

Details

Primary Language: English
Subjects: Measurement Theories and Applications in Education and Psychology
Section: Articles
Authors

Eren Can Aybek 0000-0003-3040-2337

Serkan Arıkan 0000-0001-9610-5496

Güneş Ertaş 0000-0001-8785-7768

Project Number: BAP-SUP 17002
Early View Date: 21 October 2024
Publication Date: 15 November 2024
Submission Date: 20 February 2024
Acceptance Date: 12 August 2024
Published in Issue: Year 2024, Volume: 11, Issue: 4

Cite

APA Aybek, E. C., Arıkan, S., & Ertaş, G. (2024). A practical guide to item bank calibration with multiple matrix sampling. International Journal of Assessment Tools in Education, 11(4), 647-659. https://doi.org/10.21449/ijate.1440316
