Research Article

The Effect of ratio of items indicating differential item functioning on computer adaptive and multi-stage tests

Year 2022, Volume 9, Issue 3, 682–696, 30.09.2022
https://doi.org/10.21449/ijate.1105769

Abstract

Adaptive testing approaches have recently become a viable alternative to traditional fixed-item tests. Their main advantage is that they reach the desired measurement precision with fewer items. However, administering fewer items means that each item has a larger effect on ability estimation, so a flaw in any single item has more serious consequences. Items indicating differential item functioning (DIF) may therefore play an important role in examinees' test scores. This study aimed to investigate the effect of DIF items on the performance of computer adaptive and multi-stage tests. For this purpose, different test designs were compared under different test lengths and ratios of DIF items using Monte Carlo simulation. The computer adaptive test (CAT) designs showed the best measurement precision across all conditions. Among the multi-stage test (MST) panel designs, the 1-3-3 design yielded higher measurement precision in most conditions; however, the findings were not sufficient to conclude that the 1-3-3 design outperformed the 1-2-4 design. Furthermore, CAT was the design least affected by an increase in the ratio of DIF items, whereas the MST designs were affected by that increase, especially at the 10-item test length.
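The simulated comparison described above is straightforward to mock up. The sketch below is not the authors' implementation (the reference list points to the catR/mstR framework of Magis et al., 2017); it is a minimal Python/NumPy illustration, under assumed settings, of one CAT condition: responses follow a 2PL model, a chosen fraction of the item pool receives a uniform DIF shift against a focal group, and a maximum-information CAT with EAP scoring estimates ability, so that ability-recovery error can be compared across DIF ratios. All function names and numeric values (pool size, test length, the 0.6-logit shift) are illustrative assumptions, not the study's settings.

```python
# Minimal sketch (not the authors' code): 2PL response generation, uniform
# DIF injected for a focal group on a fraction of the pool, and a
# maximum-information CAT with EAP scoring. All settings are assumptions.
import numpy as np

rng = np.random.default_rng(42)

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap(resp, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate with a standard-normal prior on a fixed grid."""
    post = np.exp(-0.5 * grid ** 2)          # prior weights
    for u, ai, bi in zip(resp, a, b):
        p = p_2pl(grid, ai, bi)
        post *= p if u else 1.0 - p          # multiply in the item likelihood
    return float(np.sum(grid * post) / np.sum(post))

def run_cat(theta_true, a, b_admin, b_true, test_len):
    """Select items by Fisher information at the current estimate using the
    calibrated (DIF-free) difficulties, but generate responses from the
    'true' difficulties, which carry the DIF shift for focal examinees."""
    est, used, resp, ra, rb = 0.0, [], [], [], []
    for _ in range(test_len):
        p = p_2pl(est, a, b_admin)
        info = a ** 2 * p * (1.0 - p)        # 2PL Fisher information
        info[used] = -np.inf                 # never readminister an item
        j = int(np.argmax(info))
        used.append(j)
        resp.append(int(rng.random() < p_2pl(theta_true, a[j], b_true[j])))
        ra.append(a[j]); rb.append(b_admin[j])
        est = eap(resp, ra, rb)
    return est

# Assumed settings: 300-item pool, 20-item CAT, 0.6-logit uniform DIF shift.
n_items, n_examinees, test_len, dif_shift = 300, 500, 20, 0.6
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)
theta = rng.normal(0.0, 1.0, n_examinees)
focal = rng.random(n_examinees) < 0.5        # half the sample is the focal group

for dif_ratio in (0.0, 0.1, 0.2, 0.3):       # ratio of DIF items in the pool
    dif_items = rng.choice(n_items, int(dif_ratio * n_items), replace=False)
    errors = []
    for i in range(n_examinees):
        b_true = b.copy()
        if focal[i]:
            b_true[dif_items] += dif_shift   # uniform DIF: harder for focal group
        errors.append(run_cat(theta[i], a, b, b_true, test_len) - theta[i])
    rmse = float(np.sqrt(np.mean(np.square(errors))))
    print(f"DIF ratio {dif_ratio:>4.0%}: ability-recovery RMSE = {rmse:.3f}")
```

In the MST conditions, the per-item selection step would instead route examinees between preassembled modules; in the 1-3-3 and 1-2-4 panel notations, each digit is the number of modules available at that stage (e.g., one routing module followed by three modules in each of stages two and three for the 1-3-3 design).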

References

  • Aksu-Dunya, B. (2017). Item parameter drift in computer adaptive testing due to lack of content knowledge within sub-populations (Publication No. 10708515) [Doctoral Dissertation, University of Illinois]. ProQuest Dissertations & Theses.
  • Armstrong, R.D., Jones, D.H., Koppel, N.B., & Pashley, P.J. (2004). Computerized adaptive testing with multiple-form structures. Applied Psychological Measurement, 28(3), 147–164. https://doi.org/10.1177/0146621604263652
  • Babcock, B., & Albano, A.D. (2012). Rasch scale stability in the presence of item parameter and trait drift. Applied Psychological Measurement, 36(7), 565–580. https://doi.org/10.1177/0146621612455090
  • Berger, S., Verschoor, A.J., Eggen, T.J.H.M., & Moser, U. (2019). Improvement of measurement efficiency in multistage tests by targeted assignment. Frontiers in Education, 4(1), 1–18. https://doi.org/10.3389/feduc.2019.00001
  • Birdsall, M. (2011). Implementing computer adaptive testing to improve achievement opportunities. Office of Qualifications and Examinations Regulation Report. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/606023/0411_MichaelBirdsall_implementing-computer-testing-_Final_April_2011_With_Copyright.pdf
  • Camilli, G., & Shepard, L.A. (1994). Methods for identifying biased test items (4th ed.). Sage Publications, Inc.
  • Chu, M.W., & Lai, H. (2013). Detecting biased items using CATSIB to increase fairness in computer adaptive tests. Alberta Journal of Educational Research, 59(4), 630–643. https://doi.org/10.11575/ajer.v59i4.55750
  • Crocker, L. & Algina, J. (1986). Introduction to classical and modern test theory. Wadsworth Group/Thomson Learning.
  • Gierl, M.J., Lai, H., & Li, J. (2013). Identifying differential item functioning in multi-stage computer adaptive testing. Educational Research and Evaluation, 19(2-3), 188–203. https://doi.org/10.1080/13803611.2013.767622
  • Hambleton, R.K., & Swaminathan, H. (1991). Item response theory: Principles and applications. Springer.
  • Hambleton, R.K., Zaal, J.N., & Pieters, J.P.M. (2000). Computerized adaptive testing: Theory, applications and standards. In R.K. Hambleton & J.N. Zaal (Eds.), Advances in educational and psychological testing: Theory and applications (4th ed., pp. 341–366). Springer.
  • Han, K.T., & Guo, F. (2011). Potential impact of item parameter drift due to practice and curriculum change on item calibration in computerized adaptive testing (Report No. RR-11-02). Graduate Management Admission Council (GMAC) Research Reports. https://www.gmac.com/~/media/Files/gmac/Research/research-report-series/rr1102_itemcalibration.pdf
  • Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26(2), 44–52. https://doi.org/10.1111/j.1745-3992.2007.00093.x
  • Keng, L. (2008). A comparison of the performance of testlet-based computer adaptive tests and multistage tests (Publication No. 3315089) [Doctoral Dissertation, University of Texas]. ProQuest Dissertations & Theses.
  • Lei, P.W., Chen, S.Y., & Yu, L. (2006). Comparing methods of assessing differential item functioning in a computerized adaptive testing environment. Journal of Educational Measurement, 43(3), 245–264. https://doi.org/10.1111/j.1745-3984.2006.00015.x
  • Luecht, R.M., & Sireci, S.G. (2011). A review of models for computer-based testing (Report No. 2011-12). College Board Research Report. https://files.eric.ed.gov/fulltext/ED562580.pdf
  • Magis, D., Yan, D., & von Davier, A.A. (Eds.). (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Springer.
  • National Research Council (1999). Designing mathematics or science curriculum programs: A guide for using mathematics and science education standards. National Academies Press. https://www.nap.edu/catalog/9658.html
  • Piromsombat, C. (2014). Differential item functioning in computerized adaptive testing: Can CAT self-adjust enough? (Publication No. 3620715) [Doctoral Dissertation, University of Minnesota]. ProQuest Dissertations & Theses.
  • Sari, H.I. (2016). Examining content control in adaptive tests: Computerized adaptive testing vs. computerized multistage testing (Publication No. 403003) [Doctoral Dissertation, University of Florida]. The Council of Higher Education National Thesis Center.
  • Sari, H.I., & Huggins-Manley, A.C. (2017). Examining content control in adaptive tests: Computerized adaptive testing vs. computerized adaptive multistage testing. Educational Sciences: Theory and Practice, 17(5), 1759–1781. https://doi.org/10.12738/estp.2017.5.0484
  • Steinberg, L., Thissen, D., & Wainer, H. (2000). Validity. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 185–229). Routledge.
  • Tay, P.H. (2015). On-the-fly assembled multistage adaptive testing (Publication No. 3740572). [Doctoral Dissertation, University of Illinois]. ProQuest Dissertations & Theses.
  • van der Linden, W.J. (2008). Using response times for item selection in adaptive testing. Journal of Educational and Behavioral Statistics, 33(1), 5–20. https://doi.org/10.3102/1076998607302626
  • van der Linden, W.J., & Pashley, P.J. (2010). Item selection and ability estimation in adaptive testing. In W. J. van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing. Springer.
  • Wainer, H. (2000). Introduction and history. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 1–22). Lawrence Erlbaum Associates.
  • Wang, K. (2017). A fair comparison of the performance of computerized adaptive testing and multistage adaptive testing (Publication No. 10273809). [Doctoral Dissertation, Michigan State University]. ProQuest Dissertations & Theses.
  • Wang, S., Lin, H., Chang, H.H., & Douglas, J. (2016). Hybrid computerized adaptive testing: From group sequential design to fully sequential design. Journal of Educational Measurement, 53(1), 45–62. https://doi.org/10.1111/jedm.12100
  • Wang, X. (2013). An investigation on computer-adaptive multistage testing panels for multidimensional assessment (Publication No. 3609605). [Doctoral Dissertation, University of North Carolina]. ProQuest Dissertations & Theses.
  • Weiss, D.J., & Kingsbury, G.G. (1984). Application of computer adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361–375. https://doi.org/10.1111/j.1745-3984.1984.tb01040.x
  • Yan, D. (2010). Investigation of optimal design and scoring for adaptive multi-stage testing: A tree-based regression approach (Publication No. 3452799) [Master's thesis, Fordham University]. ProQuest Dissertations & Theses.
  • Yan, D., von Davier, A.A., & Lewis, C. (2014). Overview of computerized multistage tests. In D. Yan, A.A. von Davier, & C. Lewis (Eds.), Computerized multistage testing (pp. 3–20). CRC Press; Taylor & Francis Group.
  • Zheng, Y., & Chang, H.H. (2014). On-the-fly assembled multistage adaptive testing. Applied Psychological Measurement, 39(2), 104–118. https://doi.org/10.1177/0146621614544519
  • Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning (DIF). Headquarters of National Defense.
  • Zwick, R. (2010). The investigation of differential item functioning in adaptive tests. In W.J. van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing. Springer.
  • Zwick, R., & Bridgeman, B. (2014). Evaluating validity, fairness, and differential item functioning in multistage testing. In D. Yan, A.A. von Davier, & C. Lewis (Eds.), Computerized multistage testing. CRC Press; Taylor & Francis Group.


Details

Primary Language: English
Subjects: Field Education
Section: Articles
Authors

Başak Erdem Kara (ORCID: 0000-0003-3066-2892)

Nuri Doğan (ORCID: 0000-0001-6274-2016)

Publication Date: September 30, 2022
Submission Date: April 19, 2022
Published in Issue: Year 2022, Volume 9, Issue 3

Cite

APA Erdem Kara, B., & Doğan, N. (2022). The Effect of ratio of items indicating differential item functioning on computer adaptive and multi-stage tests. International Journal of Assessment Tools in Education, 9(3), 682-696. https://doi.org/10.21449/ijate.1105769
