Research Article

The Effect of ratio of items indicating differential item functioning on computer adaptive and multi-stage tests

Year 2022, Volume 9, Issue 3, 682–696, 30.09.2022
https://doi.org/10.21449/ijate.1105769

Abstract

Adaptive testing approaches have recently become a viable alternative to traditional fixed-item tests. Their main advantage is that they reach the desired measurement precision with fewer items. However, administering fewer items means that each item has a larger effect on ability estimation, so a flaw in any single item has more serious consequences. Items indicating differential item functioning (DIF) may therefore play an important role in examinees' test scores. This study aimed to investigate the effect of DIF items on the performance of computer adaptive and multi-stage tests. For this purpose, different test designs were compared under different test lengths and ratios of DIF items using Monte Carlo simulation. The computer adaptive test (CAT) designs showed the best measurement precision across all conditions. Among the multi-stage test (MST) panel designs, the 1-3-3 design yielded higher measurement precision in most conditions; however, the findings were not sufficient to conclude that the 1-3-3 design outperformed the 1-2-4 design. Furthermore, CAT was the design least affected by an increase in the ratio of DIF items, whereas the MST designs were affected by that increase, especially at the 10-item test length.
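The simulated comparison described above is straightforward to mock up. The sketch below is not the authors' implementation (the reference list points to the catR/mstR framework of Magis et al., 2017); it is a minimal Python/NumPy illustration, under assumed settings, of one CAT condition: responses follow a 2PL model, a chosen fraction of the item pool receives a uniform DIF shift against a focal group, and a maximum-information CAT with EAP scoring estimates ability, so that ability-recovery error can be compared across DIF ratios. All function names and numeric values (pool size, test length, the 0.6-logit shift) are illustrative assumptions, not the study's settings.

```python
# Minimal sketch (not the authors' code): 2PL response generation, uniform
# DIF injected for a focal group on a fraction of the pool, and a
# maximum-information CAT with EAP scoring. All settings are assumptions.
import numpy as np

rng = np.random.default_rng(42)

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap(resp, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate with a standard-normal prior on a fixed grid."""
    post = np.exp(-0.5 * grid ** 2)          # prior weights
    for u, ai, bi in zip(resp, a, b):
        p = p_2pl(grid, ai, bi)
        post *= p if u else 1.0 - p          # multiply in the item likelihood
    return float(np.sum(grid * post) / np.sum(post))

def run_cat(theta_true, a, b_admin, b_true, test_len):
    """Select items by Fisher information at the current estimate using the
    calibrated (DIF-free) difficulties, but generate responses from the
    'true' difficulties, which carry the DIF shift for focal examinees."""
    est, used, resp, ra, rb = 0.0, [], [], [], []
    for _ in range(test_len):
        p = p_2pl(est, a, b_admin)
        info = a ** 2 * p * (1.0 - p)        # 2PL Fisher information
        info[used] = -np.inf                 # never readminister an item
        j = int(np.argmax(info))
        used.append(j)
        resp.append(int(rng.random() < p_2pl(theta_true, a[j], b_true[j])))
        ra.append(a[j]); rb.append(b_admin[j])
        est = eap(resp, ra, rb)
    return est

# Assumed settings: 300-item pool, 20-item CAT, 0.6-logit uniform DIF shift.
n_items, n_examinees, test_len, dif_shift = 300, 500, 20, 0.6
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)
theta = rng.normal(0.0, 1.0, n_examinees)
focal = rng.random(n_examinees) < 0.5        # half the sample is the focal group

for dif_ratio in (0.0, 0.1, 0.2, 0.3):       # ratio of DIF items in the pool
    dif_items = rng.choice(n_items, int(dif_ratio * n_items), replace=False)
    errors = []
    for i in range(n_examinees):
        b_true = b.copy()
        if focal[i]:
            b_true[dif_items] += dif_shift   # uniform DIF: harder for focal group
        errors.append(run_cat(theta[i], a, b, b_true, test_len) - theta[i])
    rmse = float(np.sqrt(np.mean(np.square(errors))))
    print(f"DIF ratio {dif_ratio:>4.0%}: ability-recovery RMSE = {rmse:.3f}")
```

In the MST conditions, the per-item selection step would instead route examinees between preassembled modules; in the 1-3-3 and 1-2-4 panel notations, each digit is the number of modules available at that stage (e.g., one routing module followed by three modules in each of stages two and three for the 1-3-3 design).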

References

  • Aksu-Dunya, B. (2017). Item parameter drift in computer adaptive testing due to lack of content knowledge within sub-populations (Publication No. 10708515) [Doctoral Dissertation, University of Illinois]. ProQuest Dissertations & Theses.
  • Armstrong, R.D., Jones, D.H., Koppel, N.B., & Pashley, P.J. (2004). Computerized adaptive testing with multiple-form structures. Applied Psychological Measurement, 28(3), 147–164. https://doi.org/10.1177/0146621604263652
  • Babcock, B., & Albano, A.D. (2012). Rasch scale stability in the presence of item parameter and trait drift. Applied Psychological Measurement, 36(7), 565–580. https://doi.org/10.1177/0146621612455090
  • Berger, S., Verschoor, A.J., Eggen, T.J.H.M., & Moser, U. (2019). Improvement of measurement efficiency in multistage tests by targeted assignment. Frontiers in Education, 4(1), 1–18. https://doi.org/10.3389/feduc.2019.00001
  • Birdsall, M. (2011). Implementing computer adaptive testing to improve achievement opportunities. Office of Qualifications and Examinations Regulation Report. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/606023/0411_MichaelBirdsall_implementing-computer-testing-_Final_April_2011_With_Copyright.pdf
  • Camilli, G., & Shepard, L.A. (1994). Methods for identifying biased test items (4th ed.). Sage Publications, Inc.
  • Chu, M.W., & Lai, H. (2013). Detecting biased items using CATSIB to increase fairness in computer adaptive tests. Alberta Journal of Educational Research, 59(4), 630–643. https://doi.org/10.11575/ajer.v59i4.55750
  • Crocker, L. & Algina, J. (1986). Introduction to classical and modern test theory. Wadsworth Group/Thomson Learning.
  • Gierl, M.J., Lai, H., & Li, J. (2013). Identifying differential item functioning in multi-stage computer adaptive testing. Educational Research and Evaluation, 19(2-3), 188–203. https://doi.org/10.1080/13803611.2013.767622
  • Hambleton, R.K., & Swaminathan, H. (1991). Item response theory: Principles and applications. Springer.
  • Hambleton, R.K., Zaal, J.N., & Pieters, J.P.M. (2000). Computerized adaptive testing: Theory, applications and standards. In R.K. Hambleton & J.N. Zaal (Eds.), Advances in educational and psychological testing: Theory and applications (4th ed., pp. 341–366). Springer.
  • Han, K.T., & Guo, F. (2011). Potential impact of item parameter drift due to practice and curriculum change on item calibration in computerized adaptive testing (Report No. RR-11-02). Graduate Management Admission Council (GMAC) Research Reports. https://www.gmac.com/~/media/Files/gmac/Research/research-report-series/rr1102_itemcalibration.pdf
  • Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26(2), 44–52. https://doi.org/10.1111/j.1745-3992.2007.00093.x
  • Keng, L. (2008). A comparison of the performance of testlet-based computer adaptive tests and multistage tests (Publication No. 3315089) [Doctoral Dissertation, University of Texas]. ProQuest Dissertations & Theses.
  • Lei, P.W., Chen, S.Y., & Yu, L. (2006). Comparing methods of assessing differential item functioning in a computerized adaptive testing environment. Journal of Educational Measurement, 43(3), 245–264. https://doi.org/10.1111/j.1745-3984.2006.00015.x
  • Luecht, R.M., & Sireci, S.G. (2011). A review of models for computer-based testing (Report No. 2011-12). College Board Research Report. https://files.eric.ed.gov/fulltext/ED562580.pdf
  • Magis, D., Yan, D., & von Davier, A.A. (Eds.). (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Springer.
  • National Research Council (1999). Designing mathematics or science curriculum programs: A guide for using mathematics and science education standards. National Academies Press. https://www.nap.edu/catalog/9658.html
  • Piromsombat, C. (2014). Differential item functioning in computerized adaptive testing: Can CAT self-adjust enough? (Publication No. 3620715) [Doctoral Dissertation, University of Minnesota]. ProQuest Dissertations & Theses.
  • Sari, H.I. (2016). Examining content control in adaptive tests: Computerized adaptive testing vs. computerized multistage testing (Publication No. 403003) [Doctoral Dissertation, University of Florida]. The Council of Higher Education National Thesis Center.
  • Sari, H.I., & Huggins-Manley, A.C. (2017). Examining content control in adaptive tests: Computerized adaptive testing vs. computerized adaptive multistage testing. Educational Sciences: Theory and Practice, 17(5), 1759–1781. https://doi.org/10.12738/estp.2017.5.0484
  • Steinberg, L., Thissen, D., & Wainer, H. (2000). Validity. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 185–229). Routledge.
  • Tay, P.H. (2015). On-the-fly assembled multistage adaptive testing (Publication No. 3740572). [Doctoral Dissertation, University of Illinois]. ProQuest Dissertations & Theses.
  • van der Linden, W.J. (2008). Using response times for item selection in adaptive testing. Journal of Educational and Behavioral Statistics, 33(1), 5–20. https://doi.org/10.3102/1076998607302626
  • van der Linden, W.J., & Pashley, P.J. (2010). Item selection and ability estimation in adaptive testing. In W. J. van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing. Springer.
  • Wainer, H. (2000). Introduction and history. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 1–22). Lawrence Erlbaum Associates.
  • Wang, K. (2017). A fair comparison of the performance of computerized adaptive testing and multistage adaptive testing (Publication No. 10273809). [Doctoral Dissertation, Michigan State University]. ProQuest Dissertations & Theses.
  • Wang, S., Lin, H., Chang, H.H., & Douglas, J. (2016). Hybrid computerized adaptive testing: From group sequential design to fully sequential design. Journal of Educational Measurement, 53(1), 45–62. https://doi.org/10.1111/jedm.12100
  • Wang, X. (2013). An investigation on computer-adaptive multistage testing panels for multidimensional assessment (Publication No. 3609605). [Doctoral Dissertation, University of North Carolina]. ProQuest Dissertations & Theses.
  • Weiss, D.J., & Kingsbury, G.G. (1984). Application of computer adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361–375. https://doi.org/10.1111/j.1745-3984.1984.tb01040.x
  • Yan, D. (2010). Investigation of optimal design and scoring for adaptive multi-stage testing: A tree-based regression approach (Publication No. 3452799) [Master's thesis, Fordham University]. ProQuest Dissertations & Theses.
  • Yan, D., von Davier, A.A., & Lewis, C. (2014). Overview of computerized multistage tests. In D. Yan, A.A. von Davier, & C. Lewis (Eds.), Computerized multistage testing (pp. 3–20). CRC Press; Taylor & Francis Group.
  • Zheng, Y., & Chang, H.H. (2014). On-the-fly assembled multistage adaptive testing. Applied Psychological Measurement, 39(2), 104–118. https://doi.org/10.1177/0146621614544519
  • Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning (DIF). Headquarters of National Defense.
  • Zwick, R. (2010). The investigation of differential item functioning in adaptive tests. In W.J. van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing. Springer.
  • Zwick, R., & Bridgeman, B. (2014). Evaluating validity, fairness, and differential item functioning in multistage testing. In D. Yan, A.A. von Davier, & C. Lewis (Eds.), Computerized multistage testing. CRC Press; Taylor & Francis Group.


Details

Primary Language: English
Subjects: Field Education
Section: Articles
Authors

Başak Erdem Kara (ORCID: 0000-0003-3066-2892)

Nuri Doğan (ORCID: 0000-0001-6274-2016)

Publication Date: September 30, 2022
Submission Date: April 19, 2022
Published in Issue: Year 2022, Volume 9, Issue 3

Cite

APA Erdem Kara, B., & Doğan, N. (2022). The Effect of ratio of items indicating differential item functioning on computer adaptive and multi-stage tests. International Journal of Assessment Tools in Education, 9(3), 682-696. https://doi.org/10.21449/ijate.1105769
