Investigation of a multistage adaptive test based on test assembly methods

Ebru Doğruöz; Hülya Kelecioğlu

doi:10.21449/ijate.1268614

Abstract

This study compared multistage adaptive tests (MST) built with top-down and bottom-up test assembly methods across conditions of sample size, panel pattern, and module length. Data from PISA 2015 were used, and simulation studies were conducted using the parameters estimated from these data. Results for each condition were compared in terms of mean RMSE and bias. In the MST simulations based on the top-down test assembly method, mean RMSE decreased as module length increased and as the panel pattern changed from 1-2 to 1-2-2 and 1-2-3, for both small and large samples.
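The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. It assumes that bias is the mean of (estimated ability minus true ability) and that RMSE is the square root of the mean squared difference, each computed per replication and then averaged over replications. The function names and the toy ability values are hypothetical and are not taken from the study.

```python
import math

def bias(true_thetas, est_thetas):
    """Mean signed error; the sign convention (estimate - true) is an assumption."""
    return sum(e - t for t, e in zip(true_thetas, est_thetas)) / len(true_thetas)

def rmse(true_thetas, est_thetas):
    """Root mean squared error between estimated and true abilities."""
    squared = [(e - t) ** 2 for t, e in zip(true_thetas, est_thetas)]
    return math.sqrt(sum(squared) / len(squared))

def mean_over_replications(metric, replications):
    """Average a metric over (true, estimated) pairs, one pair per replication."""
    values = [metric(t, e) for t, e in replications]
    return sum(values) / len(values)

# Toy data for one simulation condition: two replications, three examinees each.
replications = [
    ([-1.0, 0.0, 1.0], [-0.5, 0.0, 1.5]),
    ([-1.0, 0.0, 1.0], [-1.0, 0.5, 0.5]),
]

mean_bias = mean_over_replications(bias, replications)
mean_rmse = mean_over_replications(rmse, replications)
print(f"mean bias = {mean_bias:.4f}, mean RMSE = {mean_rmse:.4f}")
```

Under these assumptions, a condition with lower mean RMSE recovers the true abilities more accurately on average, and a mean bias near zero indicates no systematic over- or underestimation.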

Details

Primary Language

English

Subjects

Studies on Education

Journal Section

Research Article

Authors

Ebru Doğruöz*
0000-0001-6572-274X
Türkiye

Hülya Kelecioğlu
0000-0002-0741-9934
Türkiye

Early Pub Date

May 22, 2024

Publication Date

June 20, 2024

Submission Date

March 21, 2023

Acceptance Date

February 13, 2024

Published in Issue

Year 2024 Volume: 11 Number: 2

DOI

https://doi.org/10.21449/ijate.1268614

IZ

https://izlik.org/JA37ZN97KK

Cite

APA

Doğruöz, E., & Kelecioğlu, H. (2024). Investigation of a multistage adaptive test based on test assembly methods. International Journal of Assessment Tools in Education, 11(2), 270-287. https://doi.org/10.21449/ijate.1268614

AMA

1. Doğruöz E, Kelecioğlu H. Investigation of a multistage adaptive test based on test assembly methods. Int. J. Assess. Tools Educ. 2024;11(2):270-287. doi:10.21449/ijate.1268614

Chicago

Doğruöz, Ebru, and Hülya Kelecioğlu. 2024. “Investigation of a Multistage Adaptive Test Based on Test Assembly Methods”. International Journal of Assessment Tools in Education 11 (2): 270-87. https://doi.org/10.21449/ijate.1268614.

EndNote

Doğruöz E, Kelecioğlu H (June 1, 2024) Investigation of a multistage adaptive test based on test assembly methods. International Journal of Assessment Tools in Education 11(2), 270–287.

IEEE

[1] E. Doğruöz and H. Kelecioğlu, “Investigation of a multistage adaptive test based on test assembly methods”, Int. J. Assess. Tools Educ., vol. 11, no. 2, pp. 270–287, June 2024, doi: 10.21449/ijate.1268614.

ISNAD

Doğruöz, Ebru - Kelecioğlu, Hülya. “Investigation of a Multistage Adaptive Test Based on Test Assembly Methods”. International Journal of Assessment Tools in Education 11/2 (June 1, 2024): 270-287. https://doi.org/10.21449/ijate.1268614.

JAMA

1. Doğruöz E, Kelecioğlu H. Investigation of a multistage adaptive test based on test assembly methods. Int. J. Assess. Tools Educ. 2024;11:270–287.

MLA

Doğruöz, Ebru, and Hülya Kelecioğlu. “Investigation of a Multistage Adaptive Test Based on Test Assembly Methods”. International Journal of Assessment Tools in Education, vol. 11, no. 2, June 2024, pp. 270-87, doi:10.21449/ijate.1268614.

Vancouver

1. Ebru Doğruöz, Hülya Kelecioğlu. Investigation of a multistage adaptive test based on test assembly methods. Int. J. Assess. Tools Educ. 2024 Jun. 1;11(2):270-87. doi:10.21449/ijate.1268614