Investigation of a multistage adaptive test based on test assembly methods
Year 2024, 270-287, 20.06.2024
Ebru Doğruöz, Hülya Kelecioğlu
Abstract
In this research, multistage adaptive tests (MST) were compared according to sample size, panel pattern, and module length for the top-down and bottom-up test assembly methods. Within the scope of the research, data from PISA 2015 were used, and simulation studies were conducted with the parameters estimated from these data. The analysis results for each condition were compared in terms of mean RMSE and bias. According to the results of the MST simulation based on the top-down test assembly method, mean RMSE values decreased as module length increased and as the panel pattern changed from 1-2 to 1-2-2 and 1-2-3, for MSTs administered to both small and large samples.
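For readers unfamiliar with the evaluation criteria named in the abstract, the sketch below illustrates how bias and RMSE are typically computed when estimated abilities from a simulated MST administration are compared with the generating (true) abilities. It is a minimal illustration under assumed values; the function name, the noise level, and the sample data are hypothetical and do not reproduce the authors' simulation code.

```python
import numpy as np

def evaluate_recovery(theta_true, theta_hat):
    """Return bias and RMSE of ability estimates for one simulation condition.

    theta_true: array of generating (true) abilities, one per simulee
    theta_hat:  array of abilities estimated from the simulated MST
    """
    errors = theta_hat - theta_true
    bias = errors.mean()                   # mean signed error
    rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error
    return bias, rmse

# Illustrative use with made-up values (not the study's data):
rng = np.random.default_rng(0)
theta_true = rng.normal(0, 1, size=1000)            # simulated true abilities
theta_hat = theta_true + rng.normal(0, 0.3, 1000)   # estimates with added noise
bias, rmse = evaluate_recovery(theta_true, theta_hat)
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")
```

In a design like the one described, such values would be computed separately for each condition (sample size, panel pattern, module length, assembly method) and then averaged, allowing the conditions to be ranked by measurement precision.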
References
- American Institute of Certified Public Accountants. (2019, February 18). CPA exam structure. https://www.aicpa.org/becomeacpa/cpaexam/examinationcontent.html
- Belov, D.I. (2016). Review of modern methods for automated test assembly and item pool analysis (LSAC Research Report Series, RR 16-01). Law School Admission Council. https://www.lsac.org/docs/default-source/research-(lsac-resources)/rr-16-01.pdf
- Breithaupt, K., Ariel, A., & Veldkamp, B. (2005). Automated simultaneous assembly for multistage testing. International Journal of Testing, 5(3), 319-330. https://doi.org/10.1207/s15327574ijt0503_8
- Breithaupt, K., & Hare, D.R. (2007). Automated simultaneous assembly of multistage testlets for a high-stakes licensing examination. Educational and Psychological Measurement, 67(1), 5-20. https://doi.org/10.1177/0013164406288162
- Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. CBS College Publishing.
- Dallas, A. (2014). The effects of routing and scoring within a computer adaptive multi-stage framework [Unpublished doctoral dissertation]. The University of North Carolina.
- Davis, L.L., & Dodd, B.G. (2003). Item exposure constraints for testlets in the verbal reasoning section of the MCAT. Applied Psychological Measurement, 27(5), 335-356. https://doi.org/10.1177/0146621603256804
- Educational Testing Service. (2018, February 18). Computer-delivered GRE general test content and structure. http://www.ets.org/gre/revised%5Cgeneral/about/content/computer/
- Hambleton, R.K., & Xing, D. (2006). Optimal and nonoptimal computer-based test designs for making pass-fail decisions. Applied Measurement in Education, 19(3), 221-239. https://doi.org/10.1207/s15324818ame1903_4
- Hembry, I.F. (2014). Operational characteristics of mixed format multistage tests using the 3PL testlet response theory model [Unpublished doctoral dissertation]. University of Texas at Austin.
- Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26(2), 44-52. https://doi.org/10.1111/j.1745-3992.2007.00093.x
- Hogan, J., Thornton, N., Diaz-Hoffmann, L., Mohadjer, L., Krenzke, T., Li, J., & Khorramdel, L. (2016, July 5). US program for the international assessment of adult competencies (PIAAC) 2012/2014: Main study and national supplement technical report (NCES 2016-036REV). U.S. Department of Education, National Center for Education Statistics. https://nces.ed.gov/pubs2016/2016036rev.pdf
- Jodoin, M.G., Zenisky, A., & Hambleton, R.K. (2006). Comparison of the psychometric properties of several computer-based test designs for credentialing exams with multiple purposes. Applied Measurement in Education, 19(3), 203-220. https://doi.org/10.1207/s15324818ame1903_3
- Khorramdel, L., Pokropek, A., & van Rijn, P. (2020). Special topic: Establishing comparability and measurement invariance in large-scale assessments, part I. Psychological Test and Assessment Modeling, 62(1), 3-10. https://www.psychologie-aktuell.com/fileadmin/Redaktion/Journale/ptam-2020-1/01_Khorramdel.pdf
- Kim, S., Moses, T., & You, H. (2015). A comparison of IRT proficiency estimation methods under adaptive multistage testing. Journal of Educational Measurement, 52(1), 70-79. https://doi.org/10.1111/jedm.12063
- Kim, J., Chung, H., Dodd, B.G., & Park, R. (2012). Panel design variations in the multistage test using the mixed-format tests. Educational and Psychological Measurement, 72(4), 574-588. https://doi.org/10.1177/0013164411428977
- Kirsch, I., & Lennon, M.L. (2017). PIAAC: A new design for a new era. Large-scale Assessments in Education, 5, 11. https://doi.org/10.1186/s40536-017-0046-6
- Lord, F.M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates, Inc.
- Luecht, R. (2000). Implementing the CAST framework to mass produce high quality computer adaptive and mastery tests. Paper presented at the Annual Meeting of the National Council on Measurement in Education (NCME), New Orleans, LA.
- Luecht, R.M. (2006). Designing tests for pass-fail decisions using item response theory. In S. Downing & T. Haladyna (Eds.), Handbook of test development, 575-596. Lawrence Erlbaum Associates.
- Luecht, R., Brumfield, T., & Breithaupt, K. (2006). A testlet assembly design for adaptive multistage tests. Applied Measurement in Education, 19(3), 189-202. https://doi.org/10.1207/s15324818ame1903_2
- Luecht, R.M., & Nungester, R.J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229-249. https://www.learntechlib.org/p/87698/
- Luo, X. (2019). Automated test assembly with mixed-integer programming: The effects of modeling approaches and solvers. Journal of Educational Measurement, 57(4), 547-565. https://doi.org/10.1111/jedm.12262
- Luo, X., & Kim, D. (2018). A top-down approach to designing the computerized adaptive multistage test. Journal of Educational Measurement, 55(2), 243-263. https://doi.org/10.1111/jedm.12174
- Lynn Chen, L.Y. (2010). An investigation of the optimal test design for multi-stage test using the generalized partial credit model [Unpublished doctoral dissertation]. The University of Texas at Austin.
- OECD (2015). PISA 2015 technical report. http://www.oecd.org/pisa/sitedocument/PISA-2015-Technical-Report-Chapter-1-Programme-for-International-Student-Assessment-an-Overview.pdf
- OECD (2017). PISA 2015 technical report. http://www.oecd.org/pisa/sitedocument/PISA-2015-Technical-Report-Chapter-9-Scaling-PISA-Data.pdf
- Papadimitriou, C.H., & Steiglitz, K. (1982). Combinatorial optimization: Algorithms and complexity. Prentice-Hall.
- Park, R. (2015). Investigating the impact of a mixed-format item pool on optimal test designs for multistage testing [Unpublished doctoral dissertation]. University of Texas, Austin.
- Patsula, L.N. (1999). A comparison of computerized-adaptive testing and multi-stage testing [Unpublished doctoral dissertation]. University of Massachusetts at Amherst.
- Pihlainen, K.A.I., Santtila, M., Häkkinen, K., & Kyröläinen, H. (2018). Associations of physical fitness and body composition characteristics with simulated military task performance. The Journal of Strength & Conditioning Research, 32(4), 1089-1098. https://doi.org/10.1519/jsc.0000000000001921
- Sari, H.İ. (2016). Examining content control in adaptive tests: Computerized adaptive testing vs. computerized multistage testing [Unpublished doctoral dissertation]. University of Florida.
- Sari, H.I., & Raborn, A. (2018). What information works best? A comparison of routing methods. Applied Psychological Measurement, 42(6), 499-515. https://doi.org/10.1177/0146621617752990
- Şahin Kürşad, M., Çokluk-Bökeoğlu, Ö., & Çıkrıkçı, N. (2022). The study of the effect of item parameter drift on ability estimation obtained from adaptive testing under different conditions. International Journal of Assessment Tools in Education, 9(3), 654-681. https://doi.org/10.21449/ijate.1070848
- Theunissen, T.J.J.M. (1985). Binary programming and test design. Psychometrika, 50(4), 411-420. https://link.springer.com/article/10.1007/BF02296260
- Tian, C. (2018). Comparison of four stopping rules in computerized adaptive testing and examination of their application to on-the-fly multistage testing [Unpublished master's dissertation]. University of Illinois.
- van der Linden, W.J., & Glas, C.A.W. (2000). Capitalization on item calibration error in adaptive testing. Applied Measurement in Education, 13, 35-53. https://doi.org/10.1207/s15324818ame1301_2
- van der Linden, W.J. (2005). Linear models of optimal test design. Springer.
- van der Linden, W.J., & Boekkooi-Timminga, E. (1989). A maximin model for IRT-based test design with practical constraints. Psychometrika, 54(2), 237-247. https://link.springer.com/article/10.1007/BF02294518
- Veldkamp, B.P. (1999). Multiple-objective test assembly problems. Journal of Educational Measurement, 36, 253-266. http://www.jstor.org/stable/1435157
- Veldkamp, B.P., Matteucci, M., & de Jong, M.G. (2013). Uncertainties in the item parameter estimates and robust automated test assembly. Applied Psychological Measurement, 37, 123-139. https://doi.org/10.1177/0146621612469825
- Wang, K. (2017). Fair comparison of the performance of computerized adaptive testing and multistage adaptive testing [Unpublished doctoral dissertation]. Michigan State University.
- Wise, S.L., & Kingsbury, G.G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicologica, 21, 135-155. https://www.uv.es/revispsi/articulos1y2.00/wise.pdf
- Xiao, J., & Bulut, O. (2022). Item selection with collaborative filtering in on-the-fly multistage adaptive testing. Applied Psychological Measurement, 46(8), 690-704. https://doi.org/10.1177/01466216221124089
- Xing, D., & Hambleton, R.K. (2004). Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing examinations. Educational and Psychological Measurement, 64, 5-21. https://doi.org/10.1177/0013164403258393
- Xu, L., Wang, S., Cai, Y., & Tu, D. (2021). The automated test assembly and routing rule for multistage adaptive testing with multidimensional item response theory. Journal of Educational Measurement, 58, 538-563. https://doi.org/10.1111/jedm.12305
- Yan, D., Lewis, C., & von Davier, A. (2014). Overview of computerized multistage tests. In D. Yan, A.A. von Davier, & C. Lewis (Eds.). Computerized Multistage Testing: Theory and Applications, 3-20. Chapman & Hall.
- Yan, D., von Davier, A.A., & Lewis, C. (Eds.). (2014). Computerized Multistage Testing: Theory and Applications (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16858
- Yang, L. (2016). Enhancing item pool utilization when designing multistage computerized adaptive tests [Unpublished doctoral dissertation]. Michigan State University.
- Zenisky, A. (2004). Evaluating the effects of several multistage testing design variables on selected psychometric outcomes for certification and licensure assessment [Unpublished doctoral dissertation]. University of Massachusetts at Amherst.
- Zenisky, A., & Hambleton, R. (2014). Multistage test designs: Moving research results into practice. In Yan, D., Von Davier, A., & Lewis, C. (Eds.), Computerized Multistage Testing: Theory and Applications, 21-36. Chapman & Hall.
- Zenisky, A., Hambleton, R.K., & Luecht, R.M. (2010). Multistage testing: Issues, designs and research. In W.J. van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing (pp. 355-372). Springer.
- Zenisky, A.L., Sireci, S.G., Martone, A., Baldwin, P., & Lam, W. (2009). Massachusetts adult proficiency tests technical manual supplement: 2008-2009. Center for Educational Assessment Research. http://www.umass.edu/remp/docs/MAPTTMSupp7-09 final.pdf
- Zheng, Y. (2014). New methods of online calibration for item bank replenishment [Unpublished doctoral dissertation]. University of Illinois at Urbana-Champaign.
- Zheng, Y., & Chang, H.-H. (2015). On-the-fly assembled multistage adaptive testing. Applied Psychological Measurement, 39(2), 104-118. https://doi.org/10.1177/0146621614544519
- Zheng, Y., Nozawa, Y., Gao, X., & Chang, H. (2012). Multistage adaptive testing for a large-scale classification test: the designs, heuristic assembly, and comparison with other testing modes. Paper presented at the Annual Meeting of the National Council on Measurement in Education (NCME) (ACT Research Reports 2012-6). Vancouver, British Columbia, Canada.
- Zheng, Y., Nozawa, Y., Zhu, R., & Gao, X. (2016). Automated top-down heuristic assembly of a classification multistage test. International Journal of Quantitative Research in Education, 3(4), 242-265. https://doi.org/10.1504/IJQRE.2016.082387
- Zimowski, M.F., Muraki, E., Mislevy, R.J., & Bock, R.D. (1996). BILOG-MG: Multiple-group IRT analysis and test maintenance for binary items [Computer software]. Scientific Software International.