Computer Adaptive Multistage Testing: Practical Issues, Challenges and Principles

Year 2016, Volume: 7, Issue: 2, 388-406, 25.12.2016

Abstract

The purpose of many tests in educational and psychological measurement is to estimate test takers’ latent trait scores from their responses to a set of items. Over the years, this has been done with traditional methods (paper-and-pencil tests). However, compared to other test administration models (e.g., adaptive testing), traditional methods are widely criticized for low measurement accuracy and long test lengths. Adaptive testing has been proposed to overcome these problems. There are two popular adaptive testing approaches: computerized adaptive testing (CAT) and computer adaptive multistage testing (ca-MST). The former is a well-known approach that has been predominantly used in this field, and we believe that researchers and practitioners are fairly familiar with many aspects of CAT because it has more than a hundred years of history. The same cannot be said for the latter. Since ca-MST is relatively new, many researchers are not familiar with its features. The purpose of this study is to closely examine the characteristics of ca-MST, including its working principle, the adaptation procedure known as the routing method, test assembly, and scoring, and to provide an overview for researchers, with the aim of drawing their attention to ca-MST and encouraging them to contribute to research in this area. Books, software, and future work for ca-MST are also discussed.
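The routing idea sketched in the abstract can be illustrated with a small simulation. Everything below is an illustrative assumption, not the article's method: a hypothetical 1-2 panel design (one routing module, then an easy or hard second-stage module), the Rasch (1PL) response model, the module difficulties, and a simple number-correct routing rule.

```python
import math
import random

def rasch_prob(theta, b):
    """P(correct) under the Rasch (1PL) model for ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def administer_module(theta, difficulties, rng):
    """Simulate 0/1 responses to one module of items."""
    return [int(rng.random() < rasch_prob(theta, b)) for b in difficulties]

def route(responses):
    """Number-correct routing: more than half correct sends the examinee
    to the hard second-stage module, otherwise to the easy one."""
    return "hard" if sum(responses) > len(responses) / 2 else "easy"

# Hypothetical panel: a moderate-difficulty routing module and two
# second-stage modules targeted at low and high ability.
routing_module = [-0.5, 0.0, 0.5, 0.0, -0.25, 0.25]
second_stage = {"easy": [-1.5, -1.0, -0.75],
                "hard": [0.75, 1.0, 1.5]}

rng = random.Random(42)
theta = 1.2                                   # assumed true ability
stage1 = administer_module(theta, routing_module, rng)
branch = route(stage1)
stage2 = administer_module(theta, second_stage[branch], rng)
print(branch, stage1, stage2)
```

In an operational ca-MST, the routing rule is typically based on a provisional ability estimate rather than raw number-correct, and the modules are pre-assembled to meet content and information targets; this sketch only shows the adaptation-between-stages mechanism.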

References

  • Angoff, W. H., & Huddleston, E. M. (1958). The multi-level experiment: a study of a two-level test system for the College Board Scholastic Aptitude Test (SR-58-21). Princeton, New Jersey: Educational Testing Service.
  • Armstrong, R. D. & Roussos, L. (2005). A method to determine targets for multi-stage adaptive tests. (Research Report 02-07). Newtown, PA: Law School Admissions Council.
  • Armstrong, R. D., Jones, D. H., Koppel, N. B., & Pashley, P. J. (2004). Computerized adaptive testing with multiple-form structures. Applied Psychological Measurement, 28, 147-164.
  • Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2008). Incorporating randomness to the Fisher information for improving item exposure control in CATs. British Journal of Mathematical and Statistical Psychology, 61, 493-513.
  • Becker, K. A., & Bergstrom, B. A. (2013). Test administration models. Practical Assessment, Research & Evaluation, 18(14), 7.
  • Belov, D. I., & Armstrong, R. D. (2005). Monte Carlo test assembly for item pool analysis and extension. Applied Psychological Measurement, 29, 239-261.
  • Berkelaar, M. (2015). Package ‘lpSolve’.
  • Bridgeman, B. (2012). A Simple Answer to a Simple Question on Changing Answers. Journal of Educational Measurement, 49, 467-468.
  • Chang, H.H., & Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213–229.
  • Choi, S. W., Grady, M. W., & Dodd, B. G. (2010). A New Stopping Rule for Computerized Adaptive Testing. Educational and Psychological Measurement, 70(6), 1–17.
  • Cor, K., Alves, C., & Gierl, M. (2009). Three Applications of Automated Test Assembly within a User-Friendly Modeling Environment. Practical Assessment, Research & Evaluation, 14(14).
  • Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart & Winston.
  • Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions. Urbana, IL: University of Illinois Press.
  • Crotts, K. M., Zenisky, A. L., & Sireci, S. G. (2012, April). Estimating measurement precision in reduced-length multistage-adaptive testing. Paper presented at the meeting of the National Council on Measurement in Education, Vancouver, BC, Canada.
  • Davey, T., & Lee, Y. H. (2011). Potential impact of context effects on the scoring and equating of the multistage GRE Revised General Test. (GRE Board Research Report 08-01). Princeton, NJ: Educational Testing Service.
  • Davis, L. L., & Dodd, B. G. (2003). Item Exposure Constraints for Testlets in the Verbal Reasoning Section of the MCAT. Applied Psychological Measurement, 27, 335-356.
  • Diao, Q., & van der Linden, W. J. (2011). Automated test assembly using lp_solve version 5.5 in R. Applied Psychological Measurement. doi: 10.1177/0146621610392211
  • Dubois, P. H. (1970). A history of psychological testing. Boston: Allyn & Bacon.
  • Eignor, D. R., Stocking, M. L., Way, W. D., & Steffen, M. (1993). Case studies in computer adaptive test design through simulation. ETS Research Report Series, 1993(2).
  • Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications (Vol. 7). Springer Science & Business Media.
  • Han, K. T. (2013). MSTGen: Simulated data generator for multistage testing. Applied Psychological Measurement, 37, 666-668.
  • Han, K. T., & Kosinski, M. (2014). Software Tools for Multistage Testing Simulations. In Computerized Multistage Testing: Theory and Applications (pp. 411-420). Chapman and Hall/CRC.
  • Han, K.T., & Guo, F. (2013). An Approach to Assembling Optimal Multistage Testing Modules on the Fly (Report No. RR-13-01). Reston, Virginia: Graduate Management Admission Council. Retrieved from GMAC website: http://www.gmac.com/market-intelligence-and-research/research-library/validity-and-testing/research-reports-validity-related/module-assembly-on-the-fly.aspx
  • Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26(2), 44-52.
  • ILOG. (2006). ILOG CPLEX 10.0 [User’s manual]. Paris, France: ILOG SA.
  • Keller, L. A. (2000). Ability estimation procedures in computerized adaptive testing. USA: American Institute of Certified Public Accountants-AICPA Research Consortium-Examination Teams.
  • Keng, L. & Dodd, B.G. (2009, April). A comparison of the performance of testlet based computer adaptive tests and multistage tests. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
  • Keng, L. (2008). A comparison of the performance of testlet-based computer adaptive tests and multistage tests (Order No. 3315089).
  • Kim, H., & Plake, B. S. (1993, April). Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Atlanta, GA.
  • Leung, C.K., Chang, H.H., & Hau, K.T. (2002). Item selection in computerized adaptive testing: Improving the a-Stratified design with the Sympson-Hetter algorithm. Applied Psychological Measurement, 26(4), 376–392.
  • Leung, C.K., Chang, H.H., & Hau, K.T. (2003). Computerized adaptive testing: A comparison of three content balancing methods. Journal of Technology, Learning, and Assessment, 2(5).
  • Lewis, C., & Sheehan, K. (1990). Using Bayesian decision theory to design a computerized mastery test. Applied Psychological Measurement, 14(4), 367-386.
  • Lord, F. M. (1971). A theoretical study of two-stage testing. Psychometrika, 36, 227–242.
  • Lord, F. M. (1974). Practical methods for redesigning a homogeneous test, also for designing a multilevel test. Educational Testing Service RB-74–30.
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
  • Lord, F. M. (1986). Maximum likelihood and Bayesian parameter estimation in item response theory. Journal of Educational Measurement, 23, 157–162.
  • Luecht, R. M. & Sireci, S. G. (2011). A review of models for computer-based testing (Research Report No. 2011-12). New York: The College Board. Retrieved from http://research.collegeboard.org/publications/content/2012/05/review-models-computer-based-testing
  • Luecht, R. M. (1998). Computer-assisted test assembly using optimization heuristics. Applied Psychological Measurement, 22, 224-236.
  • Luecht, R. M. (2000, April). Implementing the computer-adaptive sequential testing (CAST) framework to mass produce high quality computer-adaptive and mastery tests. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New Orleans, LA.
  • Luecht, R. M. (2003, April). Exposure control using adaptive multi-stage item bundles. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, IL.
  • Luecht, R. M., & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35, 229-249.
  • Luecht, R. M., Brumfield T., & Breithaupt, K. (2006). A testlet assembly design for adaptive multistage tests. Applied Measurement in Education, 19, 189–202.
  • Luecht, R. M., Nungester, R.J., & Hadadi, A. (1996, April). Heuristic-based CAT: Balancing item information, content and exposure. Paper presented at the annual meeting of the National Council of Measurement in Education, New York.
  • Magis, D., & Raîche, G. (2012). Random generation of response patterns under computerized adaptive testing with the R package catR. Journal of Statistical Software, 48(8), 1-31.
  • Mead, A. D. (2006). An introduction to multistage testing. Applied Measurement in Education, 19, 185-187.
  • Patsula, L. N. & Hambleton, R.K. (1999, April). A comparative study of ability estimates from computer adaptive testing and multi-stage testing. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Quebec.
  • Patsula, L. N. (1999). A comparison of computerized adaptive testing and multistage testing (Order No. 9950199). Available from ProQuest Dissertations & Theses Global. (304514969)
  • R Development Core Team. (2013). R: A language and environment for statistical computing, reference index (Version 2.2.1). Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org
  • Rudner, L. M. (2009). Implementing the graduate management admission test computerized adaptive test. In Elements of adaptive testing (pp. 151-165). Springer New York.
  • Schnipke, D. L., & Reese, L. M. (1999). A Comparison [of] Testlet-Based Test Designs for Computerized Adaptive Testing. Law School Admission Council Computerized Testing Report. LSAC Research Report Series.
  • Segall, D. O. (2005). Computerized adaptive testing. Encyclopedia of social measurement, 1, 429-438.
  • Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 101-133). Hillsdale, NJ: Lawrence Erlbaum.
  • Thompson, N. A. (2008). A Proposed Framework of Test Administration Methods. Journal of Applied Testing Technology, 9(5), 1-17.
  • Thompson, N. A., & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research & Evaluation, 16(1), 1-9.
  • van der Linden, W. J. (1998). Bayesian item selection criteria for adaptive testing. Psychometrika, 63, 201-216. doi: 10.1007/BF02294775
There are 55 references in total.

Details

Section: Articles
Authors

Halil İbrahim Sarı

Hasibe Yahsi-Sarı

Anne Corinne Huggins-Manley

Publication Date: December 25, 2016
Published Issue: Year 2016, Volume: 7, Issue: 2

Cite

APA Sarı, H. İ., Yahsi-Sarı, H., & Huggins-Manley, A. C. (2016). Computer adaptive multistage testing: Practical issues, challenges and principles. Journal of Measurement and Evaluation in Education and Psychology, 7(2), 388-406.