Year 2017, Volume 8, Issue 1, Pages 34-46. Published: 2017-04-03

Basit ve Karmaşık Test Desenlerinde Çok Boyutlu Madde Seçme Yöntemlerinin Karşılaştırılması
A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs

Eren Halil ÖZBERK [1], Selahattin GELBAL [2]


In this study, unlike previous studies, overall ability scores were compared across different test conditions (simple and complex) designed to reflect real testing conditions. Four factors were manipulated: test design, number of items per dimension, correlation between dimensions, and item selection method. Data sets were generated from the generated item and ability parameters under the M3PL compensatory multidimensional item response theory model, adhering to the specified correlations. The overall ability scores obtained from the multidimensional computerized adaptive test administrations were compared using absolute bias (ABSBIAS), correlation, and root mean square error (RMSE). The results show that the multidimensional test design, the number of items per dimension, and the correlation between dimensions affect how well the item selection methods estimate overall scores. For a simple-structure test, the Minimum Error Variance item selection method produced the lowest absolute bias for both long and short tests. As the model became more complex, the Kullback-Leibler item selection method outperformed the other two methods.
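The three evaluation criteria named above (ABSBIAS, RMSE, and the correlation between true and estimated scores) can be sketched directly. This is a minimal illustration of the standard formulas, assuming `true` and `est` are arrays of true and estimated composite ability scores; the function and variable names are illustrative, not taken from the study.

```python
import numpy as np

def absbias(true, est):
    # Mean absolute deviation of the estimated scores from the true scores
    return np.mean(np.abs(est - true))

def rmse(true, est):
    # Root mean square error between estimated and true scores
    return np.sqrt(np.mean((est - true) ** 2))

def score_correlation(true, est):
    # Pearson correlation between true and estimated scores
    return np.corrcoef(true, est)[0, 1]
```

Lower ABSBIAS and RMSE, and a correlation closer to 1, indicate more accurate overall score recovery.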


Multidimensional computerized adaptive testing (MCAT) can measure multiple dimensions efficiently by using multidimensional IRT (MIRT) applications. Several studies have examined MCAT item selection methods with the aim of improving the accuracy of overall ability score estimates. A review of the literature shows that most studies compared item selection methods under many conditions but did not vary the structure of the test design. In contrast with previous studies, this study employed various test designs (simple and complex), which allows the overall ability score estimates to be evaluated across multiple realistic test conditions. Four factors were manipulated: test design, number of items per dimension, correlation between dimensions, and item selection method. Using the generated item and ability parameters, dichotomous item responses were generated under the M3PL compensatory multidimensional IRT model with the specified correlations. The accuracy of the MCAT composite ability scores was evaluated using the absolute bias (ABSBIAS), the correlation, and the root mean square error (RMSE) between true and estimated ability scores. The results suggest that the multidimensional test structure, the number of items per dimension, and the correlation between dimensions had significant effects on the item selection methods for overall score estimation. For the simple-structure test design, the V1 (minimum error variance) item selection method yielded the lowest absolute bias for both long and short tests when estimating overall scores. As the model became more complex, the Kullback-Leibler (KL) item selection method outperformed the other two methods.
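The data-generation step described above can be sketched as follows: abilities are drawn from a multivariate normal distribution with a specified between-dimension correlation, and dichotomous responses are generated under a compensatory M3PL model, where an item's loading pattern distinguishes simple from complex structure. This is a minimal sketch; the sample size, parameter ranges, two-dimensional default, and function names are my own illustrative assumptions, not the settings used in the study.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_m3pl(n_persons=1000, n_items=40, n_dims=2, rho=0.5,
                  simple_structure=True):
    """Generate dichotomous responses under a compensatory M3PL MIRT model:
    P(X=1 | theta) = c + (1 - c) / (1 + exp(-(a . theta + d)))."""
    # Correlated abilities: theta ~ MVN(0, Sigma), off-diagonals equal to rho
    sigma = np.full((n_dims, n_dims), rho)
    np.fill_diagonal(sigma, 1.0)
    theta = rng.multivariate_normal(np.zeros(n_dims), sigma, size=n_persons)

    # Discriminations: simple structure keeps one nonzero loading per item;
    # complex structure lets each item load on every dimension
    a = rng.uniform(0.5, 2.0, size=(n_items, n_dims))
    if simple_structure:
        mask = np.zeros((n_items, n_dims))
        mask[np.arange(n_items), np.arange(n_items) % n_dims] = 1.0
        a *= mask
    d = rng.normal(0.0, 1.0, size=n_items)      # intercepts
    c = rng.uniform(0.05, 0.25, size=n_items)   # pseudo-guessing lower asymptotes

    # Compensatory model: dimensions combine additively in the logit,
    # so a high ability on one dimension can offset a low one on another
    logit = theta @ a.T + d
    p = c + (1.0 - c) / (1.0 + np.exp(-logit))
    x = (rng.random((n_persons, n_items)) < p).astype(int)
    return theta, x
```

Toggling `simple_structure` switches between the two test designs compared in the study, while `rho` controls the between-dimension correlation condition.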


Subjects: Social
Journal Section: Articles
Authors

Author: Eren Halil ÖZBERK
Institution: HACETTEPE UNIV
Country: Turkey


Author: Selahattin GELBAL
Institution: HACETTEPE UNIV

Dates

Publication Date: April 3, 2017

BibTeX
@article{epod286956,
  title     = {A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs},
  author    = {ÖZBERK, Eren Halil and GELBAL, Selahattin},
  journal   = {Journal of Measurement and Evaluation in Education and Psychology},
  publisher = {Eğitimde ve Psikolojide Ölçme ve Değerlendirme Derneği},
  year      = {2017},
  volume    = {8},
  number    = {1},
  pages     = {34--46},
  issn      = {1309-6575},
  doi       = {10.21031/epod.286956}
}