A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs

Eren Halil Özberk; Selahattin Gelbal

doi:10.21031/epod.286956

Abstract

Multidimensional computer adaptive testing (MCAT) can measure multiple dimensions efficiently by applying multidimensional item response theory (MIRT). Several studies have examined MCAT item selection methods with the aim of improving the accuracy of overall ability score estimates. A review of the literature shows that most of these studies compared item selection methods under many conditions but did not vary the structure of the test design. In contrast, this study used both simple and complex test designs, which allows overall ability score estimates to be evaluated across multiple realistic test conditions. Four factors were manipulated: test design, number of items per dimension, correlation between dimensions, and item selection method. Using the generated item and ability parameters, dichotomous item responses were generated with the compensatory M3PL multidimensional IRT model under the specified correlations. The accuracy of the MCAT composite ability scores was evaluated using absolute bias (ABSBIAS), correlation, and root mean square error (RMSE) between true and estimated ability scores. The results suggest that the multidimensional test structure, the number of items per dimension, and the correlation between dimensions had a significant effect on how the item selection methods performed in estimating overall scores. For the simple structure test design, the minimum error variance (V1) item selection method produced the lowest absolute bias in overall score estimates for both long and short tests. As the model became more complex, the Kullback-Leibler (KL) item selection method performed better than the other two item selection methods.
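To make the simulation and evaluation steps in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common slope-intercept form of the compensatory M3PL model, P = c + (1 - c) / (1 + exp(-(a·θ + d))), and assumes that ABSBIAS is the mean absolute difference between estimated and true scores; the abstract states neither formula. All function names and parameter values are hypothetical.

```python
import math
import random


def m3pl_probability(theta, a, d, c):
    """Probability of a correct response under a compensatory M3PL model.

    Assumed form: P = c + (1 - c) / (1 + exp(-(a . theta + d))).
    theta: ability vector, a: discrimination vector, d: intercept, c: guessing.
    """
    logit = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return c + (1.0 - c) / (1.0 + math.exp(-logit))


def simulate_response(theta, a, d, c, rng):
    """Draw one dichotomous (0/1) response from the model probability."""
    return 1 if rng.random() < m3pl_probability(theta, a, d, c) else 0


def abs_bias(true_scores, est_scores):
    """Mean absolute difference between estimates and true values (assumed definition)."""
    n = len(true_scores)
    return sum(abs(e - t) for t, e in zip(true_scores, est_scores)) / n


def rmse(true_scores, est_scores):
    """Root mean square error between estimates and true values."""
    n = len(true_scores)
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true_scores, est_scores)) / n)


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.5, -0.3]                # hypothetical two-dimensional ability
    simple_item = ([1.2, 0.0], 0.0, 0.2)   # loads on one dimension only
    complex_item = ([0.8, 0.6], 0.0, 0.2)  # loads on both dimensions
    for a, d, c in (simple_item, complex_item):
        print(m3pl_probability(theta, a, d, c), simulate_response(theta, a, d, c, rng))
```

In this sketch a simple structure item has a single nonzero discrimination, while a complex structure item has nonzero discriminations on more than one dimension. The three metric functions would be applied to vectors of true and estimated composite scores collected over simulated examinees.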

Keywords

Item selection method, multidimensional computer adaptive testing, multidimensional item response theory, composite score estimation

A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs (translated from the Turkish title)

Abstract (translated from Turkish)

Unlike other studies, this study compared overall ability scores under different test designs (simple and complex) chosen to reflect real test conditions. Four conditions were manipulated: test design, number of items per dimension, correlation between dimensions, and item selection method. The data sets were generated from the generated item and ability parameters using the compensatory M3PL multidimensional item response theory model, in keeping with the specified correlations. The overall ability scores obtained from the multidimensional computer adaptive test administrations were compared using absolute bias (ABSBIAS), correlation, and root mean square error (RMSE). The results showed that the multidimensional test design, the number of items per dimension, and the correlation between dimensions affected how the item selection methods performed in estimating overall scores. For a simple structure test, the Minimum Error Variance item selection method produced the lowest absolute bias for both long and short tests. As the model became more complex, the Kullback-Leibler item selection method performed better than the other two methods.

Keywords (translated from Turkish)

Item selection method, multidimensional computer adaptive testing, multidimensional item response theory, overall score estimation

Details

Primary Language

English

Journal Section

Research Article

Authors

Eren Halil Özberk
HACETTEPE UNIV
Türkiye

Selahattin Gelbal
HACETTEPE UNIV

Publication Date

April 3, 2017

Submission Date

January 22, 2017

Acceptance Date

March 6, 2017

Published in Issue

Year 2017 Volume: 8 Number: 1

DOI

https://doi.org/10.21031/epod.286956

IZ

https://izlik.org/JA99SY24ET

Cite

APA

Özberk, E. H., & Gelbal, S. (2017). A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs. Journal of Measurement and Evaluation in Education and Psychology, 8(1), 34-46. https://doi.org/10.21031/epod.286956

AMA

1. Özberk EH, Gelbal S. A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs. JMEEP. 2017;8(1):34-46. doi:10.21031/epod.286956

Chicago

Özberk, Eren Halil, and Selahattin Gelbal. 2017. “A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs”. Journal of Measurement and Evaluation in Education and Psychology 8 (1): 34-46. https://doi.org/10.21031/epod.286956.

EndNote

Özberk EH, Gelbal S (March 1, 2017) A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs. Journal of Measurement and Evaluation in Education and Psychology 8 1 34–46.

IEEE

[1] E. H. Özberk and S. Gelbal, “A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs”, JMEEP, vol. 8, no. 1, pp. 34–46, Mar. 2017, doi: 10.21031/epod.286956.

ISNAD

Özberk, Eren Halil - Gelbal, Selahattin. “A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs”. Journal of Measurement and Evaluation in Education and Psychology 8/1 (March 1, 2017): 34-46. https://doi.org/10.21031/epod.286956.

JAMA

1. Özberk EH, Gelbal S. A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs. JMEEP. 2017;8:34–46.

MLA

Özberk, Eren Halil, and Selahattin Gelbal. “A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs”. Journal of Measurement and Evaluation in Education and Psychology, vol. 8, no. 1, Mar. 2017, pp. 34-46, doi:10.21031/epod.286956.

Vancouver

1. Eren Halil Özberk, Selahattin Gelbal. A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs. JMEEP. 2017 Mar. 1;8(1):34-46. doi:10.21031/epod.286956