Research Article

A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs

Year 2017, Volume: 8 Issue: 1, 34 - 46, 03.04.2017
https://doi.org/10.21031/epod.286956

Abstract




Multidimensional computerized adaptive testing (MCAT) can measure multiple
dimensions efficiently by using multidimensional IRT (MIRT) applications.
Several studies have compared MCAT item selection methods with the aim of
improving the accuracy of overall ability score estimates. A review of the
literature shows that most of these studies manipulated many conditions but
not the structure of the test design. In contrast with previous studies, this
study employed different test designs (simple and complex), which allows the
overall ability score estimates to be evaluated across multiple realistic test
conditions. Four factors were manipulated: test design, number of items per
dimension, correlation between dimensions, and item selection method. Using
the generated item and ability parameters, dichotomous item responses were
generated with the compensatory multidimensional three-parameter logistic
(M3PL) IRT model under the specified correlations. The accuracy of the MCAT
composite ability scores was evaluated using the absolute bias (ABSBIAS), the
correlation, and the root mean square error (RMSE) between true and estimated
ability scores. The results suggest that the multidimensional test structure,
the number of items per dimension, and the correlation between dimensions had
significant effects on the item selection methods for overall score
estimation. For the simple-structure test design, the V1 (minimum error
variance) item selection method yielded the lowest absolute bias in overall
score estimates for both long and short tests. As the model became more
complex, the Kullback-Leibler (KL) item selection method outperformed the
other two item selection methods.
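The data-generation step described in the abstract can be sketched roughly as follows: correlated abilities are drawn from a multivariate normal distribution, and dichotomous responses are generated from a compensatory M3PL model, in which the probability of a correct response is a guessing floor plus a logistic function of the weighted sum of the abilities. The sample sizes, parameter ranges, and correlation value below are illustrative assumptions, not the study's actual settings.

```python
import numpy as np

rng = np.random.default_rng(42)

n_examinees, n_items, n_dims = 1000, 40, 2
rho = 0.5  # assumed correlation between the two dimensions

# Draw correlated abilities from a multivariate normal distribution
cov = np.array([[1.0, rho], [rho, 1.0]])
theta = rng.multivariate_normal(np.zeros(n_dims), cov, size=n_examinees)

# Illustrative item parameters: discriminations a (one per dimension),
# intercept d, and lower asymptote (guessing) c
a = rng.uniform(0.5, 2.0, size=(n_items, n_dims))
d = rng.normal(0.0, 1.0, size=n_items)
c = rng.uniform(0.0, 0.25, size=n_items)

# Compensatory M3PL: P(X=1 | theta) = c + (1 - c) / (1 + exp(-(a . theta + d)))
# "Compensatory" because a high ability on one dimension can offset a low
# ability on another through the summed term a . theta.
logit = theta @ a.T + d                      # shape (n_examinees, n_items)
p = c + (1.0 - c) / (1.0 + np.exp(-logit))

# Dichotomous (0/1) item responses
responses = (rng.random(p.shape) < p).astype(int)
```

Note how the guessing parameter bounds every response probability below by `c`, so even examinees with very low abilities retain a nonzero chance of a correct answer.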




References

  • Ackerman, T. (1987). A comparison study of the unidimensional IRT estimation of compensatory and noncompensatory multidimensional item response data (ACT Research Report Series 87-12). Iowa City, IA: American College Testing.
  • Ackerman, T. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67-91. doi:10.1111/j.1745-3984.1992.tb00368.x
  • Ackerman, T. (1994). Using multidimensional item response theory to understand what items and tests are measuring. Applied Measurement in Education, 7, 255-278. doi:10.1207/s15324818ame0704_1
  • Ackerman, T. A., Gierl, M. J., & Walker, C. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22(3), 37–53. doi:10.1111/j.1745-3992.2003.tb00136.x
  • Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23. doi:10.1177/0146621697211001
  • Bashkov, B. M. (2015). Examining the performance of the Metropolis-Hastings Robbins-Monro algorithm in the estimation of multilevel multidimensional IRT models. (Unpublished doctoral dissertation). James Madison University, Harrisonburg, VA, USA.
  • Bloxom, B., & Vale, C.D. (1987). Multidimensional adaptive testing: An approximate procedure for updating. Paper presented at the annual meeting of the psychometric society. Montreal, Canada.
  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459. doi:10.1007/BF02293801
  • Bulut, O. (2013). Between-person and within-person subscore reliability: Comparison of unidimensional and multidimensional IRT models. (Unpublished doctoral dissertation). University of Minnesota, USA.
  • Diao, Q. (2009). Comparison of ability estimation and item selection methods in multidimensional computerized adaptive testing. (Unpublished doctoral dissertation). Michigan State University, East Lansing, MI, USA.
  • Fan, M., & Hsu, Y. (1996). Multidimensional computer adaptive testing. Paper presented at the annual meeting of the American Educational Testing Association, New York City, NY.
  • Finch, H. (2010). Item parameter estimation for the MIRT model: Bias and precision of confirmatory factor analysis-based models. Applied Psychological Measurement, 34(1), 10–26. doi:10.1177/0146621609336112
  • Haberman, S. J. (2008). When can subscores have value? Journal of Educational and Behavioral Statistics, 33, 204-229. doi:10.3102/1076998607302636
  • Han, K. T., & Paek, I. (2014). A review of commercial software packages for multidimensional IRT modeling. Applied Psychological Measurement, 38(6), 486-498. doi:10.1177/0146621614536770
  • Lee, M. (2014). Application of higher-order IRT models and hierarchical IRT models to computerized adaptive testing. (Unpublished doctoral dissertation). University of California Los Angeles.
  • Liu, F. (2015). Comparisons of subscoring methods in computerized adaptive testing: A simulation study. (Unpublished doctoral dissertation). University of North Carolina Greensboro, North Carolina, USA.
  • Luecht, R. M. (1996). Multidimensional computerized adaptive testing in a certification or licensure context. Applied Psychological Measurement, 20, 389-404. doi:10.1177/014662169602000406
  • Luecht, R. M. (2004). MIRTGEN 2.0 Manual. Department of Educational Research Methodology, University of North Carolina at Greensboro, Greensboro, NC.
  • Luecht, R. M., & Miller, T. R. (1992). Unidimensional calibrations and interpretations of composite traits for multidimensional tests. Applied Psychological Measurement, 16(3), 279-293. doi:10.1177/014662169201600308
  • Luecht, R. M., Gierl, M. J., Tan, X., & Huff, K. (2006). Scalability and the development of useful diagnostic scales. Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Francisco, CA.
  • Luo, X. (2013). The optimal design of the dual-purpose test. (Unpublished doctoral dissertation). University of North Carolina Greensboro, North Carolina, USA.
  • Mulder, J., & van der Linden, W. J. (2010). Multidimensional adaptive testing with Kullback-Leibler information item selection. In W.J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp.77-101). New York: Springer.
  • Mulder, J., & van der Linden, W.J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74(2), 273–296. doi:10.1007/s11336-008-9097-5
  • Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
  • Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331-354. doi:10.1007/BF02294343
  • Segall, D.O. (2010). Principles of multidimensional adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 57–75). New York, NY: Springer.
  • Seo, D. G., & Weiss, D. J. (2015). Best design for multidimensional computerized adaptive testing with the bifactor model. Educational and Psychological Measurement, 75, 954-978. doi:10.1177/0013164415575147
  • Su, Y. (2016). A comparison of constrained item selection methods in multidimensional computerized adaptive testing. Applied Psychological Measurement, 40(5), 346-360. doi:10.1177/0146621616639305
  • Tam, S. S. (1992). A comparison of methods for adaptive estimation of a multidimensional trait. (Unpublished doctoral dissertation). Columbia University.
  • Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer, N. Dorans, D. Eignor, R. Flaugher, B. Green, R. Mislevy, L. Steinberg, & D. Thissen (Eds.), Computerized adaptive testing: A primer (2nd ed., pp. 101-133). Hillsdale, NJ: Lawrence Erlbaum Associates.
  • van der Linden, W. J. (2005). Linear models for optimal test design. New York: Springer-Verlag.
  • van der Linden, W. J., & Pashley, P. J. (2010). Item selection and ability estimation in adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp.3-30). New York: Springer.
  • van der Linden, W.J. (1999). Multidimensional adaptive testing with a minimum error variance criterion. Journal of Educational and Behavioral Statistics, 24, 398–412. doi:10.3102/10769986024004398
  • Veerkamp, W.J.J. & Berger, M.P.F. (1997). Some new item selection criteria for adaptive testing. Journal of Educational and Behavioral Statistics, 22, 203-226. doi:10.3102/10769986022002203
  • Veldkamp, B.P., & van der Linden, W.J. (2002). Multidimensional adaptive testing with constraints on test content. Psychometrika, 67(4), 575–588. doi:10.1007/BF02295132
  • Veldkamp, B.P., & van der Linden, W.J. (2010). Designing item pools for computerized adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 231-245). New York: Springer.
  • Wang, C., & Chang, H. (2011). Item selection in multidimensional computerized adaptive tests: Gaining information from different angles. Psychometrika, 76(3), 363-384. doi:10.1007/s11336-011-9215-7
  • Wang, C., Chang, H., & Boughton, K. A. (2011). Kullback-Leibler information and its applications in multi-dimensional adaptive testing. Psychometrika, 76(1), 13-39. doi:10.1007/s11336-010-9186-0
  • Wang, C., Chang, H.-H., & Boughton, K. (2013). Deriving stopping rules for multidimensional computerized adaptive testing. Applied Psychological Measurement, 37, 99-122. doi:10.1177/0146621612463422
  • Wang, W. C., & Chen, P. H. (2004). Implementation and measurement efficiency of multidimensional computerized adaptive testing. Applied Psychological Measurement, 28, 295-316. doi:10.1177/0146621604265938
  • Wang, W. C., Chen, P. H., & Cheng, Y. Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9, 116-136. doi:10.1037/1082-989X.9.1.116
  • Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized testing to educational problems. Journal of Educational Measurement, 21(4), 361–375. doi:10.1111/j.1745-3984.1984.tb01040.x
  • Yao, L. (2003). BMIRT: Bayesian multivariate item response theory [Computer software]. Monterey, CA: CTB/McGraw-Hill.
  • Yao, L. (2010). Reporting valid and reliable overall scores and domain scores. Journal of Educational Measurement, 47, 339-360. doi:10.1111/j.1745-3984.2010.00117.x
  • Yao, L. (2011). simuMCAT: simulation of multidimensional computer adaptive testing [Computer software]. Monterey: Defense Manpower Data Center.
  • Yao, L. (2012). Multidimensional CAT item selection methods for domain scores and composite scores: Theory and applications. Psychometrika, 77, 495-523. doi:10.1007/s11336-012-9265-5
  • Yao, L. (2013). Comparing the performance of five multidimensional CAT selection procedures with different stopping rules. Applied Psychological Measurement, 37, 3-23. doi:10.1177/0146621612455687
  • Yao, L. (2014). Multidimensional CAT item selection procedures with item exposure control and content constraints. Journal of Educational Measurement, 51, 18-38. doi:10.1111/jedm.12032
  • Yao, L., & Boughton, K. A. (2007). A multidimensional item response modeling approach for improving subscale proficiency estimation and classification. Applied Psychological Measurement, 31, 83-105. doi:10.1177/0146621606291559
  • Yao, L., & Boughton, K. A. (2009). Multidimensional linking for tests containing polytomous items. Journal of Educational Measurement, 46, 177–197. doi:10.1111/j.1745-3984.2009.00076.x
  • Yao, L., & Schwarz, R. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed format tests. Applied Psychological Measurement, 30, 469–492. doi:10.1177/0146621605284537
  • Yao, L., Pommerich, M., & Segall, D. O. (2014). Using multidimensional CAT to administer a short, yet precise, screening test. Applied Psychological Measurement, 38(8), 614-631. doi:10.1177/0146621614541514
  • Zhang, J. (1996). Some fundamental issues in item response theory with applications. (Unpublished doctoral dissertation), University of Illinois, Urbana-Champaign.

Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs


Abstract

In contrast with other studies, this study compared overall ability scores
across different test conditions (simple and complex) chosen to reflect real
test conditions. Four conditions were manipulated: test design, number of
items per dimension, correlation between dimensions, and item selection
method. Data sets were produced from the generated item and ability
parameters with the compensatory multidimensional three-parameter logistic
(M3PL) IRT model, holding to the specified correlations. The overall ability
scores obtained from the multidimensional computerized adaptive test
administrations were compared using the absolute bias (ABSBIAS), the
correlation, and the root mean square error (RMSE). The results showed that
the multidimensional test design, the number of items per dimension, and the
correlation between dimensions had effects on the item selection methods when
estimating overall scores. For a simple-structure test, the minimum error
variance (V1) item selection method produced the lowest absolute bias for
both long and short tests. As the model became more complex, the
Kullback-Leibler (KL) item selection method performed better than the other
two methods.
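The three evaluation criteria used in the study (ABSBIAS, RMSE, and the correlation between true and estimated ability scores) can be sketched as below. The function name is hypothetical, and the exact definition of ABSBIAS here (the absolute value of the mean signed error) is one common convention, assumed rather than taken from the article.

```python
import numpy as np

def score_recovery(theta_true, theta_est):
    """Compare true and estimated ability scores.

    Returns:
        absbias: |mean(est - true)|  (one common definition of absolute bias)
        rmse:    sqrt(mean((est - true)^2))
        corr:    Pearson correlation between true and estimated scores
    """
    theta_true = np.asarray(theta_true, dtype=float)
    theta_est = np.asarray(theta_est, dtype=float)
    err = theta_est - theta_true
    absbias = float(np.abs(np.mean(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    corr = float(np.corrcoef(theta_true, theta_est)[0, 1])
    return absbias, rmse, corr
```

For example, `score_recovery([0.0, 1.0, -1.0, 0.5], [0.1, 0.9, -1.2, 0.6])` gives an absolute bias of 0.025 and an RMSE of about 0.132; RMSE penalizes large individual errors even when positive and negative errors cancel in the bias term.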

There are 53 citations in total.

Details

Journal Section Articles
Authors

Eren Halil Özberk

Selahattin Gelbal

Publication Date April 3, 2017
Acceptance Date March 6, 2017
Published in Issue Year 2017 Volume: 8 Issue: 1

Cite

APA Özberk, E. H., & Gelbal, S. (2017). A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs. Journal of Measurement and Evaluation in Education and Psychology, 8(1), 34-46. https://doi.org/10.21031/epod.286956