Year 2017, Volume 8, Issue 1, Pages 34-46. Published: 2017-04-03

Basit ve Karmaşık Test Desenlerinde Çok Boyutlu Madde Seçme Yöntemlerinin Karşılaştırılması
A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs

Eren Halil ÖZBERK [1], Selahattin GELBAL [2]


In this study, unlike previous studies, overall ability scores were compared across different test conditions (simple and complex) designed to reflect real testing conditions. Four factors were manipulated: test design, number of items per dimension, correlation between dimensions, and item selection method. Data sets were generated from the generated item and ability parameters under the M3PL compensatory multidimensional item response theory model, adhering to the specified correlations. The overall ability scores obtained from the multidimensional computerized adaptive test administrations were compared using absolute bias (ABSBIAS), correlation, and root mean square error (RMSE). The results show that the multidimensional test design, the number of items per dimension, and the correlation between dimensions affect how well the item selection methods estimate overall scores. For a simple-structure test, the Minimum Error Variance item selection method produced the lowest absolute bias for both long and short tests. As the model became more complex, the Kullback-Leibler item selection method outperformed the other two methods.
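The three evaluation criteria named above (ABSBIAS, RMSE, and the correlation between true and estimated scores) can be sketched directly. This is a minimal illustration of the standard formulas, assuming `true` and `est` are arrays of true and estimated composite ability scores; the function and variable names are illustrative, not taken from the study.

```python
import numpy as np

def absbias(true, est):
    # Mean absolute deviation of the estimated scores from the true scores
    return np.mean(np.abs(est - true))

def rmse(true, est):
    # Root mean square error between estimated and true scores
    return np.sqrt(np.mean((est - true) ** 2))

def score_correlation(true, est):
    # Pearson correlation between true and estimated scores
    return np.corrcoef(true, est)[0, 1]
```

Lower ABSBIAS and RMSE, and a correlation closer to 1, indicate more accurate overall score recovery.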


Multidimensional computerized adaptive testing (MCAT) can measure multiple dimensions efficiently by using multidimensional IRT (MIRT) applications. Several studies have examined MCAT item selection methods with the aim of improving the accuracy of overall ability score estimates. A review of the literature shows that most studies compared item selection methods under many conditions but did not vary the structure of the test design. In contrast with previous studies, this study employed various test designs (simple and complex), which allows the overall ability score estimates to be evaluated across multiple realistic test conditions. Four factors were manipulated: test design, number of items per dimension, correlation between dimensions, and item selection method. Using the generated item and ability parameters, dichotomous item responses were generated under the M3PL compensatory multidimensional IRT model with the specified correlations. The accuracy of the MCAT composite ability scores was evaluated using the absolute bias (ABSBIAS), the correlation, and the root mean square error (RMSE) between true and estimated ability scores. The results suggest that the multidimensional test structure, the number of items per dimension, and the correlation between dimensions had significant effects on the item selection methods for overall score estimation. For the simple-structure test design, the V1 (minimum error variance) item selection method yielded the lowest absolute bias for both long and short tests when estimating overall scores. As the model became more complex, the Kullback-Leibler (KL) item selection method outperformed the other two methods.
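The data-generation step described above can be sketched as follows: abilities are drawn from a multivariate normal distribution with a specified between-dimension correlation, and dichotomous responses are generated under a compensatory M3PL model, where an item's loading pattern distinguishes simple from complex structure. This is a minimal sketch; the sample size, parameter ranges, two-dimensional default, and function names are my own illustrative assumptions, not the settings used in the study.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_m3pl(n_persons=1000, n_items=40, n_dims=2, rho=0.5,
                  simple_structure=True):
    """Generate dichotomous responses under a compensatory M3PL MIRT model:
    P(X=1 | theta) = c + (1 - c) / (1 + exp(-(a . theta + d)))."""
    # Correlated abilities: theta ~ MVN(0, Sigma), off-diagonals equal to rho
    sigma = np.full((n_dims, n_dims), rho)
    np.fill_diagonal(sigma, 1.0)
    theta = rng.multivariate_normal(np.zeros(n_dims), sigma, size=n_persons)

    # Discriminations: simple structure keeps one nonzero loading per item;
    # complex structure lets each item load on every dimension
    a = rng.uniform(0.5, 2.0, size=(n_items, n_dims))
    if simple_structure:
        mask = np.zeros((n_items, n_dims))
        mask[np.arange(n_items), np.arange(n_items) % n_dims] = 1.0
        a *= mask
    d = rng.normal(0.0, 1.0, size=n_items)      # intercepts
    c = rng.uniform(0.05, 0.25, size=n_items)   # pseudo-guessing lower asymptotes

    # Compensatory model: dimensions combine additively in the logit,
    # so a high ability on one dimension can offset a low one on another
    logit = theta @ a.T + d
    p = c + (1.0 - c) / (1.0 + np.exp(-logit))
    x = (rng.random((n_persons, n_items)) < p).astype(int)
    return theta, x
```

Toggling `simple_structure` switches between the two test designs compared in the study, while `rho` controls the between-dimension correlation condition.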


Subjects: Social
Journal Section: Articles
Authors

Author: Eren Halil ÖZBERK
Institution: HACETTEPE UNIV
Country: Turkey


Author: Selahattin GELBAL
Institution: HACETTEPE UNIV

Dates

Publication Date: April 3, 2017

BibTeX
@article{epod286956,
  title     = {A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs},
  author    = {ÖZBERK, Eren Halil and GELBAL, Selahattin},
  journal   = {Journal of Measurement and Evaluation in Education and Psychology},
  publisher = {Eğitimde ve Psikolojide Ölçme ve Değerlendirme Derneği},
  year      = {2017},
  volume    = {8},
  number    = {1},
  pages     = {34--46},
  issn      = {1309-6575},
  doi       = {10.21031/epod.286956}
}