Year 2020, Volume 7, Issue 3, Pages 323-342, 2020-09-15

Development of a Multidimensional Computerized Adaptive Test based on the Bifactor Model

Murat Doğan ŞAHİN, Selahattin GELBAL


The purpose of this study was to conduct a real-time multidimensional computerized adaptive test (MCAT) using data from a previous paper-and-pencil test (PPT) covering the grammar and vocabulary dimensions of an end-of-term proficiency exam administered to students in a university preparatory class. An item pool was established from four separate 50-item sets administered in four different semesters. The fit of unidimensional, multi-unidimensional and bifactor IRT models was compared during item calibration, with the bifactor model providing the best fit for all data sets. A hybrid simulation was then run for 36 conditions, obtained by crossing six item selection methods, two ability estimation methods and three termination rules. The resulting statistics and graphs indicated that D-rule item selection, maximum a posteriori (MAP) ability estimation and the standard error termination rule formed the best algorithm for the real-time MCAT application. With the minimum number of items to be administered set at 10, the real-time application conducted with 99 examinees yielded an average test length of 13.4 items. The PPT-format proficiency exam consists of 50 items, leading to the conclusion that examinees in the real-time MCAT were administered an average of 74.4% fewer items than in the PPT. Additionally, 86 of the examinees answered between 10 and 13 items. The item pool use rate was 30%. Lastly, the correlation between the PPT scores and the general trait scores of 32 examinees was calculated as .77.
Keywords: Multidimensional Computerized Adaptive Testing, Bifactor Model, Real-time application, Hybrid Simulation
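As a rough illustration of the simulation design described in the abstract, the sketch below crosses six item selection methods, two ability estimation methods and three termination rules to obtain the 36 conditions, and expresses a standard error termination rule with a minimum test length of 10 items. Only "D-rule", "MAP", the standard error rule and the 10-item minimum come from the abstract; every other label, the `should_stop` helper and the 0.30 standard error threshold are placeholders assumed here for illustration, not the study's actual settings.

```python
from itertools import product

# Only "D-rule", "MAP" and "standard_error" are named in the abstract.
# All other labels are placeholders, not the methods used in the study.
ITEM_SELECTION = ["D-rule", "selection_2", "selection_3",
                  "selection_4", "selection_5", "selection_6"]
ABILITY_ESTIMATION = ["MAP", "estimator_2"]
TERMINATION = ["standard_error", "termination_2", "termination_3"]

# Fully crossed design: 6 x 2 x 3 conditions.
conditions = list(product(ITEM_SELECTION, ABILITY_ESTIMATION, TERMINATION))

MIN_ITEMS = 10        # minimum test length stated in the abstract
SE_THRESHOLD = 0.30   # hypothetical cutoff; the abstract gives no value


def should_stop(n_administered, standard_error,
                min_items=MIN_ITEMS, se_threshold=SE_THRESHOLD):
    """Stop once the minimum length is reached and the standard error
    of the ability estimate is at or below the threshold."""
    return n_administered >= min_items and standard_error <= se_threshold


print(len(conditions))  # 36
```

Under these assumptions, an examinee whose estimate is already precise after 8 items would still continue until the tenth item, which is consistent with the reported lower bound of 10 administered items.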
Primary Language en
Subjects Education, Scientific Disciplines
Published Date September
Journal Section Articles
Authors

Orcid: 0000-0002-2174-8443
Author: Murat Doğan ŞAHİN (Primary Author)
Institution: ANADOLU ÜNİVERSİTESİ, EĞİTİM FAKÜLTESİ
Country: Turkey


Orcid: 0000-0001-5181-7262
Author: Selahattin GELBAL
Institution: HACETTEPE ÜNİVERSİTESİ
Country: Turkey


Dates

Publication Date: September 15, 2020

APA: Şahin, M. D., & Gelbal, S. (2020). Development of a Multidimensional Computerized Adaptive Test based on the Bifactor Model. International Journal of Assessment Tools in Education, 7(3), 323-342. DOI: 10.21449/ijate.707199