TY - JOUR
T1 - Comparison of item response theory ability and item parameters according to classical and Bayesian estimation methods
TT - Comparison of item response theory ability and item parameters according to classical and Bayesian estimation methods
AU - Selçuk, Eray
AU - Demir, Ergül
PY - 2024
DA - June
DO - 10.21449/ijate.1290831
JF - International Journal of Assessment Tools in Education
JO - Int. J. Assess. Tools Educ.
PB - İzzet KARA
WT - DergiPark
SN - 2148-7456
SP - 213
EP - 248
VL - 11
IS - 2
LA - en
AB - This research aims to compare the ability and item parameter estimates of item response theory obtained with maximum likelihood and Bayesian approaches under different Monte Carlo simulation conditions. For this purpose, the ability and item parameters estimated with the maximum likelihood and Bayesian methods, and the differences in the RMSE of these parameters, were examined as a function of prior distribution type, sample size, test length, and logistic model. The simulation conditions were prior distribution type (normal, left-skewed, right-skewed, leptokurtic, and platykurtic), test length (10, 20, 40), sample size (100, 500, 1000), and logistic model (2PL, 3PL), with 100 replications per condition. Mixed-model ANOVA was performed to test for differences in RMSE. Prior distribution type, test length, and estimation method produced significant differences in the RMSE of the ability parameter estimated in the 2PL model; prior distribution type and test length were significant for the RMSE of the ability parameter estimated in the 3PL model. While prior distribution type, sample size, and estimation method produced a significant difference in the RMSE of the item discrimination parameter estimated in the 2PL model, none of the conditions produced a significant difference in the RMSE of the item difficulty parameter. In the 3PL model, prior distribution type, sample size, and estimation method produced significant differences in the RMSE of the item discrimination parameter, and prior distribution type and estimation method produced significant differences in the RMSE of the lower asymptote parameter. However, none of the conditions significantly changed the RMSE of the item difficulty parameter.
KW - IRT parameter estimation
KW - Maximum likelihood estimation
KW - Bayesian estimation method
KW - MCMC
KW - RMSE
N2 - This research aims to compare the ability and item parameter estimates of item response theory obtained with maximum likelihood and Bayesian approaches under different Monte Carlo simulation conditions. For this purpose, the ability and item parameters estimated with the maximum likelihood and Bayesian methods, and the differences in the RMSE of these parameters, were examined as a function of prior distribution type, sample size, test length, and logistic model. The simulation conditions were prior distribution type (normal, left-skewed, right-skewed, leptokurtic, and platykurtic), test length (10, 20, 40), sample size (100, 500, 1000), and logistic model (2PL, 3PL), with 100 replications per condition. Mixed-model ANOVA was performed to test for differences in RMSE. Prior distribution type, test length, and estimation method produced significant differences in the RMSE of the ability parameter estimated in the 2PL model; prior distribution type and test length were significant for the RMSE of the ability parameter estimated in the 3PL model. While prior distribution type, sample size, and estimation method produced a significant difference in the RMSE of the item discrimination parameter estimated in the 2PL model, none of the conditions produced a significant difference in the RMSE of the item difficulty parameter.
In the 3PL model, prior distribution type, sample size, and estimation method produced significant differences in the RMSE of the item discrimination parameter, and prior distribution type and estimation method produced significant differences in the RMSE of the lower asymptote parameter. However, none of the conditions significantly changed the RMSE of the item difficulty parameter.
CR - Akour, M., & Al-Omari, H. (2013). Empirical investigation of the stability of IRT item-parameters estimation. International Online Journal of Educational Sciences, 5(2), 291-301.
CR - Atar, B. (2007). Differential item functioning analyses for mixed response data using IRT likelihood-ratio test, logistic regression, and GLLAMM procedures [Unpublished doctoral dissertation, Florida State University]. http://purl.flvc.org/fsu/fd/FSU_migr_etd-0248
CR - Baker, F.B. (2001). The basics of item response theory (2nd ed.). College Park, MD: ERIC Clearinghouse on Assessment and Evaluation.
CR - Baker, F.B., & Kim, S. (2004). Item response theory: Parameter estimation techniques (2nd ed.). Marcel Dekker.
CR - Barış-Pekmezci, F., & Şengül-Avşar, A. (2021). A guide for more accurate and precise estimations in simulative unidimensional IRT models. International Journal of Assessment Tools in Education, 8(2), 423-453. https://doi.org/10.21449/ijate.790289
CR - Bilir, M.K. (2009). Mixture item response theory-MIMIC model: Simultaneous estimation of differential item functioning for manifest groups and latent classes [Unpublished doctoral dissertation, Florida State University]. http://diginole.lib.fsu.edu/islandora/object/fsu:182011/datastream/PDF/view
CR - Bock, R.D., & Mislevy, R.J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444. https://doi.org/10.1177/014662168200600405
CR - Bulmer, M.G. (1979). Principles of statistics. Dover Publications.
CR - Bulut, O., & Sünbül, Ö. (2017). R programlama dili ile madde tepki kuramında Monte Carlo simülasyon çalışmaları [Monte Carlo simulation studies in item response theory with the R programming language]. Journal of Measurement and Evaluation in Education and Psychology, 8(3), 266-287. https://doi.org/10.21031/epod.305821
CR - Chalmers, R.P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06
CR - Chuah, S.C., Drasgow, F., & Luecht, R. (2006). How big is big enough? Sample size requirements for CAST item parameter estimation. Applied Measurement in Education, 19(3), 241-255. https://doi.org/10.1207/s15324818ame1903_5
CR - Clarke, E. (2022, December 22). ggbeeswarm: Categorical scatter (violin point) plots. https://cran.r-project.org/web/packages/ggbeeswarm/index.html
CR - Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
CR - Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Wadsworth Group.
CR - Çelikten, S., & Çakan, M. (2019). Bayesian ve nonBayesian kestirim yöntemlerine dayalı olarak sınıflama indekslerinin TIMSS-2015 matematik testi üzerinde incelenmesi [Investigation of classification indices on TIMSS-2015 mathematics subtest through Bayesian and non-Bayesian estimation methods]. Necatibey Faculty of Education Electronic Journal of Science and Mathematics Education, 13(1), 105-124. https://doi.org/10.17522/balikesirnef.566446
CR - De Ayala, R.J. (2009). The theory and practice of item response theory. The Guilford Press.
CR - DeMars, C. (2010). Item response theory: Understanding statistics measurement. Oxford University Press.
CR - Demir, E. (2019). R Diliyle İstatistik Uygulamaları [Statistics applications with the R language]. Pegem Akademi.
CR - Feinberg, R.A., & Rubright, J.D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35, 36-49. https://doi.org/10.1111/emip.12111
CR - Finch, H., & Edwards, J.M. (2016). Rasch model parameter estimation in the presence of a non-normal latent trait using a nonparametric Bayesian approach. Educational and Psychological Measurement, 76(4), 662-684. https://doi.org/10.1177/0013164415608418
CR - Fraenkel, J.R., & Wallen, E. (2009). How to design and evaluate research in education. McGraw-Hill Companies.
CR - Gao, F., & Chen, L. (2005). Bayesian or non-Bayesian: A comparison study of item parameter estimation in the three-parameter logistic model. Applied Measurement in Education, 18(4), 351-380. https://psycnet.apa.org/doi/10.1207/s15324818ame1804_2
CR - Goldman, S.H., & Raju, N.S. (1986). Recovery of one- and two-parameter logistic item parameters: An empirical study. Educational and Psychological Measurement, 46(1), 11-21. https://doi.org/10.1177/0013164486461002
CR - Hambleton, R.K. (1989). Principles and selected applications of item response theory. In R.L. Linn (Ed.), Educational measurement (pp. 147-200). American Council on Education.
CR - Hambleton, R.K., & Cook, L.L. (1983). Robustness of item response models and effects of test length and sample size on the precision of ability estimates. In D.J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 31-49). Vancouver.
CR - Hambleton, R.K., & Jones, R.W. (1993). An NCME instructional module on comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47. https://doi.org/10.1111/j.1745-3992.1993.tb00543.x
CR - Hambleton, R.K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Kluwer Academic Publishers.
CR - Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Sage Publications.
CR - Harwell, M., & Janosky, J. (1991). An empirical study of the effects of small datasets and varying prior variances on item parameter estimation in BILOG. Applied Psychological Measurement, 15, 279-291. https://doi.org/10.1177/014662169101500308
CR - Harwell, M., Stone, C.A., Hsu, T.C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101-125. https://doi.org/10.1177/014662169602000201
CR - Hoaglin, D.C., & Andrews, D.F. (1975). The reporting of computation-based results in statistics. The American Statistician, 29, 122-126. https://doi.org/10.2307/2683438
CR - Hulin, C.L., Lissak, R.I., & Drasgow, F. (1982). Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study. Applied Psychological Measurement, 6, 249-260. https://psycnet.apa.org/doi/10.1177/014662168200600301
CR - Karadavut, T. (2019). The uniform prior for Bayesian estimation of ability in item response theory models. International Journal of Assessment Tools in Education, 6(4), 568-579. https://dx.doi.org/10.21449/ijate.581314
CR - Kıbrıslıoğlu Uysal, N. (2020). Parametrik ve Parametrik Olmayan Madde Tepki Modellerinin Kestirim Hatalarının Karşılaştırılması [Comparison of estimation errors in parametric and nonparametric item response theory models] [Unpublished doctoral dissertation, Hacettepe University]. http://hdl.handle.net/11655/22495
CR - Kirisci, L., Hsu, T.C., & Yu, L. (2001). Robustness of item parameter estimation programs to assumptions of unidimensionality and normality. Applied Psychological Measurement, 25(2), 146-162. https://doi.org/10.1177/01466210122031975
CR - Kolen, M.J. (1985). Standard errors of Tucker equating. Applied Psychological Measurement, 9(2), 209-223. https://doi.org/10.1177/014662168500900209
CR - Kothari, C.R. (2004). Research methodology: Methods and techniques (2nd ed.). New Age International Publishers.
CR - Köse, İ.A. (2010).
Madde Tepki Kuramına Dayalı Tek Boyutlu ve Çok Boyutlu Modellerin Test Uzunluğu ve Örneklem Büyüklüğü Açısından Karşılaştırılması [Comparison of unidimensional and multidimensional models based on item response theory in terms of test length and sample size] [Unpublished doctoral dissertation]. Ankara University, Institute of Educational Sciences.
CR - Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4(863), 1-12. https://doi.org/10.3389/fpsyg.2013.00863
CR - Lenth, R.V. (2022, December). emmeans: Estimated marginal means, aka least-squares means. https://cran.r-project.org/web/packages/emmeans/index.html
CR - Lim, H., & Wells, C.S. (2020). irtplay: An R package for online item calibration, scoring, evaluation of model fit, and useful functions for unidimensional IRT. Applied Psychological Measurement, 44(7-8), 563-565. https://doi.org/10.1177/0146621620921247
CR - Linacre, J.M. (2008). A user’s guide to Winsteps Ministep: Rasch-model computer programs. https://www.winsteps.com/winman/copyright.htm
CR - Lord, F.M. (1968). An analysis of the verbal Scholastic Aptitude Test using Birnbaum’s three-parameter logistic model. Educational and Psychological Measurement, 28, 989-1020. https://doi.org/10.1177/001316446802800401
CR - Lord, F.M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates.
CR - Lord, F.M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel forms reliability. Psychometrika, 48, 233-245. https://doi.org/10.1007/BF02294018
CR - Martin, A.D., & Quinn, K.M. (2006). Applied Bayesian inference in R using MCMCpack. The Newsletter of the R Project, 6(1), 2-7.
CR - Martinez, J. (2017, December 1). bairt: Bayesian analysis of item response theory models. http://cran.nexr.com/web/packages/bairt/index.html
CR - Maydeu-Olivares, A., & Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713-732. https://doi.org/10.1007/s11336-005-1295-9
CR - Meyer, D. (2022, December 1). e1071: Misc functions of the Department of Statistics, Probability Theory Group (formerly: E1071), TU Wien. https://CRAN.R-project.org/package=e1071
CR - Mislevy, R.J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177-195. https://doi.org/10.1007/BF02293979
CR - MoNE (2022). Sınavla Öğrenci Alacak Ortaöğretim Kurumlarına İlişkin Merkezî Sınav Başvuru ve Uygulama Kılavuzu [Central examination application and administration guide for secondary education schools admitting students by examination]. Ankara: MoNE [MEB]. https://www.meb.gov.tr/2022-lgs-kapsamindaki-merkez-sinav-kilavuzu-yayimlandi/haber/25705/tr
CR - Morris, T.P., White, I.R., & Crowther, M.J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102. https://doi.org/10.1002/sim.8086
CR - Orlando, M. (2004, June). Critical issues to address when applying item response theory models. Paper presented at the Conference on Improving Health Outcomes Assessment, National Cancer Institute, Bethesda, MD, USA.
CR - Pekmezci Barış, F. (2018). İki Faktör Modelde (Bifactor) Diklik Varsayımının Farklı Koşullar Altında Sınanması [Investigation of the orthogonality assumption in the bifactor model under different conditions] [Unpublished doctoral dissertation]. Ankara University, Institute of Educational Sciences, Ankara.
CR - Ree, M.J., & Jensen, H.E. (1980). Effects of sample size on linear equating of item characteristic curve parameters. In D.J. Weiss (Ed.), Proceedings of the 1979 computerized adaptive testing conference (pp. 218-228). Minneapolis: University of Minnesota. https://doi.org/10.1016/B978-0-12-742780-5.50017-2
CR - Reise, S.P., & Yu, J. (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27, 133-144. https://doi.org/10.1111/j.1745-3984.1990.tb00738.x
CR - Revelle, W. (2022, October). psych: Procedures for psychological, psychometric, and personality research. https://cran.r-project.org/web/packages/psych/index.html
CR - Robitzsch, A. (2022). sirt: Supplementary item response theory models. https://cran.r-project.org/web/packages/sirt/index.html
CR - Samejima, F. (1993a). An approximation for the bias function of the maximum likelihood estimate of a latent variable for the general case where the item responses are discrete. Psychometrika, 58, 119-138. https://doi.org/10.1007/BF02294476
CR - Samejima, F. (1993b). The bias function of the maximum likelihood estimate of ability for the dichotomous response level. Psychometrika, 58, 195-209. https://doi.org/10.1007/BF02294573
CR - Sarkar, D. (2022, October). lattice: Trellis graphics for R. R package version 0.20-45. http://CRAN.R-project.org/package=lattice
CR - SAS Institute (2020). Introduction to Bayesian analysis procedures. In User’s guide: Introduction to Bayesian analysis procedures (pp. 127-161). SAS Institute Inc., Cary, NC, USA.
CR - Sass, D., Schmitt, T., & Walker, C. (2008). Estimating non-normal latent trait distributions within item response theory using true and estimated item parameters. Applied Measurement in Education, 21(1), 65-88. https://doi.org/10.1080/08957340701796415
CR - Seong, T.J. (1990). Sensitivity of marginal maximum likelihood estimation of item and ability parameters to the characteristics of the prior ability distributions. Applied Psychological Measurement, 14(3), 299-311. https://psycnet.apa.org/doi/10.1177/014662169001400307
CR - Singmann, H. (2022, December). afex: Analysis of factorial experiments. https://cran.r-project.org/web/packages/afex/afex.pdf
CR - Soysal, S. (2017). Toplam Test Puanı ve Alt Test Puanlarının Kestiriminin Hiyerarşik Madde Tepki Kuramı Modelleri ile Karşılaştırılması [Comparison of estimation of total score and subscores with hierarchical item response theory models] [Unpublished doctoral dissertation]. Hacettepe University, Institute of Educational Sciences, Ankara.
CR - Stone, C.A. (1992). Recovery of marginal maximum likelihood estimates in the two-parameter logistic response model: An evaluation of MULTILOG. Applied Psychological Measurement, 16(1), 1-16. https://doi.org/10.1177/014662169201600101
CR - Swaminathan, H., & Gifford, J.A. (1986). Bayesian estimation in the three-parameter logistic model. Psychometrika, 51, 589-601. https://doi.org/10.1007/BF02295598
CR - Şahin, A., & Anıl, D. (2017). The effects of test length and sample size on item parameters in item response theory. Educational Sciences: Theory & Practice, 17, 321-335. http://dx.doi.org/10.12738/estp.2017.1.0270
CR - Tabachnick, B.G., & Fidell, L.S. (2014). Using multivariate statistics (6th ed.). Pearson New International Edition.
CR - Thissen, D., & Wainer, H. (1983). Some standard errors in item response theory. Psychometrika, 47, 397-412. https://doi.org/10.1007/BF02293705
CR - Thorndike, R.L. (1982). Applied psychometrics. Houghton Mifflin Co.
CR - Van de Schoot, R., & Depaoli, S. (2014). Bayesian analyses: Where to start and what to report. The European Health Psychologist, 16(2), 75-84.
CR - Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag. https://doi.org/10.1007/978-0-387-98141-3
CR - Wright, B.D., & Stone, M.H. (1979). Best test design. Mesa Press.
CR - Yen, W.M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125-145. https://doi.org/10.1177/014662168400800201
UR - https://doi.org/10.21449/ijate.1290831
L1 - https://dergipark.org.tr/en/download/article-file/3115535
ER -