Research Article

Comparison of item response theory ability and item parameters according to classical and Bayesian estimation methods

Year 2024, Volume: 11 Issue: 2, 213 - 248, 20.06.2024
https://doi.org/10.21449/ijate.1290831

Abstract

This research aims to compare Item Response Theory ability and item parameter estimates obtained with maximum likelihood and Bayesian approaches under different Monte Carlo simulation conditions. For this purpose, ability and item parameters were estimated with the maximum likelihood and Bayesian methods, and the differences in the RMSE of these estimates were examined as the prior distribution type, sample size, test length, and logistic model varied. The simulation manipulated the prior distribution (normal, left-skewed, right-skewed, leptokurtic, and platykurtic), test length (10, 20, and 40 items), sample size (100, 500, and 1000), and logistic model (2PL and 3PL). Each simulation condition was replicated 100 times. Mixed-model ANOVA was used to determine differences in RMSE. In the 2PL model, prior distribution type, test length, and estimation method were significant sources of differences in the estimated ability parameters and their RMSE; in the 3PL model, prior distribution type and test length were significant. In the 2PL model, prior distribution type, sample size, and estimation method produced significant differences in the RMSE of the item discrimination parameter, while none of the conditions produced a significant difference in the RMSE of the item difficulty parameter. In the 3PL model, prior distribution type, sample size, and estimation method were significant for the RMSE of the item discrimination parameter, and prior distribution type and estimation method produced significant differences in the RMSE of the lower asymptote parameter. However, none of the conditions significantly changed the RMSE of the item difficulty parameter.
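To make the ML-versus-Bayesian comparison and the RMSE criterion concrete, here is a minimal Python sketch of a single replication for ability estimation only. It is not the authors' code (the references list several R packages), and everything in it is an assumption chosen for illustration: item parameters are treated as known, ability is estimated by grid-search maximum likelihood and by EAP (posterior mean) on a fixed grid over [-4, 4], the discrimination a is drawn from U(0.5, 2), difficulty b from N(0, 1), the lower asymptote c is fixed at 0.2, and no 1.7 scaling constant is used. The names `p_3pl`, `ml_estimate`, `eap_estimate`, `rmse`, and `simulate` are hypothetical.

```python
import math
import random

# Search/quadrature grid for theta on [-4, 4] in steps of 0.05 (an assumption).
GRID = [-4.0 + k / 20.0 for k in range(161)]


def p_3pl(theta, a, b, c=0.0):
    """3PL item response function (no 1.7 constant); c = 0 gives the 2PL model."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response vector given known (a, b, c) item parameters."""
    ll = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_3pl(theta, a, b, c)
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll


def ml_estimate(responses, items):
    """Grid-search ML estimate; all-correct or all-wrong patterns land on the grid bounds."""
    return max(GRID, key=lambda t: log_likelihood(t, responses, items))


def standard_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def eap_estimate(responses, items, prior_pdf=standard_normal_pdf):
    """EAP (posterior mean) estimate using the grid points as quadrature nodes."""
    log_post = [math.log(prior_pdf(t)) + log_likelihood(t, responses, items) for t in GRID]
    peak = max(log_post)  # subtract the maximum before exp() to avoid underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def rmse(estimates, truths):
    """Root mean squared error of estimates against the generating values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths))


def simulate(n_persons=200, n_items=20, guessing=0.2, seed=1):
    """One replication: generate 3PL responses, then return (ML RMSE, EAP RMSE) for theta."""
    rng = random.Random(seed)
    items = [(rng.uniform(0.5, 2.0), rng.gauss(0.0, 1.0), guessing) for _ in range(n_items)]
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    ml, eap = [], []
    for theta in thetas:
        responses = [1 if rng.random() < p_3pl(theta, a, b, c) else 0 for a, b, c in items]
        ml.append(ml_estimate(responses, items))
        eap.append(eap_estimate(responses, items))
    return rmse(ml, thetas), rmse(eap, thetas)


if __name__ == "__main__":
    ml_rmse, eap_rmse = simulate()
    print(f"RMSE(theta): ML = {ml_rmse:.3f}, EAP = {eap_rmse:.3f}")
```

Passing a different density as `prior_pdf` (for example, a skewed one) loosely mimics the prior-type manipulation, and calling `simulate` over 100 seeds mirrors the replication step; a mixed-model ANOVA on the resulting RMSE values would then be a separate analysis.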

Ethical Statement

Ankara University, 04.11.2019, 13-339.

References

  • Akour, M., & Al-Omari, H. (2013). Empirical investigation of the stability of IRT item-parameters estimation. International Online Journal of Educational Sciences, 5(2), 291-301.
  • Atar, B. (2007). Differential item functioning analyses for mixed response data using IRT likelihood-ratio test, logistic regression, and GLLAMM procedures [Unpublished doctoral dissertation, Florida State University]. http://purl.flvc.org/fsu/fd/FSU_migr_etd-0248
  • Baker, F.B. (2001). The basics of item response theory (2nd ed.). College Park, MD: ERIC Clearinghouse on Assessment and Evaluation.
  • Baker, F.B., & Kim, S. (2004). Item response theory: Parameter estimation techniques (2nd ed.). Marcel Dekker.
  • Barış-Pekmezci, F. & Şengül-Avşar, A. (2021). A guide for more accurate and precise estimations in simulative unidimensional IRT models. International Journal of Assessment Tools in Education, 8(2), 423-453. https://doi.org/10.21449/ijate.790289
  • Bilir, M.K. (2009). Mixture item response theory-mimic model: Simultaneous estimation of differential item functioning for manifest groups and latent classes [Unpublished doctoral dissertation, Florida State University]. http://diginole.lib.fsu.edu/islandora/object/fsu:182011/datastream/PDF/view
  • Bock, R.D., & Mislevy, R.J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444. https://doi.org/10.1177/014662168200600405
  • Bulmer, M.G. (1979). Principles of statistics. Dover Publications.
  • Bulut, O., & Sünbül, Ö. (2017). R programlama dili ile madde tepki kuramında monte carlo simülasyon çalışmaları [Monte Carlo simulation studies in item response theory with the R programming language]. Journal of Measurement and Evaluation in Education and Psychology, 8(3), 266-287. https://doi.org/10.21031/epod.305821
  • Chalmers, R.P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06
  • Chuah, S.C., Drasgow, F., & Luecht, R. (2006). How big is big enough? Sample size requirements for CAST item parameter estimation. Applied Measurement in Education, 19(3), 241-255. https://doi.org/10.1207/s15324818ame1903_5
  • Clarke, E. (2022, December 22). ggbeeswarm: Categorical scatter (violin point) plots. https://cran.r-project.org/web/packages/ggbeeswarm/index.html
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates, Publishers.
  • Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Wadsworth Group.
  • Çelikten, S., & Çakan, M. (2019). Bayesian ve nonBayesian kestirim yöntemlerine dayalı olarak sınıflama indekslerinin TIMSS-2015 matematik testi üzerinde incelenmesi [Investigation of classification indices on the TIMSS-2015 mathematics subtest through Bayesian and non-Bayesian estimation methods]. Necatibey Faculty of Education Electronic Journal of Science and Mathematics Education, 13(1), 105-124. https://doi.org/10.17522/balikesirnef.566446
  • De Ayala, R.J. (2009). The theory and practice of item response theory. The Guilford Press.
  • DeMars, C. (2010). Item response theory: understanding statistics measurement. Oxford University Press.
  • Demir, E. (2019). R Diliyle İstatistik Uygulamaları [Statistics Applications with R Language]. Pegem Akademi.
  • Feinberg, R.A., & Rubright, J.D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35, 36-49. https://doi.org/10.1111/emip.12111
  • Finch, H., & Edwards, J.M. (2016). Rasch model parameter estimation in the presence of a non-normal latent trait using a nonparametric Bayesian approach. Educational and Psychological Measurement, 76(4), 662-684. https://doi.org/10.1177/0013164415608418
  • Fraenkel, J.R., & Wallen, E. (2009). How to design and evaluate research in education. McGraw-Hill Companies.
  • Gao, F., & Chen, L. (2005). Bayesian or non-Bayesian: A comparison study of item parameter estimation in the three-parameter logistic model. Applied Measurement in Education, 18(4), 351-380. https://psycnet.apa.org/doi/10.1207/s15324818ame1804_2
  • Goldman, S.H., & Raju, N.S. (1986). Recovery of one- and two-parameter logistic item parameters: An empirical study. Educational and Psychological Measurement, 46(1), 11-21. https://doi.org/10.1177/0013164486461002
  • Hambleton, R.K. (1989). Principles and selected applications of item response theory. In R.L. Linn (Ed.), Educational measurement (pp. 147-200). American Council on Education.
  • Hambleton, R.K., & Cook, L.L. (1983). Robustness of item response models and effects of test length and sample size on the precision of ability estimates. In D.J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 31-49). Vancouver.
  • Hambleton, R.K., & Jones, R.W. (1993). An NCME instructional module on comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47. https://doi.org/10.1111/j.1745-3992.1993.tb00543.x
  • Hambleton, R.K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Kluwer Academic Publishers.
  • Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Sage Publications Inc.
  • Harwell, M., & Janosky, J. (1991). An empirical study of the effects of small datasets and varying prior variances on item parameter estimation in BILOG. Applied Psychological Measurement, 15, 279-291. https://doi.org/10.1177/014662169101500308
  • Harwell, M., Stone, C.A., Hsu, T.C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101-125. https://doi.org/10.1177/014662169602000201
  • Hoaglin, D.C., & Andrews, D.F. (1975). The reporting of computation-based results in statistics. The American Statistician, 29, 122-126. https://doi.org/10.2307/2683438
  • Hulin, C.L., Lissak, R.I., & Drasgow, F. (1982). Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study. Applied Psychological Measurement, 6, 249-260. https://psycnet.apa.org/doi/10.1177/014662168200600301
  • Karadavut, T. (2019). The uniform prior for Bayesian estimation of ability in item response theory models. International Journal of Assessment Tools in Education, 6(4), 568-579. https://dx.doi.org/10.21449/ijate.581314
  • Kıbrıslıoğlu Uysal, N. (2020). Parametrik ve Parametrik Olmayan Madde Tepki Modellerinin Kestirim Hatalarının Karşılaştırılması [Comparison of estimation errors in parametric and nonparametric item response theory models] [Unpublished doctoral dissertation, Hacettepe University]. http://hdl.handle.net/11655/22495
  • Kirisci, L., Hsu, T.C., & Yu, L. (2001). Robustness of item parameter estimation programs to assumptions of unidimensionality and normality. Applied Psychological Measurement, 25(2), 146-162. https://doi.org/10.1177/01466210122031975
  • Kolen, M.J. (1985). Standard errors of Tucker equating. Applied Psychological Measurement, 9(2), 209-223. https://doi.org/10.1177/014662168500900209
  • Kothari, C.R. (2004). Research methodology: methods and techniques (2nd ed.). New Age International Publishers.
  • Köse, İ.A. (2010). Madde Tepki Kuramına Dayalı Tek Boyutlu ve Çok Boyutlu Modellerin Test Uzunluğu ve Örneklem Büyüklüğü Açısından Karşılaştırılması [Comparison of Unidimensional and Multidimensional Models Based On Item Response Theory In Terms of Test Length and Sample Size] [Unpublished doctoral dissertation]. Ankara University, Institute of Educational Sciences.
  • Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4(863), 1-12. https://doi.org/10.3389/fpsyg.2013.00863
  • Lenth, R.V. (2022, December). emmeans: Estimated marginal means, aka Least-Squares Means. https://cran.r-project.org/web/packages/emmeans/index.html
  • Lim, H., & Wells, C.S. (2020). irtplay: An R package for online item calibration, scoring, evaluation of model fit, and useful functions for unidimensional IRT. Applied Psychological Measurement, 44(7-8), 563-565. https://doi.org/10.1177/0146621620921247
  • Linacre, J.M. (2008). A user’s guide to Winsteps Ministep: Rasch-model computer programs. https://www.winsteps.com/winman/copyright.htm
  • Lord, F.M. (1968). An analysis of the verbal scholastic aptitude test using Birnbaum’s three-parameter logistic model. Educational and Psychological Measurement, 28, 989-1020. https://doi.org/10.1177/001316446802800401
  • Lord, F.M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates.
  • Lord, F.M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel forms reliability. Psychometrika, 48, 233-245. https://doi.org/10.1007/BF02294018
  • Martin, A.D., & Quinn, K.M. (2006). Applied Bayesian inference in R using MCMCpack. The Newsletter of the R Project, 6(1), 2-7.
  • Martinez, J. (2017, December 1). bairt: Bayesian analysis of item response theory models. http://cran.nexr.com/web/packages/bairt/index.html
  • Maydeu-Olivares, A., & Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713-732. https://doi.org/10.1007/s11336-005-1295-9
  • Meyer, D. (2022, December 1). e1071: Misc functions of the department of statistics, Probability Theory Group (Formerly: E1071), TU Wien. https://CRAN.R-project.org/package=e1071
  • Mislevy, R.J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177-195. https://doi.org/10.1007/BF02293979
  • MoNE (2022). Sınavla Öğrenci Alacak Ortaöğretim Kurumlarına İlişkin Merkezî Sınav Başvuru ve Uygulama Kılavuzu [Central Examination Application and Administration Guide for Secondary Education Schools to Admit Students by Examination]. Ankara: MoNE [MEB]. https://www.meb.gov.tr/2022-lgs-kapsamindaki-merkez-sinav-kilavuzu-yayimlandi/haber/25705/tr
  • Morris, T.P., White, I.R., & Crowther, M.J. (2017). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102. https://doi.org/10.1002/sim.8086
  • Orlando, M. (2004, June). Critical issues to address when applying item response theory models. Paper presented at the conference on improving health outcomes assessment, National Cancer Institute, Bethesda, MD, USA.
  • Pekmezci Barış, F. (2018). İki Faktör Modelde (Bifactor) Diklik Varsayımının Farklı Koşullar Altında Sınanması [Investigation Of Orthogonality Assumption In Bifactor Model Under Different Conditions] [Unpublished doctoral dissertation]. Ankara University, Institute of Educational Sciences, Ankara.
  • Ree, M.J., & Jensen, H.E. (1980). Effects of sample size on linear equating of item characteristic curve parameters. In D.J. Weiss (Ed.), Proceedings of the 1979 computerized adaptive testing conference (pp. 218-228). Minneapolis: University of Minnesota. https://doi.org/10.1016/B978-0-12-742780-5.50017-2
  • Reise, S.P., & Yu, J. (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27, 133-144. https://doi.org/10.1111/j.1745-3984.1990.tb00738.x
  • Revelle, W. (2022, October). psych: Procedures for psychological, psychometric, and personality research. https://cran.r-project.org/web/packages/psych/index.html
  • Robitzsch, A. (2022). sirt: Supplementary item response theory models. https://cran.r-project.org/web/packages/sirt/index.html
  • Samejima, F. (1993a). An approximation for the bias function of the maximum likelihood estimate of a latent variable for the general case where the item responses are discrete. Psychometrika, 58, 119-138. https://doi.org/10.1007/BF02294476
  • Samejima, F. (1993b). The bias function of the maximum likelihood estimate of ability for the dichotomous response level. Psychometrika, 58, 195-209. https://doi.org/10.1007/BF02294573
  • Sarkar, D. (2022, October). lattice: Trellis graphics for R (R package version 0.20-45). http://CRAN.R-project.org/package=lattice
  • SAS Institute (2020). Introduction to Bayesian analysis procedures. In User’s Guide Introduction to Bayesian Analysis Procedures (pp. 127-161). SAS Institute Inc., Cary, NC, USA.
  • Sass, D., Schmitt, T., & Walker, C. (2008). Estimating non-normal latent trait distributions within item response theory using true and estimated item parameters. Applied Measurement in Education, 21(1), 65-88. https://doi.org/10.1080/08957340701796415
  • Seong, T.J. (1990). Sensitivity of marginal maximum likelihood estimation of item and ability parameters to the characteristics of the prior ability distributions. Applied Psychological Measurement, 14(3), 299-311. https://psycnet.apa.org/doi/10.1177/014662169001400307
  • Singmann, H. (2022, December). afex: Analysis of factorial experiments. https://cran.r-project.org/web/packages/afex/afex.pdf
  • Soysal, S. (2017). Toplam Test Puanı ve Alt Test Puanlarının Kestiriminin Hiyerarşik Madde Tepki Kuramı Modelleri ile Karşılaştırılması [Comparison of Estimation of Total Score and Subscores with Hierarchical Item Response Theory Models] [Unpublished doctoral dissertation]. Hacettepe University, Institute of Educational Sciences, Ankara.
  • Stone, C.A. (1992). Recovery of marginal maximum likelihood estimates in the two parameter logistic response model: An evaluation of MULTILOG. Applied Psychological Measurement, 16(1), 1-16. https://doi.org/10.1177/014662169201600101
  • Swaminathan, H., & Gifford, J.A. (1986). Bayesian estimation in the three-parameter logistic model. Psychometrika, 51, 589-601. https://doi.org/10.1007/BF02295598
  • Şahin, A., & Anıl, D. (2017). The effects of test length and sample size on item parameters in item response theory. Educational Sciences: Theory & Practice, 17, 321-335. http://dx.doi.org/10.12738/estp.2017.1.0270
  • Tabachnick, B.G., & Fidell, L.S. (2014). Using multivariate statistics (6th ed.). Pearson New International Edition.
  • Thissen, D., & Wainer, H. (1983). Some standard errors in item response theory. Psychometrika, 47, 397-412. https://doi.org/10.1007/BF02293705
  • Thorndike, R.L. (1982). Applied psychometrics. Houghton Mifflin Co.
  • Van de Schoot, R., & Depaoli, S. (2014). Bayesian analyses: Where to start and what to report. The European Health Psychologist, 16(2), 75-84.
  • Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag. https://doi.org/10.1007/978-0-387-98141-3
  • Wright, B.D., & Stone, M.H. (1979). Best test design. Mesa Press.
  • Yen, W.M. (1984). Effects of local item dependence on the fit and equating performance of the three parameter logistic model. Applied Psychological Measurement, 8, 125-145. https://doi.org/10.1177/014662168400800201

There are 76 citations in total.

Details

Primary Language: English
Subjects: Studies on Education
Journal Section: Articles
Authors

Eray Selçuk 0000-0003-4033-4219

Ergül Demir 0000-0002-3708-8013

Early Pub Date: May 22, 2024
Publication Date: June 20, 2024
Submission Date: May 1, 2023
Published in Issue: Year 2024, Volume 11, Issue 2

Cite

APA Selçuk, E., & Demir, E. (2024). Comparison of item response theory ability and item parameters according to classical and Bayesian estimation methods. International Journal of Assessment Tools in Education, 11(2), 213-248. https://doi.org/10.21449/ijate.1290831
