The Uniform Prior for Bayesian Estimation of Ability in Item Response Theory Models

Tuğba Karadavut

doi:10.21449/ijate.581314

Research Article

Year 2019, Volume: 6 Issue: 4, 568 - 579, 05.01.2020

Cited By: 4

Abstract

Item Response Theory (IRT) models traditionally assume a normal distribution for ability. Although normality is often a reasonable assumption for ability, it is rarely met for observed scores in educational and psychological measurement. Assumptions about the ability distribution have previously been shown to affect IRT parameter estimation. This study compared normal and uniform priors for ability in IRT parameter estimation when the true ability distribution was either normal or uniform. A simulation study was conducted with two conditions: a short test with a small sample size and a long test with a large sample size. The results suggest using a uniform prior for ability to obtain more accurate ability estimates in the 2PL and 3PL models when the true distribution of ability is unknown. For the Rasch model, no clear pattern was found that could be used to obtain more accurate item parameter estimates.
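To make the comparison of priors concrete, the following is a minimal sketch of how the choice of prior can change a single ability estimate under the 2PL model. It is not the estimation procedure used in the study: it computes an expected a posteriori (EAP) estimate on a fixed grid, and the item parameters, the response pattern, the grid range of [-4, 4] for the uniform prior, and the function names are all hypothetical values chosen for illustration.

```python
import numpy as np


def p_correct_2pl(theta, a, b):
    """2PL probability of a correct response.

    theta has shape (Q,); a and b have shape (J,). Returns shape (Q, J).
    """
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))


def eap_ability(responses, a, b, prior="normal", lo=-4.0, hi=4.0, n_points=161):
    """Grid-based EAP estimate of ability under a normal or uniform prior.

    The grid range [lo, hi] doubles as the support of the uniform prior
    (an assumption made for this sketch).
    """
    theta = np.linspace(lo, hi, n_points)
    p = p_correct_2pl(theta, a, b)
    # Log-likelihood of the response pattern at every grid point.
    log_lik = (responses * np.log(p) + (1 - responses) * np.log1p(-p)).sum(axis=1)
    if prior == "normal":
        log_prior = -0.5 * theta**2          # N(0, 1), up to an additive constant
    elif prior == "uniform":
        log_prior = np.zeros_like(theta)     # flat on [lo, hi]
    else:
        raise ValueError("prior must be 'normal' or 'uniform'")
    log_post = log_lik + log_prior
    weights = np.exp(log_post - log_post.max())  # subtract max for stability
    weights /= weights.sum()
    return float((theta * weights).sum())       # posterior mean on the grid


# Hypothetical 10-item test and an all-correct response pattern.
a = np.array([0.8, 1.0, 1.2, 1.5, 1.0, 0.9, 1.3, 1.1, 0.7, 1.4])
b = np.linspace(-2.0, 2.0, 10)
responses = np.ones(10, dtype=int)

est_normal = eap_ability(responses, a, b, prior="normal")
est_uniform = eap_ability(responses, a, b, prior="uniform")
print(f"EAP with normal prior:  {est_normal:.3f}")
print(f"EAP with uniform prior: {est_uniform:.3f}")
```

For an extreme response pattern such as this one, the normal prior pulls the estimate toward zero while the uniform prior lets the likelihood dominate. This illustrates only the mechanism by which the two priors differ, not the accuracy results reported in the abstract.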

Keywords

Item response theory, Uniform ability, Bayesian estimation

References

Baker, F. B. (2001). The basics of item response theory (2nd ed.). College Park, MD: ERIC Clearinghouse on Assessment and Evaluation, University of Maryland. Retrieved from http://files.eric.ed.gov/fulltext/ED458219.pdf
Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York, NY: Marcel Dekker.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cook, D. L. (1959). A replication of Lord's study on skewness and kurtosis of observed test-score distributions. Educational and Psychological Measurement, 19, 81-87.
de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: The Guilford Press.
de Ayala, R. J., & Sava-Bolesta, M. (1999). Item parameter recovery for the nominal response model. Applied Psychological Measurement, 23, 3-19.
Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8, 341.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Psychology Press.
Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58, 357-381.
Hambleton, R. K., & Cook, L. L. (1983). The robustness of item response models and the effects of test length and sample size on the precision of ability estimates. In D. Weiss (Ed.), New horizons in testing (pp. 31–49). New York: Academic Press.
Jackman, S. (2000). Estimation and inference via Bayesian simulation: An introduction to Markov chain Monte Carlo. American Journal of Political Science, 44, 375-404.
Kirisci, L., Hsu, T., & Yu, L. (2001). Robustness of item parameter estimation programs to assumptions of unidimensionality and normality. Applied Psychological Measurement, 25, 146–162.
Lord, F. M. (1955). A survey of observed test-score distributions with respect to skewness and kurtosis. Educational and Psychological Measurement, 15, 383-389.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores (with contributions by A. Birnbaum). Reading, MA: Addison-Wesley.
Lunn, D., Spiegelhalter, D., Thomas, A., & Best, N. (2009). The BUGS project: Evolution, critique and future directions. Statistics in Medicine, 28, 3049-3082.
Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139–160.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156-166.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177-195.
R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved January 10, 2017, from https://www.R-project.org/
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Nielsen and Lydiche (for Danmarks Paedagogiske Institut).
Reckase, M. (2009). Multidimensional item response theory. New York, NY: Springer.
Reise, S. P., & Yu, J. (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27, 133-144.
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2002). Characteristics of MML/EAP parameter estimates in the generalized graded unfolding model. Applied Psychological Measurement, 26, 192-207.
Sass, D. A., Schmitt, T. A., & Walker, C. M. (2008). Estimating non-normal latent trait distributions within item response theory using true and estimated item parameters. Applied Measurement in Education, 21, 65-88.
Sen, S., Cohen, A. S., & Kim, S.-H. (2016). The impact of non-normality on extraction of spurious latent classes in mixture IRT models. Applied Psychological Measurement, 40, 98-113.
Seong, T. (1990). Sensitivity of marginal maximum likelihood estimation of item and ability parameters to the characteristics of the prior ability distributions. Applied Psychological Measurement, 14, 299-311.
Stewart, J. (2012). Does IRT provide more sensitive measures of latent traits in statistical tests? An empirical examination. Shiken Research Bulletin, 16, 15-22.
Stone, C. A. (1992). Recovery of marginal maximum likelihood estimates in the two-parameter logistic response model: An evaluation of MULTILOG. Applied Psychological Measurement, 16, 1-16.
Swaminathan, H., Hambleton, R. K., & Rogers, H. J. (2007). Assessing the fit of item response theory models. In C. R. Rao & S. Sinharay (Eds.), Psychometrics: Vol. 26. Handbook of statistics (pp. 683–718). Amsterdam: Elsevier.

There are 30 citations in total.

Details

Primary Language	English
Subjects	Studies on Education
Journal Section	Articles
Authors	Tuğba Karadavut
Publication Date	January 5, 2020
Submission Date	June 22, 2019
Published in Issue	Year 2019 Volume: 6 Issue: 4

Cite

APA	Karadavut, T. (2020). The Uniform Prior for Bayesian Estimation of Ability in Item Response Theory Models. International Journal of Assessment Tools in Education, 6(4), 568-579. https://doi.org/10.21449/ijate.581314

Cited By

Decomposition of WAIC for assessing the information gain with application to educational testing

British Journal of Mathematical and Statistical Psychology

https://doi.org/10.1111/bmsp.12383

Comparison of item response theory ability and item parameters according to classical and Bayesian estimation methods

International Journal of Assessment Tools in Education

https://doi.org/10.21449/ijate.1290831

Sequential Bayesian Ability Estimation Applied to Mixed-Format Item Tests

Applied Psychological Measurement

https://doi.org/10.1177/01466216231201986

Bayesian Model Assessment for Jointly Modeling Multidimensional Response Data with Application to Computerized Testing

Psychometrika

https://doi.org/10.1007/s11336-022-09845-x
