The Uniform Prior for Bayesian Estimation of Ability in Item Response Theory Models

Tuğba Karadavut

doi:10.21449/ijate.581314

Research Article

Year 2019, Volume: 6 Issue: 4, 568 - 579, 05.01.2020

Cited By: 3

Abstract

Item Response Theory (IRT) models traditionally assume a normal distribution for ability. Although normality is often a reasonable assumption for ability, it is rarely met for observed scores in educational and psychological measurement. Assumptions about the ability distribution have previously been shown to affect IRT parameter estimation. In this study, normal and uniform prior distributions for ability were compared for IRT parameter estimation when the true ability distribution was either normal or uniform. For this purpose, a simulation study was conducted that included a short test with a small sample size and a long test with a large sample size. The results suggest using a uniform prior for ability to obtain more accurate estimates of the ability parameter in the 2PL and 3PL models when the true distribution of ability is not known. For the Rasch model, no explicit pattern was found that could be used to obtain more accurate item parameter estimates.
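To make the role of the ability prior concrete, the following is a minimal sketch of how a normal and a uniform prior can lead to different ability estimates for the same response pattern under a 2PL model. It is not the study's procedure: the abstract does not describe the estimation setup, so the item parameters, the grid from -4 to 4, the bounds of the uniform prior, and the helper names `p_2pl` and `eap` are all assumptions made for illustration, and the item parameters are treated as known rather than estimated.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response.
    theta: (Q,) grid of abilities; a, b: (J,) item discriminations and difficulties."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def eap(responses, a, b, grid, prior):
    """Posterior mean of ability (EAP) on a grid for one 0/1 response vector."""
    p = np.clip(p_2pl(grid, a, b), 1e-12, 1.0 - 1e-12)   # guard the logs
    loglik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    post = np.exp(loglik - loglik.max()) * prior          # unnormalised posterior
    post /= post.sum()
    return float((grid * post).sum())

# Hypothetical 10-item test (illustrative values, not taken from the study).
a = np.linspace(0.8, 2.0, 10)     # discriminations
b = np.linspace(-2.0, 2.0, 10)    # difficulties
grid = np.linspace(-4.0, 4.0, 161)

normal_prior = np.exp(-0.5 * grid**2)   # standard normal, up to a constant
uniform_prior = np.ones_like(grid)      # flat on [-4, 4] (assumed bounds)

all_correct = np.ones(10)               # an extreme response pattern
eap_normal = eap(all_correct, a, b, grid, normal_prior)
eap_uniform = eap(all_correct, a, b, grid, uniform_prior)
print(eap_normal, eap_uniform)
```

For an all-correct pattern the normal prior pulls the estimate toward zero, while the flat prior leaves it further out, so the choice of prior matters most for examinees at the extremes and for short tests where the likelihood carries little information.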

Keywords

Item response theory, Uniform ability, Bayesian estimation



Details

Primary Language	English
Subjects	Studies on Education
Section	Articles
Authors	Tuğba Karadavut
Publication Date	January 5, 2020
Submission Date	June 22, 2019
Published in Issue	Year 2019 Volume: 6 Issue: 4

Cite

APA	Karadavut, T. (2020). The Uniform Prior for Bayesian Estimation of Ability in Item Response Theory Models. International Journal of Assessment Tools in Education, 6(4), 568-579. https://doi.org/10.21449/ijate.581314

Cited By

Comparison of item response theory ability and item parameters according to classical and Bayesian estimation methods

International Journal of Assessment Tools in Education

https://doi.org/10.21449/ijate.1290831

Sequential Bayesian Ability Estimation Applied to Mixed-Format Item Tests

Applied Psychological Measurement

https://doi.org/10.1177/01466216231201986

Bayesian Model Assessment for Jointly Modeling Multidimensional Response Data with Application to Computerized Testing

Psychometrika

https://doi.org/10.1007/s11336-022-09845-x
