Comparison of Different Ability Estimation Methods Based on 3 and 4PL Item Response Theory

Ebru Doğruöz; Çiğdem Akın Arıkan

doi:10.9779/pauefd.585774

Research Article

Comparison of Different Ability Estimation Methods According to 3PL and 4PL Item Response Theory Models

Year 2020, Issue: 50, 50-69, 01.09.2020

Cited By: 2

https://izlik.org/JA83UP66UN

Abstract

This study examined dichotomous Item
Response Theory models in the context of different ability estimation methods.
The study was based on the responses of 8th-grade students to the 20 items of the
mathematics subtest of the TEOG exam in 2015-2016. The study group consisted of
4000 respondents selected at random from these data. Ability estimates and the
standard errors of these estimates were computed from the data. The estimates
were compared using two-way analysis of variance (ANOVA) for repeated measures.
The findings showed that the 4PL model fit better. The accuracy of the WLE
ability estimation method was higher than that of the MAP and EAP methods. The
standard error was lower for WLE under the 4PL model and for MAP under the 3PL
model. The highest marginal reliability coefficient was obtained from MAP
estimates for the 3PL model and from WLE estimates for the 4PL model. Based on
the findings, it was concluded that ability scores estimated with the WLE method
under the 4PL model were highly accurate.
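
For orientation, the two dichotomous models compared in the abstract are commonly written in the form below. The notation is the conventional one and is an assumption here, since this page does not state the authors' parameterization (some texts also include a scaling constant D = 1.7 in the exponent).

```latex
% 4PL item response function for person j and item i:
%   a_i = discrimination, b_i = difficulty,
%   c_i = lower asymptote (pseudo-guessing), d_i = upper asymptote
P(X_{ij} = 1 \mid \theta_j)
  = c_i + \frac{d_i - c_i}{1 + \exp\!\bigl(-a_i(\theta_j - b_i)\bigr)}

% The 3PL model is the special case d_i = 1:
P(X_{ij} = 1 \mid \theta_j)
  = c_i + \frac{1 - c_i}{1 + \exp\!\bigl(-a_i(\theta_j - b_i)\bigr)}
```

An upper asymptote below 1 lets the model allow for occasional incorrect responses from high-ability examinees, which is the usual motivation for the fourth parameter.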

Keywords

ability estimation methods, item response theory, 3PLM, 4PLM

Comparison of Different Ability Estimation Methods Based on 3 and 4PL Item Response Theory

Abstract

This research examined dichotomous Item
Response Theory (IRT) models in the context of different ability estimation methods.
The research was based on the responses of 8th-grade students to the 20 items of
the Mathematics subtest of the TEOG (National Transition from Primary to Secondary
Education) exam in 2015-2016. The study group
consisted of 4000 students who were randomly selected from the students
who participated in the TEOG exam. Ability estimations and standard error values
for these estimations were calculated based on the data. The estimates were
compared using two-way analysis of variance (ANOVA) for repeated measures.
The findings showed that the four-parameter logistic (4PL) model fit the data
better. In terms of ability estimation methods, the accuracy of Weighted
Likelihood Estimation (WLE) was higher than that of Maximum A Posteriori (MAP)
and Expected A Posteriori (EAP) estimation. WLE gave lower standard error values
under the 4PL model, and MAP gave lower standard error values under the 3PL
model. The highest marginal reliability coefficient was obtained from MAP
estimates for the 3PL model and from WLE estimates for the 4PL model. Based on
these findings, it was concluded that the ability scores obtained by the WLE
estimation method under the 4PL model had high accuracy.
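
To make the compared quantities concrete, here is a minimal sketch of EAP and MAP ability estimation under a 4PL model. It is not the authors' code and is independent of any package. The item parameters, the standard normal prior, and the grid approximation are all assumptions chosen for illustration, and WLE is not shown.

```python
import math

def p_4pl(theta, a, b, c, d=1.0):
    """Probability of a correct response under the 4PL model.
    Setting d = 1.0 reduces it to the 3PL model."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    ll = 0.0
    for x, (a, b, c, d) in zip(responses, items):
        p = p_4pl(theta, a, b, c, d)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def estimate_ability(responses, items, lo=-4.0, hi=4.0, n=161):
    """Return (EAP, posterior SD, MAP) on an evenly spaced grid.
    Assumes a standard normal prior on theta (an illustrative choice)."""
    grid = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    # Unnormalised log posterior: log-likelihood plus log N(0, 1) kernel.
    log_post = [log_likelihood(t, responses, items) - 0.5 * t * t for t in grid]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # stabilised weights
    total = sum(weights)
    eap = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - eap) ** 2 * w for t, w in zip(grid, weights)) / total
    map_est = grid[log_post.index(peak)]  # grid point with highest posterior
    return eap, math.sqrt(var), map_est

# Hypothetical item parameters (a, b, c, d); not taken from the study.
items = [
    (1.2, -1.0, 0.20, 0.98),
    (0.8, 0.0, 0.25, 0.95),
    (1.5, 0.5, 0.20, 0.97),
    (1.0, 1.2, 0.15, 0.99),
]
responses = [1, 1, 0, 0]

eap, psd, map_est = estimate_ability(responses, items)
print(f"EAP = {eap:.3f}, posterior SD = {psd:.3f}, MAP = {map_est:.3f}")
```

The posterior standard deviation plays the role of the standard error for the EAP estimate; the MAP value here is only as precise as the grid spacing.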

Keywords

ability estimation methods, item response theory, 3PLM, 4PLM

References

Baker, F. B. (1992). Item Response Theory: Parameter Estimation Techniques. New York: Marcel Dekker.
Bar-Hillel, M., Budescu, D., & Attali, Y. (2005). Scoring and keying multiple choice tests: A case study in irrationality. Mind & Society, 4, 3-12. http://doi.org/cp7ddc
Barton, M. A., & Lord, F. M. (1981). An upper asymptote for the three-parameter logistic item-response model. Research Bulletin, 81-20. Princeton, NJ: Educational Testing Service.
Baykul, Y. (1979). Örtük özellikler ve klasik test kuramları üzerine bir karşılaştırma [A comparison of latent trait and classical test theories] (Unpublished Doctoral thesis). Hacettepe University, Graduate School of Social Sciences, Ankara.
Berberoğlu, G. (1988). Seçme amacıyla kullanılan testlerde Rasch modelinin katkıları [Contributions of the Rasch model to tests used for selection purposes] (Unpublished Doctoral thesis). Hacettepe University, Graduate School of Social Sciences, Ankara.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-472). Reading, MA: Addison-Wesley.
Borgatto, A. F., Azevedo, C. L. N., Pinheiro, A., & Andrade, D. F. (2015). Comparison of ability estimation methods using IRT for test with different degrees of difficulty. Communications in Statistics-Simulation and Computation, 44(2), 474-488.
Ching-Fung, B. S. (2002). Ability estimation under different item parametrization and scoring models (Unpublished Doctoral thesis). University of North Texas, Texas.
Can, S. (2003). The analyses of secondary education institutions student selection and placement test’s verbal section with respect to item response theory models (Unpublished Master's thesis). Middle East Technical University, Graduate School of Social Sciences, Ankara.
Chalmers, R. P. (2013). mirt: Multidimensional Item Response Theory. R package version 0.9.0. [Online: http://CRAN.R-project.org/package=mirt].
Cole, D. A. (1987). Utility of confirmatory factor analysis in test validation research. Journal of Consulting and Clinical Psychology, 55, 584-594.
Çelik, D. (2001). The fit of one-, two- and three-parameter models of item response theory (IRT) to the Ministry of National Education secondary school institutions student selection and placement test data (Unpublished Master’s thesis). Middle East Technical University, Graduate School of Social Sciences, Ankara.
Çetin, B., & Çelikten, S. (2016). Nominal response model altında yetenek kestirim yöntemlerinin karşılaştırılması [Comparison of ability estimation methods under the nominal response model]. International Engineering, Science and Education Conference, 01-03 December 2016, Diyarbakır.
DeMars, C. (2010). Item response theory. New York: Oxford University Press.
De Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. U.S.A.
Erdemir, A. (2015). Bir, iki, üç ve dört parametreli lojistik madde tepki kuramı modellerinin karşılaştırılması (Comparison of 1PL, 2PL, 3PL and 4PL item response theory models) (Unpublished Master's thesis). Gazi University, Graduate School of Educational Sciences, Ankara.
Finch, W. H., & French, B. F. (2012). Parameter Estimation with Mixture Item Response Theory Models: A Monte Carlo Comparison of Maximum Likelihood and Bayesian Methods. Journal of Modern Applied Statistical Methods, 11(1), Article 14. DOI: 10.22237/jmasm/1335845580.
Gardner-Medwin, A. R., & Gahan, M. (2003). Formative and summative confidence-based assessment. In J. Christie (Ed.), Proceedings of the 7th International Computer-Aided Assessment Conference (pp. 147-155). Loughborough, UK: Loughborough University.
Hambleton, R. K., & Swaminathan, H. (1985). Item Response Theory: Principles and Applications. Boston: Kluwer Nijhoff.
Hockemeyer, C. (2002). A comparison of non-deterministic procedures for the adaptive assessment of knowledge. Psychologische Beiträge, 44, 495-503.
Kılıç, İ. (1999). The fit of one-, two- and three-parameter models of item response theory to the student selection test of the student selection and placement center (Unpublished Doctoral thesis). Middle East Technical University, Graduate School of Social Sciences, Ankara.
Kline, R. B. (2005). Principles and practices of structural equation modeling. New York: The Guilford Press.
Liao, W., Ho, R., & Yen, Y. (2012). The Four-Parameter Logistic Item Response Theory Model as a Robust Method of Estimating Ability Despite Aberrant Responses. Social Behavior and Personality, 40(10), 1679-1694.
Loken, E., & Rulison, K. L. (2010). Estimation of a four-parameter item response theory model. British Journal of Mathematical and Statistical Psychology, 63, 509-525.
Önder, İ. (2007). An investigation of goodness of model data fit. Hacettepe University Journal of Education, 32, 210-220.
Reise, S. P., & Waller, N. G. (2003). How many IRT parameters does it take to model psychopathology items? Psychological Methods, 8(2), 164-184.
Reynolds, T., Perkins, K., & Brutten, S. (1994). A comparative item analysis study of a language testing instrument. Language Testing, 11, 1-14.
Rose, N. (2010). Maximum Likelihood and Bayes Modal Ability Estimation in Two-Parametric IRT Models: Derivations and Implementation.
Rulison, K., & Loken, E. (2009). I’ve fallen and I can’t get up: Can high-ability students recover from early mistakes in CAT? Applied Psychological Measurement, 33, 83-101. http://doi.org/dtqjq8
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 17.
Seong, T. J., Kim, S. H., & Cohen, A. S. (1997). A comparison of procedures for ability estimation under the graded response model. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago.
Taşdelen Teker, G., Kelecioğlu, H., & Eroğlu, M. G. (2013). An investigation of goodness of model data fit. 4th International Conference on New Horizons in Education, June 25-27, 2013, Rome, Italy.
Wainer, H., & Thissen, D. (1987). Estimating ability with the wrong model. Journal of Educational Statistics, 12, 339-368.
Wang, T., & Vispoel, W. P. (1998). Properties of ability estimation methods in computerized adaptive testing. Journal of Educational Measurement, 35, 109-135.
Wang, S., & Wang, T. (2001). Precision of Warm’s weighted likelihood estimates for a polytomous model in computerized adaptive testing. Applied Psychological Measurement, 25, 317-331.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427-450.
Yalçın, S. (2018). Data Fit Comparison of Mixture Item Response Theory Models and Traditional Models. International Journal of Assessment Tools in Education, 5(2), 301-313. DOI: 10.21449/ijate.402806.
Yapar, T. (2003). A study of the predictive validity of the Başkent University English proficiency exam through the use of the two-parameter IRT model’s ability estimates (Unpublished Master’s thesis). Middle East Technical University, Graduate School of Social Sciences, Ankara.
Yeğin, O. P. (2003). The predictive validity of Başkent University proficiency exam (BUEPE) through the use of the three-parameter IRT model’s ability estimates (Unpublished Master’s thesis). Middle East Technical University, Graduate School of Social Sciences, Ankara.
Yen, Y.-C., Ho, R.-G., Chen, L.-J., Chou, K.-Y., & Chen, Y.-L. (2010). Development and evaluation of a confidence-weighting computerized adaptive testing. Educational Technology & Society, 13, 163-176.
Yen, Y., Ho, R., Liao, W., & Chen, L. (2012). Reducing the Impact of Inappropriate Items on Reviewable Computerized Adaptive Testing. Educational Technology & Society, 15, 231–243.
Zwinderman, A. H., & van den Wollenberg, A. L. (1990). Robustness of marginal maximum likelihood estimation in the Rasch model. Applied Psychological Measurement, 14(1), 73–81.

There are 41 citations in total.

Details

Primary Language	English
Journal Section	Research Article
Authors	Ebru Doğruöz (ORCID 0000-0001-6572-274X), Çiğdem Akın Arıkan (ORCID 0000-0001-5255-8792)
Submission Date	July 2, 2019
Acceptance Date	February 5, 2020
Publication Date	September 1, 2020
DOI	https://doi.org/10.9779/pauefd.585774
IZ	https://izlik.org/JA83UP66UN
Published in Issue	Year 2020 Issue: 50

Cite

APA	Doğruöz, E., & Akın Arıkan, Ç. (2020). Comparison of Different Ability Estimation Methods Based on 3 and 4PL Item Response Theory. Pamukkale Üniversitesi Eğitim Fakültesi Dergisi, 50, 50-69. https://doi.org/10.9779/pauefd.585774

Cited By

An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups

Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi

https://doi.org/10.17152/gefad.1202751

Assessing Item Difficulty, Discrimination, Guessing, and Carelessness Parameters of the Mathematics Achievement test for Secondary School Students in Edo State, Nigeria

British Journal of Education, Learning and Development Psychology

https://doi.org/10.52589/BJELDP-4SKVBGUA

License: https://creativecommons.org/licenses/by-nc-nd/4.0/