puje

Pamukkale Üniversitesi Eğitim Fakültesi Dergisi

1301-0085 1309-0275

Pamukkale University

10.9779/pauefd.585774

3 ve 4PL Madde Tepki Kuramı Modellerine Göre Farklı Yetenek Kestirim Yöntemlerinin Karşılaştırılması

Comparison of Different Ability Estimation Methods Based on 3 and 4PL Item Response Theory

https://orcid.org/0000-0001-6572-274X

Doğruöz

Ebru

ÇANKIRI KARATEKİN ÜNİVERSİTESİ

https://orcid.org/0000-0001-5255-8792

Akın Arıkan

Çiğdem

ORDU ÜNİVERSİTESİ

09 01 2020

50 50 69 07 02 2019 02 05 2020

1996

Pamukkale University Journal of Education

In this study, dichotomous Item Response Theory models were examined in the context of different ability estimation methods. The study was based on the responses of 8th-grade students to the 20 items of the mathematics subtest of the TEOG exam in 2015-2016. The study group consists of 4000 respondents selected at random from these data. Ability estimates and the standard errors of these estimates were computed from the data. The estimates were compared using two-way analysis of variance (ANOVA) for repeated measures. The findings showed that the 4PL model fit the data better. The accuracy of the WLE ability estimation method was higher than that of the MAP and EAP methods. The standard error was lower for WLE under the 4PL model and for MAP under the 3PL model. The highest marginal reliability coefficient was obtained from estimates made with MAP for the 3PL model and with WLE for the 4PL model. Based on the findings, it was concluded that ability scores estimated with the WLE method under the 4PL model were highly accurate.

This research examined dichotomous Item Response Theory (IRT) models in the context of different ability estimation methods. The research was based on the responses of 8th-grade students to the 20 items of the Mathematics subtest of the TEOG (National Transition from Primary to Secondary Education) exam in 2015-2016. The study group consisted of 400 students who were randomly selected from the students who participated in the TEOG exam. Ability estimates and the standard errors of these estimates were calculated from the data. The estimates were compared by two-way analysis of variance (ANOVA) for repeated measures. The findings revealed that the four-parameter logistic (4PL) model fit the data better. In terms of ability estimation methods, the accuracy of Weighted Likelihood Estimation (WLE) was higher than that of Maximum A Posteriori (MAP) and Expected A Posteriori (EAP). WLE gave lower standard error values under the 4PL model, and MAP gave lower standard error values under the 3PL model. The highest marginal reliability coefficient was obtained from estimates made with MAP for the 3PL model and with WLE for the 4PL model. According to the research findings, it was concluded that the accuracy of ability scores obtained by the WLE estimation method under the 4PL model was higher.
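As a rough illustration of the models and one of the estimation methods named in the abstract, the sketch below evaluates the 4PL item response function (which reduces to the 3PL model when the upper asymptote `d` is 1) and computes an EAP ability estimate with its posterior standard deviation on a fixed grid. It is a minimal sketch under stated assumptions, not the authors' procedure: the item parameters in `ITEMS`, the standard normal prior, the grid settings, and the function names `p_4pl` and `eap_estimate` are all hypothetical and are not taken from the TEOG data.

```python
import math


def p_4pl(theta, a, b, c, d):
    """Probability of a correct response under the 4PL model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing),
    d: upper asymptote (slipping). Setting d = 1 gives the 3PL model.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def eap_estimate(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP ability estimate and posterior SD under a standard normal prior.

    The posterior is approximated on an evenly spaced grid of ability
    values (an assumption made for simplicity).
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = []
    for theta in grid:
        # Log of the standard normal prior, up to an additive constant.
        log_post = -0.5 * theta * theta
        for u, (a, b, c, d) in zip(responses, items):
            p = p_4pl(theta, a, b, c, d)
            log_post += math.log(p if u == 1 else 1.0 - p)
        weights.append(math.exp(log_post))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


# Hypothetical item parameters (a, b, c, d); not estimated from real data.
ITEMS = [
    (1.2, -1.0, 0.20, 0.98),
    (0.9, 0.0, 0.25, 0.95),
    (1.5, 1.0, 0.15, 0.99),
]

theta_hi, se_hi = eap_estimate([1, 1, 1], ITEMS)  # all items correct
theta_lo, se_lo = eap_estimate([0, 0, 0], ITEMS)  # all items incorrect
```

A MAP estimate would take the grid point with the largest posterior weight instead of the posterior mean, and WLE would replace the prior with Warm's bias-correction term; neither is shown here.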

ability estimation methods item response theory 3 PLM 4PLM

yetenek kestirim yöntemleri madde tepki kuramı 3PLM 4PLM
