Research Article

Analyzing the Maximum Likelihood Score Estimation Method with Fences in ca-MST

Year 2019, Volume: 6, Issue: 4, 555-567, 05.01.2020
https://doi.org/10.21449/ijate.634091

Abstract

New statistical methods are added to the literature every day as a result of scientific developments. This study investigates one of them, the Maximum Likelihood Score Estimation with Fences (MLEF) method, in computerized adaptive multistage testing (ca-MST). Since no study has yet examined the applicability of the MLEF method in ca-MST, the results will contribute to both the national and international literature. In line with this aim, 48 conditions (4 module lengths (5-10-15-20) x 2 panel designs (1-3; 1-3-3) x 2 ability distributions (normal-uniform) x 3 ability estimation methods (MLEF-MLE-EAP)) were simulated. The simulation outcomes were evaluated with correlation, RMSE, and AAD as indicators of measurement precision, and with conditional bias calculations to show how estimation behaves at each ability level. This is a post-hoc simulation study using the TIMSS 2015 eighth-grade mathematics data. The “xxIRT” R package and the MSTGen simulation software tool were used. The results show that MLEF, as a new ability estimation method, is superior to the MLE method in all conditions. The EAP estimation method gives the best results in terms of measurement precision based on correlation, RMSE, and AAD values, while the results obtained with the MLEF method are very close to those of EAP. Compared to the EAP method, MLE proves to be less biased in ability estimation, especially at extreme ability levels.
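
To make the core idea concrete, the base-R sketch below contrasts plain MLE with the fence mechanism behind MLEF (Han, 2016) under the Rasch model. The item difficulties, fence locations, and function names here are illustrative assumptions, not the study's implementation, which works through the xxIRT and MSTGen tools.

    # A minimal sketch of the fence idea, assuming a Rasch model and made-up
    # difficulties; not the paper's exact implementation.
    p_rasch <- function(theta, b) 1 / (1 + exp(-(theta - b)))

    # Log-likelihood of response pattern u on items with difficulties b
    loglik <- function(theta, u, b) {
      p <- p_rasch(theta, b)
      sum(u * log(p) + (1 - u) * log(1 - p))
    }

    # Plain MLE over a bounded search interval; for a perfect score the
    # likelihood is monotone, so the "estimate" runs to the bound (diverges).
    mle <- function(u, b, lower = -6, upper = 6) {
      optimize(function(t) loglik(t, u, b), c(lower, upper),
               maximum = TRUE)$maximum
    }

    # MLEF: append two imaginary "fence" items just outside the reporting
    # range, scored correct on the easy fence and incorrect on the hard one,
    # so the likelihood always peaks between the fences.
    mlef <- function(u, b, fences = c(-4, 4)) {
      mle(c(u, 1, 0), c(b, fences))
    }

    b <- c(-1, -0.5, 0, 0.5, 1)  # a hypothetical 5-item module
    mle(rep(1, 5), b)    # ~6: hits the search bound, unusable
    mlef(rep(1, 5), b)   # finite, pulled inside the upper fence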

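The evaluation criteria named above (correlation, RMSE, AAD, and conditional bias) can likewise be sketched in a few lines of base R. The true and estimated abilities below are simulated placeholders, not the study's TIMSS-based results.

    # Measurement-precision indices and conditional bias on placeholder data;
    # the study computes these per simulated ca-MST condition.
    set.seed(1)
    theta_true <- rnorm(1000)                        # hypothetical true abilities
    theta_hat  <- theta_true + rnorm(1000, sd = 0.3) # hypothetical estimates

    r    <- cor(theta_true, theta_hat)                # correlation
    rmse <- sqrt(mean((theta_hat - theta_true)^2))    # root mean square error
    aad  <- mean(abs(theta_hat - theta_true))         # average absolute difference

    # Conditional bias: mean signed error within true-ability intervals,
    # which is what exposes estimator behavior at extreme ability levels.
    bins <- cut(theta_true, breaks = c(-Inf, -2, -1, 0, 1, 2, Inf))
    round(tapply(theta_hat - theta_true, bins, mean), 3)
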
References

  • Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
  • Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444. https://doi.org/10.1177/014662168200600405
  • Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
  • Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27-38.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
  • Han, K. T. (2013). MSTGen: Simulated data generator for multistage testing. Applied Psychological Measurement, 37(8), 666-668. https://doi.org/10.1177/0146621613499639
  • Han, K. T. (2016). Maximum likelihood score estimation method with fences for short-length tests and computerized adaptive tests. Applied Psychological Measurement, 40(4), 289-301.
  • Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26, 44-52.
  • International Association for the Evaluation of Educational Achievement (IEA). (2013). TIMSS 2015 assessment frameworks. Boston College: TIMSS & PIRLS International Study Center, Lynch School of Education.
  • Jodoin, M. G., Zenisky, A. L., & Hambleton, R. K. (2006). Comparison of the psychometric properties of several computer-based test designs for credentialing exams with multiple purposes. Applied Measurement in Education, 19(3), 203-220.
  • Kim, S., Moses, T., & You, H. (2015). A comparison of IRT proficiency estimation methods under adaptive multistage testing. Journal of Educational Measurement, 52(1), 70-79.
  • Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
  • Luecht, R. M. (2000). Implementing the computer-adaptive sequential testing (CAST) framework to mass produce high quality computer adaptive and mastery tests. Paper presented at the Annual Meeting of the National Council on Measurement in Education (NCME), New Orleans, LA.
  • Luecht, R. M., Brumfield, T., & Breithaupt, K. (2006). A testlet assembly design for adaptive multistage tests. Applied Measurement in Education, 19(3), 189-202.
  • Luecht, R. M., & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229-249.
  • Luecht, R., & Sireci, S. G. (2011). A review of models for computer-based testing (Research Report). New York: The College Board. Retrieved from https://files.eric.ed.gov/fulltext/ED562580.pdf
  • Luo, X. (2017). Package 'xxIRT' (Version 2.0.3). Retrieved September 25, 2018, from https://cran.r-project.org/web/packages/xxIRT/xxIRT.pdf
  • Magis, D., Béland, S., & Raîche, G. (2010). A test-length correction to the estimation of extreme proficiency levels. Applied Psychological Measurement, 35, 91-109.
  • Magis, D., & Raîche, G. (2010). An iterative maximum a posteriori estimation of proficiency level to detect multiple local likelihood maxima. Applied Psychological Measurement, 34, 75-90.
  • Park, R. (2015). Investigating the impact of a mixed-format item pool on optimal test design for multistage testing (Unpublished doctoral dissertation). University of Texas at Austin.
  • Patsula, L. N. (1999). A comparison of computerized-adaptive testing and multi-stage testing (Unpublished doctoral dissertation). University of Massachusetts at Amherst.
  • R Core Team. (2014). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from http://www.R-project.org/
  • Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
  • Robin, F. (1999, March). Alternative item selection strategies for improving test security and pool usage in computerized adaptive testing. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Montréal, Québec.
  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4, Pt. 2), 100.
  • Sarı, H. İ., Yahşi Sarı, H., & Huggins-Manley, A. C. (2016). Computer adaptive multistage testing: Practical issues, challenges and principles. Journal of Measurement and Evaluation in Education and Psychology, 7(2), 388-406. https://doi.org/10.21031/epod.280183
  • Wainer, H., Kaplan, B., & Lewis, C. (1992). A comparison of the performance of simulated hierarchical and linear testlets. Journal of Educational Measurement, 29(3), 243-251.
  • Wainer, H., & Thissen, D. (1987). Estimating ability with the wrong model. Journal of Educational Statistics, 12, 339-368.
  • Yan, D., von Davier, A. A., & Lewis, C. (2014). Computerized multistage testing: Theory and applications. Boca Raton, FL: CRC Press.
  • Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement. Westport, CT: American Council on Education and Praeger.
  • Zenisky, A. L. (2004). Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment (Unpublished doctoral dissertation). University of Massachusetts at Amherst.
  • Zenisky, A., Hambleton, R. K., & Luecht, R. M. (2010). Multistage testing: Issues, designs, and research. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 355-372). New York: Springer.


Details

Primary Language: English
Subjects: Studies on Education
Section: Articles
Authors

Melek Gülşah Şahin 0000-0001-5139-9777

Nagihan Boztunç Öztürk 0000-0002-2777-5311

Publication Date: January 5, 2020
Submission Date: July 30, 2019
Published in Issue: Year 2019, Volume: 6, Issue: 4

How to Cite

APA: Şahin, M. G., & Boztunç Öztürk, N. (2020). Analyzing the Maximum Likelihood Score Estimation Method with Fences in ca-MST. International Journal of Assessment Tools in Education, 6(4), 555-567. https://doi.org/10.21449/ijate.634091
