Year 2019, Volume 6, Issue 4, Pages 555-567, 2020-01-05

Analyzing the Maximum Likelihood Score Estimation Method with Fences in ca-MST

Melek Gülşah ŞAHİN [1], Nagihan BOZTUNÇ ÖZTÜRK [2]

New statistical methods are added to the literature every day as a result of scientific developments. This study investigates one of them, the Maximum Likelihood Score Estimation with Fences (MLEF) method, in ca-MST. Because no previous study has examined the applicability of the MLEF method in ca-MST, the results will contribute to both the national and the international literature. In line with this aim, 48 conditions were simulated: 4 module lengths (5, 10, 15, 20) x 2 panel designs (1-3; 1-3-3) x 2 ability distributions (normal, uniform) x 3 ability estimation methods (MLEF, MLE, EAP). The simulated data were interpreted using correlation, RMSE and AAD as indicators of measurement precision, and using conditional bias to show how the estimates change at each ability level. This is a post-hoc simulation study using data from the TIMSS 2015 8th-grade mathematics assessment. The “xxIRT” R package and the MSTGen simulation software were used in the study. The results show that MLEF, as a new ability estimation method, is superior to the MLE method in all conditions. The EAP estimation method gives the best measurement precision based on the correlation, RMSE and AAD values, while the results obtained with the MLEF method are very close to those of EAP. Compared with the EAP method, MLE proves to be less biased in ability estimation, especially at extreme ability levels.
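The factorial design and the precision indices named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the study itself used the xxIRT R package and MSTGen. The function names are hypothetical, and AAD is taken here to mean the average absolute difference between estimated and true ability, which is an assumption about the abbreviation. Correlation is omitted for brevity.

```python
from itertools import product
from math import sqrt

# Factor levels exactly as listed in the abstract.
MODULE_LENGTHS = (5, 10, 15, 20)
PANEL_DESIGNS = ("1-3", "1-3-3")
ABILITY_DISTRIBUTIONS = ("normal", "uniform")
ESTIMATION_METHODS = ("MLEF", "MLE", "EAP")


def simulation_conditions():
    """Return every combination of the four design factors (4 x 2 x 2 x 3)."""
    return list(product(MODULE_LENGTHS, PANEL_DESIGNS,
                        ABILITY_DISTRIBUTIONS, ESTIMATION_METHODS))


def rmse(true_theta, est_theta):
    """Root mean square error between estimated and true abilities."""
    n = len(true_theta)
    return sqrt(sum((e - t) ** 2 for t, e in zip(true_theta, est_theta)) / n)


def aad(true_theta, est_theta):
    """Average absolute difference (assumed meaning of AAD)."""
    n = len(true_theta)
    return sum(abs(e - t) for t, e in zip(true_theta, est_theta)) / n


def conditional_bias(true_theta, est_theta):
    """Mean signed error (estimate minus truth) at each distinct true ability."""
    errors_by_level = {}
    for t, e in zip(true_theta, est_theta):
        errors_by_level.setdefault(t, []).append(e - t)
    return {level: sum(errs) / len(errs)
            for level, errs in errors_by_level.items()}
```

Calling `len(simulation_conditions())` returns 48, matching the number of conditions reported. In a real analysis the three indices would be computed separately for each condition, with conditional bias tabulated over a grid of true ability values.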

MLEF, MLE, EAP, ca-MST, Ability estimation
Primary Language en
Subjects Education, Scientific Disciplines
Published Date December
Journal Section Articles

Orcid: 0000-0001-5139-9777
Author: Melek Gülşah ŞAHİN (Primary Author)
Country: Turkey

Orcid: 0000-0002-2777-5311
Author: Nagihan BOZTUNÇ ÖZTÜRK
Country: Turkey


Publication Date : January 5, 2020

APA: ŞAHİN, M., & BOZTUNÇ ÖZTÜRK, N. (2020). Analyzing the Maximum Likelihood Score Estimation Method with Fences in ca-MST. International Journal of Assessment Tools in Education, 6(4), 555-567. DOI: 10.21449/ijate.634091