Research Article

Analyzing the Maximum Likelihood Score Estimation Method with Fences in ca-MST

Year 2019, Volume 6, Issue 4, 555–567, 05.01.2020
https://doi.org/10.21449/ijate.634091

Abstract

New statistical methods are continually being added to the literature as a result of scientific developments. This study investigates one of them, the Maximum Likelihood Score Estimation with Fences (MLEF) method, in computerized adaptive multistage testing (ca-MST). Because no previous study has examined the applicability of the MLEF method in ca-MST, the results contribute to both the national and international literature. In line with this aim, 48 conditions (4 module lengths (5-10-15-20) × 2 panel designs (1-3; 1-3-3) × 2 ability distributions (normal-uniform) × 3 ability estimation methods (MLEF-MLE-EAP)) were simulated. The simulation results were evaluated with correlation, RMSE, and AAD as indicators of measurement precision, and with conditional bias to show how estimation error changes at each ability level. This is a post-hoc simulation study using eighth-grade mathematics data from TIMSS 2015. The “xxIRT” R package and the MSTGen simulation software were used. The results show that MLEF, as a new ability estimation method, outperforms the MLE method under all conditions. The EAP method gives the best results for measurement precision based on correlation, RMSE, and AAD values, while the results obtained with the MLEF method are very close to those of EAP. Compared with the EAP method, MLE proves less biased in ability estimation, especially at extreme ability levels.
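
To make the evaluation criteria concrete, the sketch below (in R, the language of the “xxIRT” package) shows how correlation, RMSE, AAD, and conditional bias compare true and estimated abilities. It is illustrative only, not the authors' code: the simulated estimates, bin width, and variable names are assumptions.

    # Illustrative data: true abilities and stand-in ca-MST estimates
    set.seed(1)
    theta_true <- rnorm(1000)
    theta_hat  <- theta_true + rnorm(1000, sd = 0.3)

    r    <- cor(theta_true, theta_hat)              # correlation
    rmse <- sqrt(mean((theta_hat - theta_true)^2))  # root mean square error
    aad  <- mean(abs(theta_hat - theta_true))       # average absolute difference

    # Conditional bias: mean (estimate - true) within ability-level bins,
    # showing over- or under-estimation along the theta scale
    bins      <- cut(theta_true, breaks = seq(-4, 4, by = 0.5))
    cond_bias <- tapply(theta_hat - theta_true, bins, mean)

The fence mechanism itself can also be sketched. As described in Han (2016), MLEF augments the response pattern with imaginary “fence” items near the ends of the theta scale (an assumed-correct response on the lower fence and an assumed-incorrect response on the upper fence), so the likelihood cannot drift to infinity for extreme response patterns. The fence locations and item parameters below are illustrative assumptions, not values from the article.

    # 2PL response probability and negative log-likelihood
    p2pl  <- function(theta, a, b) 1 / (1 + exp(-1.702 * a * (theta - b)))
    negll <- function(theta, a, b, u) {
      p <- p2pl(theta, a, b)
      -sum(u * log(p) + (1 - u) * log(1 - p))
    }

    # A 5-item module answered all-correct: plain MLE runs to the boundary
    a <- rep(1, 5); b <- c(-1, -0.5, 0, 0.5, 1); u <- rep(1, 5)
    mle <- optimize(negll, c(-6, 6), a = a, b = b, u = u)$minimum          # ~6

    # Add steep fence items at -4 and +4; the estimate now stays inside
    a_f <- c(a, 8, 8); b_f <- c(b, -4, 4); u_f <- c(u, 1, 0)
    mlef <- optimize(negll, c(-6, 6), a = a_f, b = b_f, u = u_f)$minimum   # < 4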

References

  • Baker, F. B., & Kim, S. (2004). The basics of item response theory using R. New York: Marcel Dekker.
  • Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431–444. https://doi.org/10.1177/014662168200600405
  • Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
  • Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27–38.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
  • Han, K. T. (2013). MSTGen: Simulated data generator for multistage testing. Applied Psychological Measurement, 37(8), 666–668. https://doi.org/10.1177/0146621613499639
  • Han, K. T. (2016). Maximum likelihood score estimation method with fences for short-length tests and computerized adaptive tests. Applied Psychological Measurement, 40(4), 289–301.
  • Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26, 44–52.
  • International Association for the Evaluation of Educational Achievement (IEA). (2013). TIMSS 2015 assessment frameworks. Boston College: TIMSS & PIRLS International Study Center, Lynch School of Education.
  • Jodoin, M. G., Zenisky, A. L., & Hambleton, R. K. (2006). Comparison of the psychometric properties of several computer-based test designs for credentialing exams with multiple purposes. Applied Measurement in Education, 19(3), 203–220.
  • Kim, S., Moses, T., & You, H. (2015). A comparison of IRT proficiency estimation methods under adaptive multistage testing. Journal of Educational Measurement, 52(1), 70–79.
  • Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
  • Luecht, R. M. (2000). Implementing the Computer-Adaptive Sequential Testing (CAST) framework to mass produce high-quality computer adaptive and mastery tests. Paper presented at the Annual Meeting of the National Council on Measurement in Education (NCME), New Orleans, LA.
  • Luecht, R. M., Brumfield, T., & Breithaupt, K. (2006). A testlet assembly design for adaptive multistage tests. Applied Measurement in Education, 19(3), 189–202.
  • Luecht, R. M., & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229–249.
  • Luecht, R. M., & Sireci, S. G. (2011). A review of models for computer-based testing (Research Report). New York: The College Board. Retrieved from https://files.eric.ed.gov/fulltext/ED562580.pdf
  • Luo, X. (2017). Package 'xxIRT' (Version 2.0.3). Retrieved September 25, 2018, from https://cran.r-project.org/web/packages/xxIRT/xxIRT.pdf
  • Magis, D., Béland, S., & Raîche, G. (2010). A test-length correction to the estimation of extreme proficiency levels. Applied Psychological Measurement, 35, 91–109.
  • Magis, D., & Raîche, G. (2010). An iterative maximum a posteriori estimation of proficiency level to detect multiple local likelihood maxima. Applied Psychological Measurement, 34, 75–90.
  • Park, R. (2015). Investigating the impact of a mixed-format item pool on optimal test design for multistage testing (Unpublished doctoral dissertation). University of Texas at Austin.
  • Patsula, L. N. (1999). A comparison of computerized-adaptive testing and multi-stage testing (Unpublished doctoral dissertation). University of Massachusetts at Amherst.
  • R Core Team. (2014). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from http://www.R-project.org/
  • Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
  • Robin, F. (1999, March). Alternative item selection strategies for improving test security and pool usage in computerized adaptive testing. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Montréal, Québec.
  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4, Pt. 2).
  • Sarı, H. İ., Yahşi Sarı, H., & Huggins-Manley, A. C. (2016). Computer adaptive multistage testing: Practical issues, challenges and principles. Journal of Measurement and Evaluation in Education and Psychology, 7(2), 388–406. https://doi.org/10.21031/epod.280183
  • Wainer, H., Kaplan, B., & Lewis, C. (1992). A comparison of the performance of simulated hierarchical and linear testlets. Journal of Educational Measurement, 29(3), 243–251.
  • Wainer, H., & Thissen, D. (1987). Estimating ability with the wrong model. Journal of Educational Statistics, 12, 339–368.
  • Yan, D., von Davier, A. A., & Lewis, C. (2014). Computerized multistage testing: Theory and applications. CRC Press.
  • Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement. Westport, CT: American Council on Education and Praeger.
  • Zenisky, A. L. (2004). Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment (Unpublished doctoral dissertation). University of Massachusetts at Amherst.
  • Zenisky, A., Hambleton, R. K., & Luecht, R. M. (2010). Multistage testing: Issues, designs, and research. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 355–372). New York: Springer.


Details

Primary Language English
Subjects Studies on Education
Journal Section Articles
Authors

Melek Gülşah Şahin 0000-0001-5139-9777

Nagihan Boztunç Öztürk 0000-0002-2777-5311

Publication Date January 5, 2020
Submission Date July 30, 2019
Published in Issue Year 2019

Cite

APA Şahin, M. G., & Boztunç Öztürk, N. (2020). Analyzing the Maximum Likelihood Score Estimation Method with Fences in ca-MST. International Journal of Assessment Tools in Education, 6(4), 555-567. https://doi.org/10.21449/ijate.634091
