Stabilizing Maximum Likelihood Estimation with a Damping Factor in the Initial Phase of Computerized Adaptive Testing

Alper Tosun; Eren Can Aybek; Alper Sinan

doi:10.21031/epod.1658558

Research Article

Year 2025, Volume 16, Issue 3, pp. 124-138 (September 30, 2025)

Abstract

Maximum Likelihood Estimation (MLE) is a widely used ability estimation method in Computerized Adaptive Testing (CAT) applications based on Item Response Theory (IRT). However, standard MLE is highly sensitive to the first few responses, which often produces large fluctuations and unstable estimates, particularly in short tests or with small item pools. This study investigates whether incorporating a damping factor into MLE during the early stages of a CAT mitigates these excessive fluctuations in the ability estimate. Using Monte Carlo simulations in R based on a three-parameter logistic (3PL) model, we compare the damped MLE with standard MLE, Maximum a Posteriori (MAP), and Expected a Posteriori (EAP) estimation. Results indicate that damping improves the stability of MLE, reducing extreme fluctuations in the ability estimate and improving estimation accuracy, particularly in short tests and small sample conditions.
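The abstract does not state how the damping factor enters the estimator, so the following is only a minimal sketch of one plausible reading, not the authors' procedure: after each response, the provisional ability estimate is moved by a single Fisher-scoring step under the 3PL model, and that step is multiplied by a damping factor below 1 while the test is still short. The function names, the `early_items` cutoff, the value 0.5, and the bounds of ±4 are all illustrative assumptions, and the sketch is written in Python rather than the R used in the study.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def score_and_info(theta, items, responses):
    """Log-likelihood derivative and Fisher information at theta.

    items: list of (a, b, c) tuples; responses: list of 0/1 scores.
    """
    score = 0.0
    info = 0.0
    for (a, b, c), u in zip(items, responses):
        p = p_3pl(theta, a, b, c)
        q = 1.0 - p
        dp = a * (p - c) * q / (1.0 - c)  # dP/dtheta for the 3PL curve
        score += (u - p) * dp / (p * q)
        info += dp * dp / (p * q)
    return score, info


def damping_for_item(n_administered, early_items=5, early_damping=0.5):
    """Hypothetical schedule: damp only while few items have been given."""
    return early_damping if n_administered <= early_items else 1.0


def update_theta(theta, items, responses, damping=1.0, bounds=(-4.0, 4.0)):
    """One Fisher-scoring step, scaled by `damping` and kept within bounds.

    damping=1.0 is the ordinary (undamped) step; values in (0, 1) shrink it.
    """
    score, info = score_and_info(theta, items, responses)
    if info < 1e-10:  # no usable information at this theta; leave it unchanged
        return theta
    step = damping * score / info
    return min(max(theta + step, bounds[0]), bounds[1])


if __name__ == "__main__":
    # One illustrative item (a=1.2, b=0.0, c=0.2) answered correctly.
    items = [(1.2, 0.0, 0.2)]
    responses = [1]
    undamped = update_theta(0.0, items, responses, damping=1.0)
    damped = update_theta(0.0, items, responses,
                          damping=damping_for_item(len(items)))
    print(f"undamped: {undamped:.3f}  damped: {damped:.3f}")
```

With a single response the undamped step is large relative to the ability scale, which is the kind of early jump the abstract describes; scaling the step keeps the provisional estimate closer to its starting value until more responses accumulate. Note that damping a step iterated to convergence would reach the same maximum, so in this sketch the effect comes from taking one scaled step per administered item.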

Keywords

Computerized Adaptive Testing, Maximum Likelihood Estimation, Maximum a Posteriori

Details

Primary Language	English
Subjects	Classical Test Theories, Item Response Theory
Journal Section	Articles
Authors	Alper Tosun (ORCID 0000-0001-9715-5209); Eren Can Aybek (ORCID 0000-0003-3040-2337); Alper Sinan (ORCID 0000-0001-6632-5500)
Publication Date	September 30, 2025
Submission Date	March 15, 2025
Acceptance Date	June 9, 2025
Published in Issue	Year 2025 Volume: 16 Issue: 3

Cite

APA	Tosun, A., Aybek, E. C., & Sinan, A. (2025). Stabilizing Maximum Likelihood Estimation with a Damping Factor in the Initial Phase of Computerized Adaptive Testing. Journal of Measurement and Evaluation in Education and Psychology, 16(3), 124-138. https://doi.org/10.21031/epod.1658558
