Research Article



Comparing Multi-Stage Tests under Different Conditions in Terms of Panel Design, Module Length, Sample Size and Ability Parameter Estimation Methods

Year 2024, Volume 41, Issue 2, 9–27, 31.08.2024
https://doi.org/10.52597/buje.1329338

Abstract

In this study, the performance of multi-stage tests under various simulation conditions was compared in terms of four evaluation criteria: root mean square error (RMSE), standard error of estimate (SEE), bias (BIAS), and mean absolute error (MAE). The test simulation crossed panel design (1-3, 1-2-3, 1-3-3), module length (6, 12, 18), sample size (300, 1000, 3000), and ability parameter estimation method (expected a posteriori [EAP], maximum a posteriori [MAP], and maximum likelihood estimation with fences [MLEF]), yielding 81 conditions (3×3×3×3). The findings indicate that RMSE and MAE generally produced similar results and that measurement accuracy increased with module length. In addition, RMSE, SEE, and MAE took their highest values in the 1-3 panel design and their lowest values in the 1-3-3 design. Researchers are therefore recommended to use a 1-3-3 panel design with a module length of at least 12 and the EAP estimation method.
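For readers unfamiliar with these criteria, the conventional definitions over $N$ simulees with true abilities $\theta_i$ and estimates $\hat{\theta}_i$ are shown below. (The SEE expression given here is the commonly used form, the mean of the estimated standard errors of the ability estimates; the article may compute it differently.)

\[
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{\theta}_i - \theta_i\bigr)^2},\qquad
\mathrm{BIAS} = \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{\theta}_i - \theta_i\bigr),\qquad
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\bigl|\hat{\theta}_i - \theta_i\bigr|,\qquad
\mathrm{SEE} = \frac{1}{N}\sum_{i=1}^{N} SE\bigl(\hat{\theta}_i\bigr)
\]

The sketch below illustrates, under stated assumptions, how the 3×3×3×3 condition grid and these criteria could be assembled. It is not the authors' simulation code: simulate_mst is a hypothetical placeholder for an MST simulation run, and the stand-in arrays exist only to make the sketch runnable.

    # Minimal sketch (not the authors' code): enumerate the 3x3x3x3 = 81
    # simulation conditions and compute RMSE, BIAS, and MAE for each.
    from itertools import product
    import numpy as np

    panel_designs = ["1-3", "1-2-3", "1-3-3"]
    module_lengths = [6, 12, 18]
    sample_sizes = [300, 1000, 3000]
    estimators = ["EAP", "MAP", "MLEF"]

    def criteria(theta, theta_hat):
        """Return the three error-based criteria for one condition."""
        err = theta_hat - theta
        return {"RMSE": float(np.sqrt(np.mean(err ** 2))),
                "BIAS": float(np.mean(err)),
                "MAE": float(np.mean(np.abs(err)))}

    rng = np.random.default_rng(2024)
    for design, length, n, est in product(panel_designs, module_lengths,
                                          sample_sizes, estimators):
        # theta, theta_hat = simulate_mst(design, length, n, est)  # hypothetical
        theta = rng.standard_normal(n)             # stand-in true abilities
        theta_hat = theta + rng.normal(0, 0.3, n)  # stand-in estimates
        print(design, length, n, est, criteria(theta, theta_hat))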


Details

Primary Language Turkish
Subjects Measurement and Evaluation in Education (Other)
Journal Section Original Articles
Authors

Serap Büyükkıdık (ORCID: 0000-0003-4335-2949)

Fatma Gökçen Ayva Yörü (ORCID: 0000-0002-4555-1987)

Publication Date August 31, 2024
Published in Issue Year 2024, Volume 41, Issue 2

Cite

APA Büyükkıdık, S., & Ayva Yörü, F. G. (2024). Çok Aşamalı Testlerin Panel Deseni, Modül Uzunluğu, Örneklem Büyüklüğü ve Yetenek Parametresi Kestirim Yöntemleri Açısından Farklı Koşullar Altında Karşılaştırılması. Bogazici University Journal of Education, 41(2), 9-27. https://doi.org/10.52597/buje.1329338