Research Article

Investigation of the Effect of Violation of the Local Independence Assumption in Test Equating on Delta and Bootstrap Equating Errors According to Various Variables

Year 2023, Volume: 36 Issue: 1, 23 - 50, 29.04.2023
https://doi.org/10.19171/uefad.1231069

Abstract

This study examined the delta and bootstrap methods used to estimate equating error in linear, equipercentile, and polynomial log-linear presmoothed equipercentile test equating, with respect to sample size, number of items, and the percentage of items loading on a second dimension. Because the equating methods were compared with simulated data under different controlled conditions, the research is a simulation study. So that the simulated data would represent real responses, they were generated from the distributions of data obtained from the first form of the PISA 2018 mathematics test. The study examined 36 conditions defined by sample size (250, 1000, 5000), test length (20, 40, 60 items), and the proportion of items loading on the second dimension (15%, 30%, 50%). Under these conditions, 3600 dichotomous data sets conforming to the two-parameter logistic model (2PLM) were generated with 100 replications. A random groups design was used. In general, equating error increased as sample size decreased; the condition with the least error was a sample size of 5000 combined with a 20-item test. Among the equating methods, linear equating performed best, and among the error estimation methods, the delta method performed best. Furthermore, when the unidimensional structure of the test was violated and the test became multidimensional, no systematic pattern emerged in the equating errors with respect to the proportion of items loading on the second dimension; the errors varied with the equating method, the error estimation method, the sample size, and the number of items in the test.
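
The abstract does not include the simulation code, but the R packages cited in the references (mirt for response generation, equate for observed-score equating) indicate how one replication could be assembled. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' code: the item parameter distributions, the 20-item/15% condition shown, the reuse of one parameter set for both forms, and the bootstrap arguments (boot, reps) are assumptions that should be checked against the package documentation (?simdata, ?equate).

    library(mirt)    # Chalmers (2012): simdata() for IRT response generation
    library(equate)  # Albano (2016): observed-score equating and equating error

    set.seed(1)
    n_items <- 20                        # test length condition: 20 items
    n_dim2  <- ceiling(0.15 * n_items)   # 15% of items load on dimension 2
    n_group <- 5000                      # sample size condition per form

    # Simple-structure two-dimensional 2PL parameters (distributions assumed)
    a <- matrix(0, n_items, 2)
    a[1:(n_items - n_dim2), 1] <- rlnorm(n_items - n_dim2, -0.2, 0.3)
    a[(n_items - n_dim2 + 1):n_items, 2] <- rlnorm(n_dim2, -0.2, 0.3)
    d <- rnorm(n_items)                  # intercepts (assumed standard normal)

    # Random groups design: two independent samples take forms X and Y
    # (for brevity the same generating parameters serve both forms here)
    resp_x <- simdata(a, d, n_group, itemtype = "dich")
    resp_y <- simdata(a, d, n_group, itemtype = "dich")

    fx <- freqtab(rowSums(resp_x), scales = 0:n_items)
    fy <- freqtab(rowSums(resp_y), scales = 0:n_items)

    # Linear and loglinear-presmoothed equipercentile equating with
    # bootstrap standard errors (100 replications, as in the study)
    eq_lin <- equate(fx, fy, type = "linear", boot = TRUE, reps = 100)
    eq_pre <- equate(fx, fy, type = "equipercentile",
                     smoothmethod = "loglinear", boot = TRUE, reps = 100)
    summary(eq_lin)

Repeating such a replication 100 times across the 36 conditions yields the 3600 data sets described in the abstract. Note that this sketch produces only bootstrap standard errors; the delta-method errors reported in the study would come from analytic derivatives of the equating function rather than from this code.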

References

  • Aiken, L. R. (2000). Psychological testing and assessment. Allyn and Bacon.
  • Albano, A. D. (2016). equate: An R package for observed-score linking and equating. Journal of Statistical Software, 74(8), 1-36. https://doi.org/10.18637/jss.v074.i08
  • Baykul, Y. (2015). Eğitimde ve psikolojide ölçme: Klasik test teorisi ve uygulaması. Pegem Akademi.
  • Brossman, B. G., & Lee, W. (2013). Observed score and true score equating procedures for multidimensional item response theory. Applied Psychological Measurement, 37, 460-481. https://doi.org/10.1177/0146621613484083
  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06
  • Chalmers, R. P. (2016). mirtCAT: Generating adaptive and non-adaptive test interfaces for multidimensional item response theory applications. Journal of Statistical Software, 71(5), 1-39. https://doi.org/10.18637/jss.v071.i05
  • Cook, L. L., & Eignor, D. R. (1991). An NCME module on IRT equating methods. Educational Measurement: Issues and Practice, 10(3), 191-199.
  • Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Harcourt Brace Jovanovich College.
  • Cronbach, L. J. (1990). Essentials of psychological testing (5th ed.). Harper & Row.
  • Cui, Z. (2006). Two new alternative smoothing methods in equating: The cubic B-spline presmoothing method and the direct presmoothing method (Publication No. 3229654) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations & Theses Global.
  • Cui, Z., & Kolen, M. J. (2008). Comparison of parametric and nonparametric bootstrap methods for estimating random error in equipercentile equating. Applied Psychological Measurement, 32(4), 334-347. https://doi.org/10.1177/0146621607300854
  • Cui, Z., & Kolen, M. J. (2009). Evaluation of two new smoothing methods in equating: The cubic B‐spline presmoothing method and the direct presmoothing method. Journal of Educational Measurement, 46(2), 135-158.
  • De Gruijter, D. N. M., & van der Kamp, L. J. T. (2007). Statistical test theory for the behavioral sciences. Chapman and Hall/CRC.
  • DeMars, C. (2010). Item response theory. Oxford University Press.
  • Desjardins, C. D., & Bulut, O. (2018). Handbook of educational measurement and psychometrics using R. CRC Press.
  • Dorans, N. J., Moses, T. P., & Eignor, D. R. (2010). Principles and practices of test score equating. ETS Research Report Series, 2010(2), i-41. https://doi.org/10.1002/j.2333-8504.2010.tb02236.x
  • Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists (Multivariate Applications Book Series). Lawrence Erlbaum Associates.
  • Fan, X. (2001). Statistical significance and effect size in education research: Two sides of a coin. Journal of Educational Research, 94, 275-283. https://doi.org/10.1080/00220670109598763
  • Felan, G. D. (2002). Test equating: Mean, linear, equipercentile, and item response theory. Annual Meeting of the Southwest Educational Research Association, 1-24.
  • Finch, H. (2006). Comparison of the performance of varimax and promax rotations: Factor structure recovery for dichotomous items. Journal of Educational Measurement, 43, 39-52.
  • Finch, H., & French, B. F. (2019). A comparison of estimation techniques for IRT models with small samples. Applied Measurement in Education, 32(2), 77-96.
  • Finch, H., French, B. F., & Immekus, J. C. (2014). Applied psychometrics using SAS. IAP.
  • Haberman, S. J. (1974). Log-linear models for frequency tables with ordered classifications. Biometrics, 30, 589-600.
  • Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.
  • Hanson, B. A. (1990). An investigation of methods for improving estimation of test score distributions (Research Rep. No. 90-4). American College Testing.
  • Hanson, B. A., Zeng, L., & Colton, D. A. (1994). A comparison of presmoothing and postsmoothing methods in equipercentile equating (No. 94). American College Testing Program.
  • Harris, D. J., & Crouse, J. D. (1993). A study of criteria used in equating. Applied Measurement in Education, 6(3), 195-240.
  • Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). American Council on Education/Praeger.
  • İnci, Y. (2014). Örneklem büyüklüğünün test eşitlemeye etkisi (Yayın No. 363203) [Yüksek lisans tezi, Hacettepe Üniversitesi]. YÖK. https://tez.yok.gov.tr/UlusalTezMerkezi/
  • Kahraman, N. (2013). Unidimensional interpretations for multidimensional test items. Journal of Educational Measurement, 50(2), 227-246. https://doi.org/10.1111/jedm.12012
  • Kahraman, N., & Kamata, A. (2004). Increasing the precision of subscale scores by using out-of-scale information. Applied Psychological Measurement, 28, 407-426.
  • Kahraman, N., & Thompson, T. (2011). Relating unidimensional IRT parameters to a multidimensional response space: A review of two alternative projection IRT models for subscale scores. Journal of Educational Measurement, 48, 146-164.
  • Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). American Council on Education & Praeger.
  • Kelecioğlu, H., & Öztürk Gübeş, N. (2013). Comparing linear equating and equipercentile equating methods using random groups design. International Online Journal of Educational Sciences, 5(1), 227-241.
  • Kılıç, S. (2011). Neyin peşindeyiz? Kutsal p değerinin mi (istatistiksel önemlilik) yoksa klinik önemliliğin mi? Journal of Mood Disorders, 1(1), 46-48.
  • Kilmen, S. (2010). Madde Tepki Kuramına dayalı test eşitleme yöntemlerinden kestirilen eşitleme hatalarının örneklem büyüklüğü ve yetenek dağılımına göre karşılaştırılması (Yayın No. 279926) [Doktora tezi, Ankara Üniversitesi]. YÖK. https://tez.yok.gov.tr/UlusalTezMerkezi/
  • Kim, H. Y. (2014). A comparison of smoothing methods for the anchor item nonequivalent groups design (Publication No. 3638390) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations & Theses Global. https://doi.org/10.17077/etd.qysisl6w
  • Kim, S. H., & Cohen, A. S. (2002). A comparison of linking and concurrent calibration under the graded response model. Applied Psychological Measurement, 26(1), 25-41. https://doi.org/10.1177/0146621602026001002
  • Kim, S. Y. (2018). Simple structure MIRT equating for multidimensional tests (Publication No. 10750515) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations & Theses Global.
  • Kim, S., von Davier, A. A., & Haberman, S. (2008). Small-sample equating using a synthetic linking function. Journal of Educational Measurement, 45(4), 325-342. https://doi.org/10.1111/j.1745-3984.2008.00068.x
  • Kline, R. B. (2015). Principles and practice of structural equation modeling. Guilford Publications.
  • Kolen, M. J. (1988). Traditional equating methodology. Educational Measurement: Issues and Practice, 7(4), 29-37.
  • Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices. Springer Science and Business Media.
  • Kolen, M. J., & Hendrickson, A. B. (2013). Scaling, norming, and equating. In APA handbook of testing and assessment in psychology, Vol. 1: Test theory and testing and assessment in industrial and organizational psychology (pp. 201-222). American Psychological Association.
  • Lee, G., & Lee, W. C. (2016). Bi-factor MIRT observed-score equating for mixed-format tests. Applied Measurement in Education, 29(3), 224-241. https://doi.org/10.1080/08957347.2016.1171770
  • Lee, G., Lee, W., Kolen, M. J., Park, I.-Y., Kim, D. I., & Yang, J. S. (2015). Bi-factor MIRT true-score equating for testlet-based tests. Journal of Educational Evaluation, 28, 681-700.
  • Lee, W. C., & Ban, J. C. (2010). A comparison of IRT linking procedures. Applied Measurement in Education, 23(1), 23-48.
  • Lee, W., & Brossman, B. G. (2012). Observed score equating for mixed-format tests using a simple-structure multidimensional IRT framework. In M. J. Kolen, & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Vol. 2; CASMA Monograph No. 2.2.; pp. 115-142). Center for Advanced Studies in Measurement and Assessment, University of Iowa. Retrieved from https://education.uiowa.edu/sites/education.uiowa.edu/files/documents/centers/casma/publications/casma-monograph-2.2.pdf
  • Liu, C. (2011). A comparison of statistics for selecting smoothing parameters for loglinear presmoothing and cubic spline postsmoothing under a random groups design (Publication No. 3461186) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations & Theses Global.
  • Livingston, S. A. (1993). Small-sample equating with log-linear smoothing. Journal of Educational Measurement, 30(1), 23–29.
  • Livingston, S. A. (2004). Equating test scores (without IRT). Educational Testing Service.
  • Livingston, S. A., & Kim, S. (2009). The circle-arc method for equating in small samples. Journal of Educational Measurement, 46(3), 330–343. https://doi.org/10.1111/j.1745-3984.2009.00084.x
  • Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.
  • McDonald, R. P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24, 99-114.
  • McLeod, L. D., Swygert, K. A., & Thissen, D. (2001). Factor analysis for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 189– 216). Erlbaum.
  • Mutluer, C. (2021). Klasik test kuramına ve madde tepki kuramına dayalı test eşitleme yöntemlerinin karşılaştırması: Uluslararası öğrenci değerlendirme programı (PISA) 2012 matematik testi örneği (Yayın No. 658052) [Doktora tezi, Gazi Üniversitesi]. YÖK. https://tez.yok.gov.tr/UlusalTezMerkezi/
  • Ogasawara, H. (2001). Standard errors of item response theory equating/linking by response function methods. Applied Psychological Measurement, 25(1), 53-67. https://doi.org/10.1177/01466216010251004
  • Özçelik, D. A. (2010). Eğitim programları ve öğretim. Pegem Akademi.
  • Özkan, M. (2015). TEOG kapsamında uygulanan matematik alt testi ile matematik mazeret alt testinin istatistiksel eşitliğinin sınanması (Yayın No. 396176) [Yüksek lisans tezi, Ankara Üniversitesi]. YÖK. https://tez.yok.gov.tr/UlusalTezMerkezi/
  • Parshall, C. G., Houghton, P. D. B., & Kromrey, J. D. (1995). Equating error and statistical bias in small sample linear equating. Journal of Educational Measurement, 32(1), 37-54. https://doi.org/10.1111/j.1745-3984.1995.tb00455.x
  • R Core Team (2019). R: A language and environment for statistical computing (Version 4.1.1). R Foundation for Statistical Computing. https://www.R-project.org/
  • Salmaner Doğan, R., & Tan, Ş. (2022). Madde tepki kuramında eşitleme hatalarının belirlenmesinde kullanılan delta ve bootstrap yöntemlerinin çeşitli değişkenlere göre incelenmesi. Gazi University Journal of Gazi Educational Faculty (GUJGEF), 42(2), 1053-1081. https://doi.org/10.17152/gefad.913241
  • Sansivieri, V., Wiberg, M., & Matteucci, M. (2017). A review of test equating methods with a special focus on IRT-based approaches. Statistica, 77(4), 329-352. https://doi.org/10.6092/issn.1973-2201/7066
  • Sass, D. A., & Schmitt, T. A. (2010). A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research, 45, 73-103. https://doi.org/10.1080/00273170903504810
  • Skaggs, G. (2005). Accuracy of random groups equating with very small samples. Journal of Educational Measurement, 42(4), 309-330. https://doi.org/10.1111/j.1745-3984.2005.00018.x
  • Sunnassee, D. (2011). Conditions affecting the accuracy of classical equating methods for small samples under the NEAT design: A simulation study (Publication No. 3473486) [Doctoral dissertation, University of North Carolina]. ProQuest Dissertations & Theses Global.
  • Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Pearson.
  • Tao, W., & Cao, Y. (2016). An extension of IRT-based equating to the dichotomous testlet response theory model. Applied Measurement in Education, 29(2), 108-121. https://doi.org/10.1080/08957347.2016.1138956
  • Uğurlu, S. (2020). Comparison of equating methods for multidimensional tests which contain items with differential item functioning (Yayın No. 656957) [Doktora tezi, Hacettepe Üniversitesi]. YÖK. https://tez.yok.gov.tr/UlusalTezMerkezi/
  • Way, W. D., Ansley, T. N., & Forsyth, R. A. (1988). The comparative effects of compensatory and noncompensatory two-dimensional data on unidimensional IRT estimates. Applied Psychological Measurement, 12(3), 239-252. https://doi.org/10.1177/014662168801200303
  • von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The chain and post‐stratification methods for observed‐score equating: Their relationship to population invariance. Journal of Educational Measurement, 41(1), 15-32. https://doi.org/10.1111/j.1745-3984.2004.tb01156.x
  • Zhang, Z. (2022). Estimating standard errors of IRT true score equating coefficients using imputed item parameters. The Journal of Experimental Education, 90(3), 760-782. https://doi.org/10.1080/00220973.2020.1751579
  • Zhang, Z., & Zhao, M. (2019). Standard errors of IRT parameter scale transformation coefficients: Comparison of bootstrap method, delta method, and multiple imputation method. Journal of Educational Measurement, 56(2), 302-330. https://doi.org/10.1111/jedm.12210
  • Zhu, W. (1998). Test equating: What, why, how? Research Quarterly for Exercise and Sport, 69(1), 11-23. https://doi.org/10.1080/02701367.1998.10607662


Details

Primary Language Turkish
Subjects Other Fields of Education
Journal Section Articles
Authors

Mehmet Fatih Doğuyurt 0000-0001-9206-3321

Şeref Tan 0000-0002-9892-3369

Early Pub Date April 29, 2023
Publication Date April 29, 2023
Submission Date January 8, 2023
Published in Issue Year 2023 Volume: 36 Issue: 1

Cite

APA Doğuyurt, M. F., & Tan, Ş. (2023). Test Eşitlemede Yerel Bağımsızlık Varsayımının İhlalinin Delta ve Bootstrap Eşitleme Hatalarına Etkisinin Çeşitli Değişkenlere Göre İncelenmesi. Uludağ Üniversitesi Eğitim Fakültesi Dergisi, 36(1), 23-50. https://doi.org/10.19171/uefad.1231069