Research Article


Examination of Scale Transformation and Test Equating Methods in Testlet Based Tests

Year 2025, Volume: 33 Issue: 3, 658 - 671, 25.07.2025
https://doi.org/10.24106/kefdergi.1750267

Abstract

Purpose: This study examines test equating performance under various item response theory models and sample-size conditions in testlet-based tests.
Design/Methodology/Approach: Using data from the eTIMSS 2019 science test, the study compares scale transformation methods and test equating results under the Unidimensional Item Response Theory (UIRT), Testlet Response Theory (TRT), and bifactor models across varying sample sizes. The mean-sigma and Stocking-Lord scale transformation methods, together with true score and observed score equating methods, were applied within a common-item nonequivalent groups design. Equating performance was evaluated with RMSE and BIAS values.
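The mean-sigma method mentioned above places the new form's item parameters on the old form's scale using the means and standard deviations of the common items' difficulty (b) parameters. A minimal Python sketch (an illustration only, not the authors' implementation; function names are hypothetical):

```python
import statistics

def mean_sigma_coefficients(b_old, b_new):
    """Mean-sigma linking coefficients from common-item b parameters:
    A = SD(b_old) / SD(b_new),  B = mean(b_old) - A * mean(b_new)."""
    A = statistics.pstdev(b_old) / statistics.pstdev(b_new)
    B = statistics.fmean(b_old) - A * statistics.fmean(b_new)
    return A, B

def transform_parameters(a_new, b_new, A, B):
    """Rescale new-form parameters: a* = a / A,  b* = A * b + B."""
    return [a / A for a in a_new], [A * b + B for b in b_new]

# Toy common-item difficulties on the two forms
A, B = mean_sigma_coefficients([-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5])
a_star, b_star = transform_parameters([1.2], [1.5], A, B)
```

The Stocking-Lord method instead chooses A and B by minimizing the squared distance between the two forms' test characteristic curves over the common items, which typically requires numerical optimization rather than a closed-form solution.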
Findings: In this science test, which exhibited low testlet effects, scale transformation based on the UIRT model and test equating based on the bifactor model produced lower error values. Moreover, errors in parameter estimates generally decreased as sample size increased, with the TRT model in particular requiring a sample size of more than 500 for robust estimation.
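The RMSE and BIAS criteria used to compare the methods are the root mean squared difference and the mean signed difference between estimated and criterion values. A short sketch (a generic illustration of these standard formulas, not the authors' code):

```python
import math

def bias(estimates, criterion):
    # BIAS: mean signed difference between estimated and criterion values
    return sum(e - c for e, c in zip(estimates, criterion)) / len(estimates)

def rmse(estimates, criterion):
    # RMSE: square root of the mean squared difference
    return math.sqrt(
        sum((e - c) ** 2 for e, c in zip(estimates, criterion)) / len(estimates)
    )

# Toy example: three equated scores versus their criterion values
print(bias([1.0, 2.0, 3.0], [1.0, 1.0, 3.0]))  # 0.333...
print(rmse([1.0, 2.0, 3.0], [1.0, 1.0, 3.0]))  # 0.577...
```

BIAS captures systematic over- or under-estimation (errors of opposite sign cancel), while RMSE reflects overall error magnitude, so the two are reported together.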

Highlights: By accounting for testlet effects, the bifactor model yields more accurate and stable results, enabling fair and reliable score equating. Conducted with real data, this study concretely demonstrates the practical impact of testlet effects in tests containing testlets.


Details

Primary Language English
Subjects Measurement and Evaluation in Education (Other)
Journal Section Research Article
Authors

Harun Dilek 0000-0001-5671-6858

Kübra Atalay Kabasakal 0000-0002-3580-5568

Sebahat Gören 0000-0002-6453-3258

Publication Date July 25, 2025
Submission Date December 25, 2024
Acceptance Date July 25, 2025
Published in Issue Year 2025 Volume: 33 Issue: 3

Cite

APA Dilek, H., Atalay Kabasakal, K., & Gören, S. (2025). Examination of Scale Transformation and Test Equating Methods in Testlet Based Tests. Kastamonu Education Journal, 33(3), 658-671. https://doi.org/10.24106/kefdergi.1750267
