TY  - JOUR
T1  - Çok Boyutlu Madde Tepki Kuramı Test Eşitleme Yöntemlerinin Karşılaştırılması
TT  - Comparison of Multidimensional Item Response Theory Test Equating Methods
AU  - Demiröz, Burcu
AU  - Doğan, Nuri
PY  - 2025
DA  - June
Y2  - 2025
DO  - 10.52826/mcbuefd.1587864
JF  - Manisa Celal Bayar Üniversitesi Eğitim Fakültesi Dergisi
JO  - MCBÜEFD
PB  - Manisa Celal Bayar Üniversitesi
WT  - DergiPark
SN  - 1309-8918
SP  - 156
EP  - 176
VL  - 13
IS  - 1
LA  - tr
AB  - Bu araştırmada, çok boyutlu testlerden elde edilen puanların eşitlenmesinde kullanılan çift-faktör çok boyutlu madde tepki kuramı gözlenen puan eşitleme, tam çok boyutlu madde tepki kuramı gözlenen puan ve çok boyutlu madde tepki kuramı gözlenen puan eşitleme tek boyutlu yaklaşım yöntemlerinden elde edilen sonuçların karşılaştırılması amaçlanmıştır. Karşılaştırma yaparken ortak madde deseni altında çeşitli faktörlere göre elde edilen eşitlenmiş puanlar, bu puanlara ait eşitlemenin standart hatası, yanlılık ve hata kareler ortalamasının karekökü değerleri incelenmiştir. Simülasyon verileri kullanılmıştır. Örneklem büyüklüğü, ortak madde oranı, boyutlar arasındaki ilişki düzeyi, çok boyutlu test eşitleme yöntemleri 3 ve kalibrasyon yöntemleri 2 farklı koşul içermektedir. Bu değişkenlerin farklı seviyelerinin kombinasyonu sonucunda 162 koşul oluşturulmuştur. Veri setlerinin üretilmesi ve eşitleme çalışmaları R programlama dili kullanılarak gerçekleştirilmiştir. Eş zamanlı ve ayrı kalibrasyon yöntemleri için örneklem büyüklüğü arttıkça hata değerlerinin azaldığı gözlenmiştir. Örneklem büyüklüğünün en az 3000 olması önerilmektedir. Eş zamanlı kalibrasyon yöntemi kullanıldığında ortak madde oranı %20; ayrı kalibrasyon yöntemi kullanıldığında ortak madde oranı %50 olduğunda en az hata değerleri gözlenmiştir. Ortak madde oranı eş zamanlı kalibrasyonda en çok %20; ayrı kalibrasyonda en az %50 olmalıdır. Eş zamanlı kalibrasyonda ayrı kalibrasyon yönteminden daha küçük hata değerleri gözlenmiştir.
KW  - test equating
KW  - item response theory
KW  - bifactor model
KW  - scale calibration
KW  - error
N2  - This study compared three methods for equating scores from multidimensional tests: bifactor multidimensional item response theory (MIRT) observed score equating, full MIRT observed score equating, and the unidimensional approximation of MIRT observed score equating. The methods were compared under the common-item design in terms of the equated scores and their standard error of equating, bias, and root mean square error. Simulated data were used. Sample size, common-item proportion, the level of relationship between dimensions, and the multidimensional test equating method each had three levels, and the calibration method had two levels; crossing these levels produced 162 conditions. Data generation and equating were carried out in the R programming language. For both concurrent and separate calibration, error values decreased as the sample size increased. A minimum sample size of 3000 is recommended. The lowest error values were observed when the common-item proportion was 20% for concurrent calibration and 50% for separate calibration. The common-item proportion should be at most 20% for concurrent calibration and at least 50% for separate calibration. Concurrent calibration yielded lower error values than separate calibration.
CR  - Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.) (pp. 508–600). American Council on Education.
CR  - Atar, B., & Yeşiltaş, G. (2017). Çok boyutlu eşitleme yöntemlerinin eşdeğer olmayan gruplarda ortak madde deseni için performanslarının incelenmesi. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 8(4), 421–434.
CR  - Baker, F. B., & Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28(2), 147-162.
CR  - Brossman, B. G. (2010). Observed score and true score equating procedures for multidimensional item response theory [Doctoral Dissertation]. University of Iowa.
CR  - Brossman, B. G., & Lee, W. (2013). Observed score and true score equating procedures for multidimensional item response theory. Applied Psychological Measurement, 37, 460-481.
CR  - Cao, L. (2008). Mixed-format test equating: Effects of test dimensionality and common item sets [Doctoral dissertation]. University of Maryland, College Park.
CR  - Choi, J. (2019). Comparison of MIRT observed score equating methods under the common-item nonequivalent groups design [Doctoral Dissertation]. University of Iowa.
CR  - Çokluk, Ö., Uçar, A., & Balta, E. (2022). Madde tepki kuramına dayalı gerçek puan eşitlemede ölçek dönüştürme yöntemlerinin incelenmesi. Ankara Üniversitesi Eğitim Bilimleri Fakültesi Dergisi, 55(1), 1-36.
CR  - Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., & Hothorn, T. (2021). mvtnorm: Multivariate Normal and t Distribution. R package version 1.1-3, URL http://CRAN.R-project.org/package=mvtnorm.
CR  - Gök, B., & Kelecioğlu, H. (2014). Comparison of IRT equating methods using the common-item nonequivalent groups design. Mersin Üniversitesi Eğitim Fakültesi Dergisi, 10(1), 120-136.
CR  - Hou, J. (2007). Effectiveness of the hybrid Levine equipercentile and modified frequency estimation equating methods under the common-item nonequivalent groups design [Doctoral Dissertation]. University of Iowa.
CR  - Kilmen, S., & Demirtaşlı, N. (2012). Comparison of test equating methods based on item response theory according to the sample size and ability distribution. Procedia-Social and Behavioral Sciences, 46, 130-134.
CR  - Kim, K. (2017). IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model [Doctoral Dissertation]. The University of Iowa.
CR  - Kim, K. Y., & Lee, W. C. (2018). Linking methods for the full-information bifactor model under the common-item nonequivalent groups design. In M. J. Kolen & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Vol. 2.5, pp. 243–261). Center for Advanced Studies in Measurement and Assessment.
CR  - Kim, S. Y. (2018). Simple structure MIRT equating for multidimensional tests [Doctoral Dissertation]. University of Iowa.
CR  - Kim, S. Y., Lee, W., & Kolen, M. J. (2019). Simple-structure multidimensional item response theory equating for multidimensional test. Educational and Psychological Measurement, 80(1), 91-125.
CR  - Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices. Springer Science and Business Media.
CR  - Kumlu, G. (2019). Test ve alt testlerde eşitlemenin farklı koşullar açısından incelenmesi [Doctoral dissertation]. Hacettepe Üniversitesi.
CR  - Lee, E. (2013). Equating multidimensional test under a random groups design: A comparison of various equating procedures [Doctoral Dissertation]. University of Iowa.
CR  - Lee, G., & Lee, W. (2016). Bi-factor MIRT observed-score equating for mixed-format tests. Applied Measurement in Education, 29, 224-241.
CR  - Lee, W., & Brossman, B. G. (2012). Observed score equating for mixed-format tests using a simple-structure multidimensional IRT framework. In M. J. Kolen & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Vol. 2.2, pp. 115-142). Center for Advanced Studies in Measurement and Assessment.
CR  - Lee, W. C., He, Y., Hagge, S., Wang, W., & Kolen, M. J. (2012). Linking methods for the full-information bifactor model under the common-item nonequivalent groups design. In M. J. Kolen & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Vol. 2.2, pp. 13–44). Center for Advanced Studies in Measurement and Assessment.
CR  - Meng, H. (2007). A comparison study of IRT calibration methods for mixed-format tests in vertical scaling [Unpublished Doctoral Dissertation]. University of Iowa.
CR  - Panidvadtana, P., Sujiva, S., & Srisuttiyakorn, S. (2021). A comparison of the accuracy of multidimensional IRT equating methods for mixed-format tests. Kasetsart Journal of Social Sciences, 42, 215-220.
CR  - Peterson, J. L. (2014). Multidimensional item response theory observed score equating methods for mixed-format tests [Doctoral dissertation]. University of Iowa.
CR  - R Development Core Team. (2022). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
CR  - Tao, W., & Cao, Y. (2016). An extension of IRT-based equating to the dichotomous testlet response theory model. Applied Measurement in Education, 29, 108-121.
CR  - Uğurlu, S. (2020). Comparison of equating methods for multidimensional test which contain items with differential item functioning [Doctoral dissertation]. Hacettepe Üniversitesi.
CR  - Wang, S., & Liu, H. (2018). Minimum sample size needed for equipercentile equating under the random groups design. In M. J. Kolen & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Vol. 2.5, pp. 107–126). Center for Advanced Studies in Measurement and Assessment.
CR  - Wang, T. (2006). Standard errors of equating for equipercentile equating with log-linear presmoothing using the delta method (CASMA Research Report No. 14). Center for Advanced Studies in Measurement and Assessment, The University of Iowa.
CR  - Yao, L., & Boughton, K. (2009). Multidimensional linking for tests with mixed item types. Journal of Educational Measurement, 46(2), 177–197.
CR  - Zhang, O. (2012). Observed score and true score equating for multidimensional response theory under nonequivalent group anchor test design [Doctoral Dissertation]. University of Florida.
UR  - https://doi.org/10.52826/mcbuefd.1587864
L1  - https://dergipark.org.tr/tr/download/article-file/4377904
ER  -