Çok Boyutlu Madde Tepki Kuramı Test Eşitleme Yöntemlerinin Karşılaştırılması

Burcu Demiröz; Nuri Doğan

doi:10.52826/mcbuefd.1587864

Research Article

Comparison of Multidimensional Item Response Theory Test Equating Methods

Year 2025, Volume: 13 Issue: 1, 156-176, 27.06.2025

https://izlik.org/JA29WX75SM

Abstract

This study compared three methods for equating scores from multidimensional tests: bifactor MIRT observed score equating, full MIRT observed score equating, and the unidimensional approximation of MIRT observed score equating. The comparison examined equated scores obtained under the common-item design, together with the standard error of equating, bias, and root mean square error. Simulated data were used. Sample size, common-item proportion, level of relationship between dimensions, and multidimensional test equating method each had three levels, and calibration method had two; crossing these levels yielded 162 conditions. Data generation and equating were carried out in the R programming language. Error values decreased as sample size increased under both concurrent and separate calibration, and a minimum sample size of 3000 is recommended. The lowest error values were observed when the common-item proportion was 20% for concurrent calibration and 50% for separate calibration; accordingly, the common-item proportion should be at most 20% for concurrent calibration and at least 50% for separate calibration. Overall, concurrent calibration yielded lower error values than separate calibration.
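As a rough illustration of the design described in the abstract, the sketch below enumerates a fully crossed 3 x 3 x 3 x 3 x 2 grid (162 conditions) and computes bias, standard error, and root mean square error for a set of replicated equated scores. It is written in Python for illustration only, whereas the study itself used R. The abstract reports only the values 3000 (sample size) and 20% and 50% (common-item proportion); every other factor level, the function names `design_grid` and `error_summary`, and the example numbers are hypothetical placeholders, not the authors' settings. The error definitions follow one common convention for simulation studies (variance divided by the number of replications), which may differ from the one used in the article.

```python
from itertools import product
import math

# Factor levels. Only 3000, 0.20 and 0.50 appear in the abstract;
# all other levels below are ASSUMED placeholders for illustration.
FACTORS = {
    "sample_size": [1000, 3000, 5000],        # 1000 and 5000 are assumed
    "common_item_prop": [0.20, 0.30, 0.50],   # 0.30 is assumed
    "dim_relationship": [0.3, 0.6, 0.9],      # all three are assumed
    "equating_method": ["bifactor", "full_mirt", "unidim_approx"],
    "calibration": ["concurrent", "separate"],
}


def design_grid(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def error_summary(estimates, criterion):
    """Bias, standard error and RMSE of replicated equated scores
    at a single raw-score point, relative to a criterion equated score."""
    r = len(estimates)
    mean_est = sum(estimates) / r
    bias = mean_est - criterion
    # Dividing by r (not r - 1) makes RMSE^2 = bias^2 + SE^2 hold exactly.
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / r)
    rmse = math.sqrt(bias ** 2 + se ** 2)
    return {"bias": bias, "se": se, "rmse": rmse}


conditions = design_grid(FACTORS)
print(len(conditions))  # 3 * 3 * 3 * 3 * 2 = 162

# Hypothetical equated scores from four replications at one score point.
summary = error_summary([20.5, 21.5, 20.0, 22.0], criterion=20.0)
print(summary)
```

In a full simulation, `error_summary` would be evaluated at every raw-score point within each of the 162 conditions and then averaged over score points; that aggregation step is omitted here because the abstract does not say how it was done.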

Keywords

test equating, item response theory, bi-factor model, scale linking, error



Details

Primary Language	Turkish
Subjects	Applied and Developmental Psychology (Other)
Section	Research Article
Authors	Burcu Demiröz (ORCID 0000-0002-8326-4671), Nuri Doğan (ORCID 0000-0001-6274-2016)
Submission Date	November 20, 2024
Acceptance Date	May 28, 2025
Publication Date	June 27, 2025
DOI	https://doi.org/10.52826/mcbuefd.1587864
IZ	https://izlik.org/JA29WX75SM
Published in Issue	Year 2025 Volume: 13 Issue: 1

Cite

APA	Demiröz, B., & Doğan, N. (2025). Çok Boyutlu Madde Tepki Kuramı Test Eşitleme Yöntemlerinin Karşılaştırılması. Manisa Celal Bayar Üniversitesi Eğitim Fakültesi Dergisi, 13(1), 156-176. https://doi.org/10.52826/mcbuefd.1587864
