Madde Tepki Kuramına Dayalı Gerçek Puan Eşitlemede Ölçek Dönüştürme Yöntemlerinin İncelenmesi

Ömay Çokluk-Bökeoğlu; Arzu Uçar; Ebru Balta

doi:10.30964/auebfd.1001128

Research Article

Year 2022, Volume: 55 Issue: 1, 1 - 36, 31.03.2022

Abstract

This study aimed to compare the equating errors of scale transformation methods (mean-mean (MM), mean-sigma (MS), Stocking-Lord (SL), and Haebara (HB)) under different conditions in true score equating based on Item Response Theory (IRT). To compare the methods' errors, 7,200 dichotomous data sets fitting the 2PLM and 3PLM were generated with 50 replications under the conditions of sample size (500, 1,000, 3,000, 10,000), test length (40, 50, 80), common-item ratio (20%, 30%, 40%), parameter estimation model (two- and three-parameter logistic models (2PLM and 3PLM)), and ability distribution of the groups (similar: N(0,1) and N(0,1); different: N(0,1) and N(0.5,1)). The common-item/anchor-test nonequivalent groups (NEAT) design was used as the data collection design. R software was used for data generation and analysis. The findings were evaluated according to the equating error (RMSD) criterion. Across all conditions, the SL method yielded higher RMSD values than the other methods, while the MM and MS methods produced RMSD values similar to each other. In addition, when the RMSD values of the scale transformation methods were compared, similar results were obtained under the 2PLM and the 3PLM; equating errors decreased for all methods except SL as sample size and test length increased; and RMSD values were lower when the common-item ratio was 40% and the groups' ability distributions were similar.
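The figure of 7,200 data sets follows from fully crossing the five simulation factors and replicating each cell 50 times. The short Python sketch below only checks that arithmetic; the variable names are illustrative assumptions and the code is not taken from the study, which used R.

```python
from itertools import product

# Simulation factors as listed in the abstract.
sample_sizes = [500, 1000, 3000, 10000]
test_lengths = [40, 50, 80]
common_item_ratios = [0.20, 0.30, 0.40]
models = ["2PLM", "3PLM"]
ability_distributions = ["similar", "different"]  # N(0,1)/N(0,1) vs. N(0,1)/N(0.5,1)
replications = 50

# Fully crossed design: one cell per combination of factor levels.
conditions = list(product(sample_sizes, test_lengths, common_item_ratios,
                          models, ability_distributions))

n_conditions = len(conditions)            # 4 * 3 * 3 * 2 * 2
n_datasets = n_conditions * replications  # cells times replications

print(n_conditions, n_datasets)
```

The crossing gives 144 conditions, and 144 × 50 replications matches the 7,200 data sets reported.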

Keywords

Haebara, IRT true score equating, mean-mean, mean-sigma, scale transformation, Stocking-Lord, test equating


Investigation of Scale Transformation Methods in True Score Equating Based on Item Response Theory

Abstract

This research aimed to compare the equating errors of scale transformation methods (mean-mean (MM), mean-sigma (MS), Haebara (HB), and Stocking-Lord (SL)) in true score equating based on item response theory (IRT) under different conditions. In line with this purpose, 7,200 dichotomous data sets consistent with the two- and three-parameter logistic models were generated with 50 replications under the conditions of sample size (500, 1,000, 3,000, 10,000), test length (40, 50, 80), rate of common items (20%, 30%, 40%), type of model used in parameter estimation (two- and three-parameter logistic models (2PLM and 3PLM)), and ability distribution of groups (similar (N(0,1) - N(0,1)), different (N(0,1) - N(0.5,1))). The common-item nonequivalent groups equating design was used. R software was used for data generation and analyses. The results were evaluated using the equating error (RMSD) criterion. Considering all conditions, the RMSD values of the SL method were higher than those of the other methods, whereas the MM and MS methods produced similar RMSD values. In addition, when the RMSD values of the scale transformation methods were compared, similar results were obtained when the 2PLM and 3PLM were used; as the sample size and test length increased, the equating errors of all methods except the SL method decreased; and the methods had lower RMSD values when the common item rate was 40% and the ability distributions of the groups were similar.
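As a rough illustration of the two moment-based methods named above, the sketch below computes mean-sigma and mean-mean linking coefficients from anchor-item parameters, using the standard linear relations θ_base = A·θ_new + B, a_base = a_new / A, b_base = A·b_new + B. The function names and the anchor-item values are hypothetical, chosen only so that the true coefficients are known; this is not the study's R code. The SL and HB methods instead minimize differences between characteristic curves numerically and are not shown.

```python
from statistics import mean, pstdev

def mean_sigma(b_new, b_base):
    """Mean-sigma: slope from the SDs of anchor difficulties, intercept from their means."""
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B

def mean_mean(a_new, a_base, b_new, b_base):
    """Mean-mean: slope from the means of anchor discriminations, intercept from difficulty means."""
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B

def rescale(a_new, b_new, A, B):
    """Place new-form item parameters on the base scale."""
    return [a / A for a in a_new], [A * b + B for b in b_new]

# Hypothetical anchor-item parameters on the new form's scale (illustrative values only).
a_new = [0.8, 1.0, 1.2, 1.5, 0.9]
b_new = [-1.0, -0.3, 0.0, 0.6, 1.4]

# Build error-free base-scale parameters from known coefficients,
# so both methods should recover A_true and B_true up to rounding.
A_true, B_true = 1.2, 0.5
a_base, b_base = rescale(a_new, b_new, A_true, B_true)

A_ms, B_ms = mean_sigma(b_new, b_base)
A_mm, B_mm = mean_mean(a_new, a_base, b_new, b_base)
print(f"mean-sigma: A={A_ms:.3f}, B={B_ms:.3f}")
print(f"mean-mean:  A={A_mm:.3f}, B={B_mm:.3f}")
```

With error-free parameters the two methods agree. With estimated parameters they generally differ, because each relies on different moments of the anchor-item estimates.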

Keywords

Haebara, IRT true score equating, mean-mean, mean-sigma, scale transformation, Stocking-Lord, test equating

Kaynakça

Angoff, W. H. (1971). Scales, norms and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd Eds.) (pp. 508–600). Washington, DC: American Council on Education.
Angoff, W. H. (1984). Scales, norms and equivalent scores. Princeton, New Jersey: Educational Testing Service.
Babcock, B., Albano, A., and Raymond, M. (2012). Nominal weights mean equating: A method for very small samples. Educational and Psychological Measurement, 72(4), 608–628. doi:10.1177/0013164411428609
Baker, F. B., and Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28(2), 147–162. Retrived from http://www.jstor.org/stable/1434796
Barnard, J. J. (1996). In search of equity in educational measurement: traditional versus modern equating methods. Paper presented at the ASEESA’s National Conference at the HSRC Conference Center, Pretoria, South Africa.
Bastari, B. (2000). Linking multiple-choice and constructed-response items to a common proficiency scale. (Unpublished Doctoral Dissertation). University of Massachusetts, Amherst.
Budescu, D. (1985). Efficiency of linear equating as a function of the length of the anchor test. Journal of Educational Measurement, 22(1), 13-20. Retrived from http://www.jstor.org/stable/1434562
Cao, Y. (2008). Mixed-format test equating: Effects of test dimensionality and common-item sets. (Unpublished Doctoral Dissertation).University of Maryland, Maryland.
Chon, K. H., Lee, W. C., and Ansley, T. N. (2007). Assessing IRT model-data fit for mixed format tests (Casma Research Report, 26, November). Retrived from https://education.uiowa.edu/sites/education.uiowa.edu/files/documents/centers/casma/publications/casma-research-report-26.pdf
Crocker, L., and Algina, J. (1986). Introduction to classical and modern test theory. USA: Harcourt Brace Jovanovich College.
Cui, Z.,and Kolen, M. J. (2008). Comparison of parametric and nonparametric bootstrap methods for estimating random error in equipercentile equating. Applied Psychological Measurement, 32(4), 334-347. doi: 10.1177/0146621607300854
Dorans, N. J. (1990). Equating methods and sampling designs. Applied Measurement in Education, 3 (1), 3-17. doi:10.1207/s15324818ame0301_2
Dorans, N. J., and Holland P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37 (4), 281-306. doi:10.1111/j.1745-3984.2000.tb01088.x
Dorans, N. J., Moses, T. P., and Eignor, D. R. (2010). Principles and practices of test score equating (ETS Research Report, 41, December). Retrived from https://www.ets.org/Media/Research/pdf/RR-10-29.pdf
Eid, G. K. (2005). The effects of sample size on the equating of test items. Education, 126 (1), 165-180. Retrived from https://www.thefreelibrary.com/The+effects+of+sample+size+on+the+equating+of+test+items.-a0136846803
Godfrey, K. E. (2007). A comparison of Kernel equating and IRT true score equating methods (Unpublished Doctoral Dissertation). The University of North Carolina, Chapel Hill.
González, J. (2014). SNSequate: Standard and nonstandard statistical models and methods for test equating. Journal of Statistical Software, 59(7), 1-30. Retrived from https://www.jstatsoft.org/index
Gök, B. ve Kelecioğlu, H. (2014). Denk olmayan gruplarda ortak madde deseni kullanılarak madde tepki kuramına dayalı eşitleme yöntemlerinin karşılaştırılması. Mersin Üniversitesi Eğitim Fakültesi Dergisi, 10(1), 120-136. doi:10.17860/efd.78698
Felan, G. D. ( February,2002). Test equating: Mean, linear, equipercentile and item response theory. Paper presented at the Annual Meeting of the Southwest Educational Research Association, Austin, TX.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22(3), 144-149. Retrived from https://doi.org/10.4992/psycholres1954.22.144
Hambleton, R. K., Swaminathan, H., and Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park: Sage.
Han, K. T. (2008). Impact of item parameter drift on test equating and proficiency estimates. (Unpublished Doctoral Dissertation). University of Massachusetts, Amherst.
Hanson, B. A., and Beguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1), 3–24. doi:10.1177/0146621602026001001
Harris, D. J., and Crouse, J. D. (1993). A study of criteria used in equating. Applied Measurement in Education, 6(3), 195–240. doi:10.1207/s15324818ame0603_3
Harwell, M., Stone, C. A., Hsu, T. C., and Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20 (2), 101-125. Retrived from https://conservancy.umn.edu/bitstream/handle/11299/119086/1/v20n2p101.pdf
He, Q. (2010). Maintaining standards in on-demand testing using item response theory (Office of Qualifications and Examinations Regulation (OFQUAL), February). Retrived from.https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/605861/0210_QingpingHe_Maintaining-standards.pdf
He, Y. (2011). Evaluating equating properties for mixed-format tests. (Unpublished Doctoral Dissertation). University of Iowa, Iowa City.
Holland, P. W., and Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational Measurement (4th Eds) (pp. 187-220). Westport, CT: Praeger.
Holland, P. W., Dorans, N. J., and Petersen, N. S. (2007). Equating test scores. In C. R. Rao and S. Sinharay (Eds.), Handbook of statistics (pp. 169-203). Oxford, UK: Elsevier
Hu, H., Rogers, T. W., and Vukmirovic, Z. (2008). Investigation of IRT-based equating methods in the presenceof outlier common items. Applied Psychological Measurement, 32(4), 311-333. doi:10.1177/0146621606292215 Ironson, G.H. (1983). Using item response theory to measure bias. In R.K. Hambleton (Ed.), Applications of item response theory (pp 155–174). Vancouver: Educational Research Institute of British Columbia
Kang, T., and Petersen, N. S. (2009). Linking item parameters to a base scale. Paper presented at the National Council on Measurement in Education, San Diego, CA.
Karkee, T. B., and Wright, K. R. (2004). Evaluation of Linking Methods for Placing ThreeParameter Logistic Item Parameter Estimates onto a One-Parameter Scale. Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA.
Kaskowitz, G. S., and De Ayala, R. J. (2001). The effect of error in item parameter estimates on the test response function method of linking. Applied Psychological Measurement, 25 (1), 39-52. doi:10.1177/01466216010251003
Kolen, M. J. (1981). Comparison of Traditional and Item Response Theory Methods for Equating Tests. Journal of Educational Measurement, 18 (1), 1-11. Retrived from http://www.jstor.org/stable/1434813
Kolen, M. J. (1985). Standard errors of Tucker Equating. Applied Psychological Measurement, 9 (2), 209-223, doi: 10.1177/014662168500900209
Kolen, M. J. (1988). An NCME instructional module on traditional equating methodology. Educational Measurement: Issues and Practice, 7, 29-36. Retrived from https://eric.ed.gov/?id=EJ388096
Kolen, M. J., and Brennan, R. L. (1995). Test Equating: methods and practices. New York: SpringerVerlag
Kolen, M. J., and Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices. New York:Springer.
Kilmen, S. (2010). Madde tepki kuramına dayalı test eşitleme yöntemlerinden kestirilen eşitleme hatalarının örneklem büyüklüğü ve yetenek dağılımına göre karşılaştırılması. (Yayınlanmamış doktora tezi). Ankara Üniversitesi, Ankara.
Kim, S., and Cohen, A. S. (1998). A comprasion of linking and concurrent calibration under item response theory. Applied Psychological Measurement, 22(2), 131-143. doi:10.1177/01466216980222003
Kim, S., and Hanson, B. A. (2002). Test equating under the multiple-choice model. Applied Psychological Measurement, 26 (3), 255-270. doi:10.1177/0146621602026003002
Kim, S., and Lee, W. C. (2004). IRT scale linking methods for mixed-format tests(ACT Research Report 5, December). Retrived from https://www.act.org/content/dam/act/unsecured/documents/ACT_RR2004-5.pdf
Kim, S., and Lee, W. (2006). An extension of four IRT linking methods for mixed- format tests. Journal of Educational Measurement, 43 (1), 53-76. Retrived from http://www.jstor.org/stable/20461809
Kim, S., and Kolen, M. J. (2006). Robustness to format effects of IRT linking methods for mixed-format tests. Applied Measurement in Education, 19(4), 357-381. doi:10.1207/s15324818ame1904_7
Lee, Y. S. (2007). A comparison of methods for nonparametric estimation of item characteristic curves for binary items. Applied Psychological Measurement, 31 (2), 121-134. doi: 10.1177/0146621606290248.
Lee, G., and Fitzpatrick, A. R. (2008). A new approach to test score equating using item response theory with fixed c-parameters. Asia Pacific Education Review, 9(3), 248–261. doi:10.1007/bf03026714
Li, D. (2009). Developing a common scale for testlet model parameter estimates under the common- item nonequivalent groups design. (Unpublished Doctoral Dissertation).University of Maryland, Maryland.
Li, Y. H., and Lissitz, R. W. (2000). An evaluation of the accuracy of multidimensional IRT linking. Applied Psychological Measurement, 24 (2), 115-138. doi: 10.1177/01466216000242002.
Liou, M., Cheng, P. E., and Johnson, E. G. (1997). Standard errors of the Kernel equating methods under the common-item design. Applied Psychological Measurement, 21 (4), 349-369.doi: 10.1177/01466216970214005
Livingston, S. A., and Kim, S. (2010). Random-Groups equating with samples of 50 to 400 test takers. Journal of Educational Measurement, 47(2), 175–185. doi:10.1111/j.1745-3984.2010.00107.x
Lord, F.M. (1983). Statistical bias in maximum likelihood estimators of item parameters. Psychometrika, 48, 477-482. Retrived from https://onlinelibrary.wiley.com/doi/pdf/10.1002/j.2333-8504.1982.tb01306.x
Loyd, B. H., and Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17(3), 179-193. Retrived from https://onlinelibrary.wiley.com/doi/pdf/10.1002/j.23338504.1982.tb01306.x
Mao, X., von Davier, A. A., and Rupp, S. (2006). Comparisons of the Kernel equating method with the traditional equating methods on PRAXISTM data (ETS RR-06-30, December). Princeton, NJ: Educational Testing Service.
Marco, G. L. (1977). Item Characteristic Curve Solutions to Three Intractable Testing Problems. Journal of Educational Measurement, 14 (2), 139-160. Retrived from http://www.jstor.org/stable/1434012
Meng, Y. (2012). Comparison of Kernel equating and item response theory equating methods. (Unpublished Doctoral Dissertation). University of Massachusetts, Amherst.
Michaelides, M. P. (April, 2003). Sensitivity of IRT equating to the behavior of test equating items. Paper presented at the American Educational Research Association, Chicago, Illinois.
Mohandas, R. (1996). Test equating, problems and solutions: Equating English test forms for the Indonesian junior secondary school final examination administered in 1994. (Unpublished Doctoral Dissertation). Flinders University, South Australia.
Norman Dvorak, R. K. (2009). A comparison of Kernel equating to the test characteristic curve methods. (Unpublished Doctoral Dissertation). University of Nebraska, Lincoln.
Nozawa, Y. (2009). Comparison of parametric and nonparametric IRT equating methods under the common-item nonequivalent groups design. (Unpublished Doctoral Dissertation), The University of Iowa, Iowa City.
Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51(1), 1-23. Retrieved from https://www.researchgate.net/publication/241025868_Asymptotic_Standard_Errors_of_IRT_Equating_Coefficients_Using_Moments
Partchev, I. (2016). Package 'irtoys'. (Version 0.2.0). Retrieved from https://cran.r-project.org/web/packages/irtoys/irtoys.pdf
Petersen, N. S., Kolen, M. J., and Hoover, H. D. (1993). Scaling, norming and equating. In Linn, R. L. (Ed.) Educational measurement (pp. 221-262). USA: The Oryx.
Potenza, M. T., and Dorans, N. J. (1995). DIF assessment for polytomously scored items: A framework for classification and evaluation. Applied Psychological Measurement, 19(1), 23-37. doi:10.1177/014662169501900104
Rizopoulos, D. (2015). Package 'ltm'. Retrieved from https://cran.r-project.org/web/packages/ltm/ltm.pdf
Ryan, J., and Brockmann, F. (2009). A practitioner’s introduction to equating. Retrieved from https://files.eric.ed.gov/fulltext/ED544690.pdf
Sarkar, D. (2017). Package “lattice”. Retrieved from https://cran.r-project.org/web/packages/lattice/lattice.pdf
Skaggs, G. (1990). To match or not to match samples on ability for equating: A discussion of five articles. Applied Measurement in Education, 3 (1), 105-113. doi:10.1207/s15324818ame0301_8
Skaggs, G. (2005). Accuracy of random groups equating with very small samples. Journal of Educational Measurement, 42(2), 309–330. Retrieved from https://doi.org/10.1111/j.1745-3984.2005.00018.x
Skaggs, G., and Lissitz, R. W. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56(4), 495-529. doi:10.3102/00346543056004495
Speron, E. (2009). A comparison of metric linking procedures in item response theory. (Unpublished Doctoral Dissertation). University of Illinois, Illinois.
Spence, P. D. (1996). The effect of multidimensionality on unidimensional equating with item response theory. (Unpublished Doctoral Dissertation), University of Florida, Florida.
Stocking, M.L., and Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201-210. doi:10.1177/014662168300700208
Sinharay, S., and Holland, P. W. (2008). The missing data assumptions of the nonequivalent groups with anchor test (NEAT) design and their implications for test equating (ETS RR-09-16, May). Princeton, NJ: Educational Testing Service.
Qu, Y. (2007). The effect of weighting in Kernel equating using counter-balanced designs. (Unpublished Doctoral Dissertation). Michigan State University, East Lansing.
Tate, R. (2000). Performance of a proposed method for the linking of mixed-format tests with constructed response and multiple choice items. Journal of Educational Measurement, 37(4), 329-346. Retrieved from http://www.jstor.org/stable/1435244
Tsai, T. H. (March, 1997). Estimating minimum sample sizes in random groups equating. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, Illinois.
Uysal, İ. (2014). Madde Tepki Kuramı'na dayalı test eşitleme yöntemlerinin karma modeller üzerinde karşılaştırılması. (Unpublished Master's Thesis). Abant İzzet Baysal Üniversitesi, Bolu.
von Davier, A. A. (2010). Statistical models for test equating, scaling and linking. New York: Springer.
von Davier, A. A., and Wilson, C. (2007). IRT true-score test equating: A guide through assumptions and applications. Educational and Psychological Measurement, 67 (6), 940-957. doi:10.1177/0013164407301543
Way, W. D., and Tang, K. L. (April, 1991). A comparison of four logistic model equating methods. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, Illinois.
Weeks, J. P. (2010). Plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12), 1-33. Retrieved from https://www.jstatsoft.org/article/view/v035i12
Woldbeck, T. (April, 1998). Basic concepts in modern methods of test equating. Paper presented at the Annual Meeting of the Southwest Psychological Association, New Orleans, Louisiana.
Yang, W. L., and Houang, R. T. (April, 1996). The effect of anchor length and equating method on the accuracy of test equating: Comparisons of linear and IRT-based equating using an anchor-item design. Paper presented at the Annual Meeting of the American Educational Research Association, New York City, New York.
Zeng, L. (1991). Standard errors of linear equating for the single-group design. (ACT Research Report Series. 91–4, August). Iowa City, Iowa.
Zhao, Y. (2008). Approaches for addressing the fit of item response theory models to educational test data. (Unpublished Doctoral Dissertation). University of Massachusetts, Amherst.

There are 85 references in total.

Details

Primary Language	Turkish
Subjects	Field Education (Alan Eğitimleri)
Section	Research Article
Authors	Ömay Çokluk-bökeoglu 0000-0002-3879-9204 Arzu Uçar 0000-0002-0099-1348 Ebru Balta 0000-0002-2173-7189
Publication Date	March 31, 2022
Published in Issue	Year 2022, Volume: 55, Issue: 1

Cite

APA	Çokluk-bökeoglu, Ö., Uçar, A., & Balta, E. (2022). Madde Tepki Kuramına Dayalı Gerçek Puan Eşitlemede Ölçek Dönüştürme Yöntemlerinin İncelenmesi. Ankara University Journal of Faculty of Educational Sciences (JFES), 55(1), 1-36. https://doi.org/10.30964/auebfd.1001128

Cited By

RANDOM GRUP DESENİ ALTINDA TAM MIRT EŞİTLEMEDE ÖRNEKLEM BÜYÜKLÜĞÜNÜN ETKİSİ

Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi

https://doi.org/10.21764/maeuefd.1361350

Ankara University Journal of Faculty of Educational Sciences is licensed under CC BY-NC-ND 4.0.