Research Article

Investigation of Scale Transformation Methods in True Score Equating Based on Item Response Theory

Year 2022, Volume: 55 Issue: 1, 1 - 36, 31.03.2022
https://doi.org/10.30964/auebfd.1001128

Abstract

This study aimed to compare the equating errors of four scale transformation methods (mean-mean (MM), mean-sigma (MS), Stocking-Lord (SL), and Haebara (HB)) under different conditions in true score equating based on Item Response Theory (IRT). To compare the errors of the methods, 7,200 dichotomous data sets consistent with the 2PLM and 3PLM were generated with 50 replications under the following conditions: sample size (500, 1,000, 3,000, 10,000), test length (40, 50, 80), common item rate (20%, 30%, 40%), parameter estimation model (two- and three-parameter logistic models (2PLM and 3PLM)), and ability distribution of the groups (similar: N(0,1) and N(0,1); different: N(0,1) and N(0.5,1)). The nonequivalent groups with anchor test (NEAT) design was used as the data collection design. R software was used for data generation and analysis. The findings were evaluated against the equating error (RMSD) criterion. Across all conditions, the SL method yielded higher RMSD values than the other methods, while the MM and MS methods produced RMSD values similar to each other. In addition, a comparison of the RMSD values of the scale transformation methods showed that results were similar under the 2PLM and 3PLM; that the equating errors of all methods except SL decreased as sample size and test length increased; and that the methods had lower RMSD values when the common item rate was 40% and the ability distributions of the groups were similar.
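The reported total of 7,200 data sets is consistent with fully crossing the five design factors and replicating each cell 50 times. The following is a minimal sketch of that arithmetic, assuming a fully crossed design (the abstract does not state this explicitly); all variable names are chosen here for illustration only.

```python
from itertools import product

# Design factors as listed in the abstract
sample_sizes = [500, 1000, 3000, 10000]
test_lengths = [40, 50, 80]
common_item_rates = [0.20, 0.30, 0.40]
models = ["2PLM", "3PLM"]
ability_distributions = ["similar", "different"]
replications = 50

# Assumption: every level of every factor is combined with every other level
conditions = list(product(sample_sizes, test_lengths, common_item_rates,
                          models, ability_distributions))

n_conditions = len(conditions)            # number of simulation cells
n_datasets = n_conditions * replications  # data sets across all replications
print(n_conditions, n_datasets)
```

Under this assumption the design has 4 x 3 x 3 x 2 x 2 = 144 cells, and 144 x 50 replications gives the 7,200 data sets reported.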

Investigation of Scale Transformation Methods in True Score Equating Based on Item Response Theory

Abstract

This study compared the equating errors of four scale transformation methods (mean-mean (MM), mean-sigma (MS), Haebara (HB), and Stocking-Lord (SL)) in true score equating based on item response theory (IRT) under different conditions. To this end, 7,200 dichotomous data sets consistent with the two- and three-parameter logistic models were generated with 50 replications under the following conditions: sample size (500, 1,000, 3,000, 10,000), test length (40, 50, 80), common item rate (20%, 30%, 40%), model used in parameter estimation (two- and three-parameter logistic models (2PLM and 3PLM)), and ability distribution of the groups (similar: N(0,1) and N(0,1); different: N(0,1) and N(0.5,1)). The common-item nonequivalent groups equating design was used. R software was used for data generation and analyses. The results were evaluated using the equating error (RMSD) criterion. Across all conditions, the RMSD values of the SL method were higher than those of the other methods, while the MM and MS methods produced similar RMSD values. In addition, when the RMSD values of the scale transformation methods were compared, similar results were obtained under the 2PLM and 3PLM; the equating errors of all methods except SL decreased as sample size and test length increased; and the methods had lower RMSD values when the common item rate was 40% and the ability distributions of the groups were similar.
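The abstract names the mean-mean and mean-sigma methods without giving formulas. As an illustration only, the sketch below uses their standard textbook forms for the linear transformation theta_base = A * theta_new + B, estimated from common-item discrimination (a) and difficulty (b) parameters. It is written in Python rather than the R code used in the study, and the function names and item parameter values are hypothetical, not taken from the article.

```python
from statistics import mean, pstdev

def mean_sigma(b_new, b_base):
    """Mean-sigma: slope from the ratio of difficulty standard deviations."""
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B

def mean_mean(a_new, a_base, b_new, b_base):
    """Mean-mean: slope from the ratio of mean discriminations."""
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B

# Hypothetical common-item parameters on the new-form scale
a_new = [0.8, 1.0, 1.2, 1.5]
b_new = [-1.0, -0.2, 0.4, 1.1]

# Build error-free base-scale parameters from a known transformation,
# using a_base = a_new / A and b_base = A * b_new + B
A_true, B_true = 1.2, 0.5
a_base = [a / A_true for a in a_new]
b_base = [A_true * b + B_true for b in b_new]

ms = mean_sigma(b_new, b_base)
mm = mean_mean(a_new, a_base, b_new, b_base)
print(ms, mm)
```

With error-free parameters both methods recover the same coefficients up to floating-point rounding. In practice the item parameters are estimated with error, so the two methods can give different coefficients. The Haebara and Stocking-Lord methods instead minimize differences between item or test characteristic curves and require numerical optimization, which this sketch does not attempt.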

Kaynakça

  • Angoff, W. H. (1971). Scales, norms and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd Eds.) (pp. 508–600). Washington, DC: American Council on Education.
  • Angoff, W. H. (1984). Scales, norms and equivalent scores. Princeton, New Jersey: Educational Testing Service.
  • Babcock, B., Albano, A., and Raymond, M. (2012). Nominal weights mean equating: A method for very small samples. Educational and Psychological Measurement, 72(4), 608–628. doi:10.1177/0013164411428609
  • Baker, F. B., and Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28(2), 147–162. Retrived from http://www.jstor.org/stable/1434796
  • Barnard, J. J. (1996). In search of equity in educational measurement: traditional versus modern equating methods. Paper presented at the ASEESA’s National Conference at the HSRC Conference Center, Pretoria, South Africa.
  • Bastari, B. (2000). Linking multiple-choice and constructed-response items to a common proficiency scale. (Unpublished Doctoral Dissertation). University of Massachusetts, Amherst.
  • Budescu, D. (1985). Efficiency of linear equating as a function of the length of the anchor test. Journal of Educational Measurement, 22(1), 13-20. Retrived from http://www.jstor.org/stable/1434562
  • Cao, Y. (2008). Mixed-format test equating: Effects of test dimensionality and common-item sets. (Unpublished Doctoral Dissertation).University of Maryland, Maryland.
  • Chon, K. H., Lee, W. C., and Ansley, T. N. (2007). Assessing IRT model-data fit for mixed format tests (Casma Research Report, 26, November). Retrived from https://education.uiowa.edu/sites/education.uiowa.edu/files/documents/centers/casma/publications/casma-research-report-26.pdf
  • Crocker, L., and Algina, J. (1986). Introduction to classical and modern test theory. USA: Harcourt Brace Jovanovich College.
  • Cui, Z.,and Kolen, M. J. (2008). Comparison of parametric and nonparametric bootstrap methods for estimating random error in equipercentile equating. Applied Psychological Measurement, 32(4), 334-347. doi: 10.1177/0146621607300854
  • Dorans, N. J. (1990). Equating methods and sampling designs. Applied Measurement in Education, 3 (1), 3-17. doi:10.1207/s15324818ame0301_2
  • Dorans, N. J., and Holland P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37 (4), 281-306. doi:10.1111/j.1745-3984.2000.tb01088.x
  • Dorans, N. J., Moses, T. P., and Eignor, D. R. (2010). Principles and practices of test score equating (ETS Research Report, 41, December). Retrived from https://www.ets.org/Media/Research/pdf/RR-10-29.pdf
  • Eid, G. K. (2005). The effects of sample size on the equating of test items. Education, 126 (1), 165-180. Retrived from https://www.thefreelibrary.com/The+effects+of+sample+size+on+the+equating+of+test+items.-a0136846803
  • Godfrey, K. E. (2007). A comparison of Kernel equating and IRT true score equating methods (Unpublished Doctoral Dissertation). The University of North Carolina, Chapel Hill.
  • González, J. (2014). SNSequate: Standard and nonstandard statistical models and methods for test equating. Journal of Statistical Software, 59(7), 1-30. Retrived from https://www.jstatsoft.org/index
  • Gök, B. ve Kelecioğlu, H. (2014). Denk olmayan gruplarda ortak madde deseni kullanılarak madde tepki kuramına dayalı eşitleme yöntemlerinin karşılaştırılması. Mersin Üniversitesi Eğitim Fakültesi Dergisi, 10(1), 120-136. doi:10.17860/efd.78698
  • Felan, G. D. ( February,2002). Test equating: Mean, linear, equipercentile and item response theory. Paper presented at the Annual Meeting of the Southwest Educational Research Association, Austin, TX.
  • Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22(3), 144-149. Retrived from https://doi.org/10.4992/psycholres1954.22.144
  • Hambleton, R. K., Swaminathan, H., and Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park: Sage.
  • Han, K. T. (2008). Impact of item parameter drift on test equating and proficiency estimates. (Unpublished Doctoral Dissertation). University of Massachusetts, Amherst.
  • Hanson, B. A., and Beguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1), 3–24. doi:10.1177/0146621602026001001
  • Harris, D. J., and Crouse, J. D. (1993). A study of criteria used in equating. Applied Measurement in Education, 6(3), 195–240. doi:10.1207/s15324818ame0603_3
  • Harwell, M., Stone, C. A., Hsu, T. C., and Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20 (2), 101-125. Retrived from https://conservancy.umn.edu/bitstream/handle/11299/119086/1/v20n2p101.pdf
  • He, Q. (2010). Maintaining standards in on-demand testing using item response theory (Office of Qualifications and Examinations Regulation (OFQUAL), February). Retrived from.https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/605861/0210_QingpingHe_Maintaining-standards.pdf
  • He, Y. (2011). Evaluating equating properties for mixed-format tests. (Unpublished Doctoral Dissertation). University of Iowa, Iowa City.
  • Holland, P. W., and Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational Measurement (4th Eds) (pp. 187-220). Westport, CT: Praeger.
  • Holland, P. W., Dorans, N. J., and Petersen, N. S. (2007). Equating test scores. In C. R. Rao and S. Sinharay (Eds.), Handbook of statistics (pp. 169-203). Oxford, UK: Elsevier
  • Hu, H., Rogers, T. W., and Vukmirovic, Z. (2008). Investigation of IRT-based equating methods in the presenceof outlier common items. Applied Psychological Measurement, 32(4), 311-333. doi:10.1177/0146621606292215 Ironson, G.H. (1983). Using item response theory to measure bias. In R.K. Hambleton (Ed.), Applications of item response theory (pp 155–174). Vancouver: Educational Research Institute of British Columbia
  • Kang, T., and Petersen, N. S. (2009). Linking item parameters to a base scale. Paper presented at the National Council on Measurement in Education, San Diego, CA.
  • Karkee, T. B., and Wright, K. R. (2004). Evaluation of Linking Methods for Placing ThreeParameter Logistic Item Parameter Estimates onto a One-Parameter Scale. Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA.
  • Kaskowitz, G. S., and De Ayala, R. J. (2001). The effect of error in item parameter estimates on the test response function method of linking. Applied Psychological Measurement, 25 (1), 39-52. doi:10.1177/01466216010251003
  • Kolen, M. J. (1981). Comparison of Traditional and Item Response Theory Methods for Equating Tests. Journal of Educational Measurement, 18 (1), 1-11. Retrived from http://www.jstor.org/stable/1434813
  • Kolen, M. J. (1985). Standard errors of Tucker Equating. Applied Psychological Measurement, 9 (2), 209-223, doi: 10.1177/014662168500900209
  • Kolen, M. J. (1988). An NCME instructional module on traditional equating methodology. Educational Measurement: Issues and Practice, 7, 29-36. Retrived from https://eric.ed.gov/?id=EJ388096
  • Kolen, M. J., and Brennan, R. L. (1995). Test Equating: methods and practices. New York: SpringerVerlag
  • Kolen, M. J., and Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices. New York:Springer.
  • Kilmen, S. (2010). Madde tepki kuramına dayalı test eşitleme yöntemlerinden kestirilen eşitleme hatalarının örneklem büyüklüğü ve yetenek dağılımına göre karşılaştırılması. (Yayınlanmamış doktora tezi). Ankara Üniversitesi, Ankara.
  • Kim, S., and Cohen, A. S. (1998). A comprasion of linking and concurrent calibration under item response theory. Applied Psychological Measurement, 22(2), 131-143. doi:10.1177/01466216980222003
  • Kim, S., and Hanson, B. A. (2002). Test equating under the multiple-choice model. Applied Psychological Measurement, 26 (3), 255-270. doi:10.1177/0146621602026003002
  • Kim, S., and Lee, W. C. (2004). IRT scale linking methods for mixed-format tests(ACT Research Report 5, December). Retrived from https://www.act.org/content/dam/act/unsecured/documents/ACT_RR2004-5.pdf
  • Kim, S., and Lee, W. (2006). An extension of four IRT linking methods for mixed- format tests. Journal of Educational Measurement, 43 (1), 53-76. Retrived from http://www.jstor.org/stable/20461809
  • Kim, S., and Kolen, M. J. (2006). Robustness to format effects of IRT linking methods for mixed-format tests. Applied Measurement in Education, 19(4), 357-381. doi:10.1207/s15324818ame1904_7
  • Lee, Y. S. (2007). A comparison of methods for nonparametric estimation of item characteristic curves for binary items. Applied Psychological Measurement, 31 (2), 121-134. doi: 10.1177/0146621606290248.
  • Lee, G., and Fitzpatrick, A. R. (2008). A new approach to test score equating using item response theory with fixed c-parameters. Asia Pacific Education Review, 9(3), 248–261. doi:10.1007/bf03026714
  • Li, D. (2009). Developing a common scale for testlet model parameter estimates under the common- item nonequivalent groups design. (Unpublished Doctoral Dissertation).University of Maryland, Maryland.
  • Li, Y. H., and Lissitz, R. W. (2000). An evaluation of the accuracy of multidimensional IRT linking. Applied Psychological Measurement, 24 (2), 115-138. doi: 10.1177/01466216000242002.
  • Liou, M., Cheng, P. E., and Johnson, E. G. (1997). Standard errors of the Kernel equating methods under the common-item design. Applied Psychological Measurement, 21 (4), 349-369.doi: 10.1177/01466216970214005
  • Livingston, S. A., and Kim, S. (2010). Random-Groups equating with samples of 50 to 400 test takers. Journal of Educational Measurement, 47(2), 175–185. doi:10.1111/j.1745-3984.2010.00107.x
  • Lord, F.M. (1983). Statistical bias in maximum likelihood estimators of item parameters. Psychometrika, 48, 477-482. Retrived from https://onlinelibrary.wiley.com/doi/pdf/10.1002/j.2333-8504.1982.tb01306.x
  • Loyd, B. H., and Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17(3), 179-193. Retrived from https://onlinelibrary.wiley.com/doi/pdf/10.1002/j.23338504.1982.tb01306.x
  • Mao, X., von Davier, A. A., and Rupp, S. (2006). Comparisons of the Kernel equating method with the traditional equating methods on PRAXISTM data (ETS RR-06-30, December). Princeton, NJ: Educational Testing Service.
  • Marco, G. L. (1977). Item Characteristic Curve Solutions to Three Intractable Testing Problems. Journal of Educational Measurement, 14 (2), 139-160. Retrived from http://www.jstor.org/stable/1434012
  • Meng, Y. (2012). Comparison of Kernel equating and item response theory equating methods. (Unpublished Doctoral Dissertation). University of Massachusetts, Amherst.
  • Michaelides, M. P. (April, 2003). Sensitivity of IRT equating to the behavior of test equating items. Paper presented at the American Educational Research Association, Chicago, Illinois.
  • Mohandas, R. (1996). Test equating, problems and solutions: Equating English test forms for the Indonesian junior secondary school final examination administered in 1994. (Unpublished Doctoral Dissertation). Flinders University, South Australia.
  • Norman Dvorak, R. K. (2009). A comparison of Kernel equating to the test characteristic curve methods. (Unpublished Doctoral Dissertation). University of Nebraska, Lincoln.
  • Nozawa, Y. (2009). Comparison of parametric and nonparametric IRT equating methods under the common-item nonequivalent groups design. (Unpublished Doctoral Dissertation), The University of Iowa, Iowa City.
  • Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51(1), 1-23. Retrieved from https://www.researchgate.net/publication/241025868_Asymptotic_Standard_Errors_of_IRT_Equating_Coefficients_Using_Moments
  • Partchev, I. (2016). Package 'irtoys'. (Version 0.2.0). Retrieved from https://cran.r-project.org/web/packages/irtoys/irtoys.pdf
  • Petersen, N. S., Kolen, M. J., and Hoover, H. D. (1993). Scaling, norming and equating. In Linn, R. L. (Ed.) Educational measurement (pp. 221-262). USA: The Oryx.
  • Potenza, M. T., and Dorans, N. J. (1995). DIF assessment for polytomously scored items: A framework for classification and evaluation. Applied Psychological Measurement, 19(1), 23-37. doi:10.1177/014662169501900104
  • Rizopoulos, D. (2015). Package 'ltm'. Retrieved from https://cran.r-project.org/web/packages/ltm/ltm.pdf
  • Ryan, J., and Brockmann, F. (2009). A practitioner’s introduction to equating. Retrieved from https://files.eric.ed.gov/fulltext/ED544690.pdf
  • Sarkar, D. (2017). Package “lattice”. Retrieved from https://cran.r-project.org/web/packages/lattice/lattice.pdf
  • Skaggs, G. (1990). To match or not to match samples on ability for equating: A discussion of five articles. Applied Measurement in Education, 3 (1), 105-113. doi:10.1207/s15324818ame0301_8
  • Skaggs, G. (2005). Accuracy of random groups equating with very small samples. Journal of Educational Measurement, 42(2), 309–330. Retrieved from https://doi.org/10.1111/j.1745-3984.2005.00018.x
  • Skaggs, G., and Lissitz, R. W. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56(4), 495-529. doi:10.3102/00346543056004495
  • Speron, E. (2009). A comparison of metric linking procedures in item response theory. (Unpublished Doctoral Dissertation). University of Illinois, Illinois.
  • Spence, P. D. (1996). The effect of multidimensionality on unidimensional equating with item response theory. (Unpublished Doctoral Dissertation), University of Florida, Florida.
  • Stocking, M.L., and Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201-210. doi:10.1177/014662168300700208
  • Sinharay, S., and Holland, P. W. (2008). The missing data assumptions of the nonequivalent groups with anchor test (NEAT) design and their implications for test equating (ETS RR-09-16, May). Princeton, NJ: Educational Testing Service.
  • Qu, Y. (2007). The effect of weighting in Kernel equating using counter-balanced designs. (Unpublished Doctoral Dissertation). Michigan State University, East Lansing.
  • Tate, R. (2000). Performance of a proposed method for the linking of mixed-format tests with constructed response and multiple choice items. Journal of Educational Measurement, 37(4), 329-346. Retrieved from http://www.jstor.org/stable/1435244
  • Tsai, T. H. (March, 1997). Estimating minimum sample sizes in random groups equating. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, Illinois.
  • Uysal, İ. (2014). Madde Tepki Kuramı'na dayalı test eşitleme yöntemlerinin karma modeller üzerinde karşılaştırılması. (Unpublished Master's Thesis). Abant İzzet Baysal Üniversitesi, Bolu.
  • von Davier, A.A. (2010). Statistical Models For Test Equating, Scaling and Linking. New York: Springer.
  • von Davier, A. A., and Wilson, C. (2007). IRT true-score test equating: A guide through assumptions and applications. Educational and Psychological Measurement, 67 (6), 940-957. doi:10.1177/0013164407301543
  • Way, W. D., and Tang, K. L. (April, 1991). A comparison of four logistic model equating methods. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, Illinois.
  • Weeks, J. P. (2010). Plink: An R package for linking mixed-format tests using IRT- based methods. Journal of Statistical Software, 35(12), 1-33. Retrieved from https://www.jstatsoft.org/article/view/v035i12
  • Woldbeck, T. (April, 1998). Basic concepts in modern methods of test equating. Paper presented at the Annual Meeting of the Southwest Psychological Association, New Orleans, Louisiana.
  • Yang, W. L., and Houang, R. T. (April, 1996). The effect of anchor length and equating method on the accuracy of test equating: comparisons of linear and IRT-based equating using an anchor-item design. Paper presented at the Annual Meeting of the American Educational Research Association, New York City, New York.
  • Zeng, L. (1991). Standard errors of linear equating for the single-group design. (ACT Research Report Series. 91–4, August). Iowa City, Iowa.
  • Zhao, Y. (2008). Approaches for addressing the fit of item response theory models to educational test data. (Unpublished Doctoral Dissertation). University of Massachusetts, Amherst.
There are 85 references in total.

Details

Primary Language Turkish
Subjects Other Fields of Education
Section Research Article
Authors

Ömay Çokluk-bökeoglu 0000-0002-3879-9204

Arzu Uçar 0000-0002-0099-1348

Ebru Balta 0000-0002-2173-7189

Publication Date March 31, 2022
Published in Issue Year 2022 Volume: 55 Issue: 1

Cite

APA Çokluk-bökeoglu, Ö., Uçar, A., & Balta, E. (2022). Madde Tepki Kuramına Dayalı Gerçek Puan Eşitlemede Ölçek Dönüştürme Yöntemlerinin İncelenmesi. Ankara University Journal of Faculty of Educational Sciences (JFES), 55(1), 1-36. https://doi.org/10.30964/auebfd.1001128
Ankara University Journal of Faculty of Educational Sciences (AÜEBFD) is an institutional journal of Ankara University Press.

Creative Commons License All content of AUEBFD is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

AUEBFD uses the CC BY-NC-ND 4.0 license.