Year 2014,
Volume: 10, Issue: 1, 120 - 136, 19.02.2014
Bilge Gök, Hülya Kelecioğlu
Abstract
The purpose of this research was to equate test forms constructed under different conditions using scaling methods based on item response theory (IRT) and to compare the results obtained from these methods. The research was conducted using simulated dichotomous data consistent with the two- and three-parameter logistic models. The common-item nonequivalent groups (NEAT) design was used to equate the two test forms. The WINGEN3 program was used for data generation, and 50 replications were run for each of the 36 conditions in the study. PARSCALE 4.1 was used to estimate item parameters, and IRTEQ was used for test equating and scaling under separate calibration. The results of the simulation were evaluated with the equating error (RMSE) criterion. Across the conditions, the best equating was obtained with samples of 3000 examinees, 80-item tests, groups with similar ability distributions, and the mean-mean method. Moreover, under the conditions considered in this research, the methods produced smaller equating errors when large samples were combined with long tests and when the groups had similar ability distributions.
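The abstract reports that the mean-mean method gave the smallest equating errors. As a rough illustration only (not the authors' IRTEQ setup), the Python/NumPy sketch below applies the usual mean-mean formulas to anchor-item parameters from two separate calibrations and adds a generic RMSE helper. The function names and the toy anchor parameters are hypothetical, and the abstract does not say exactly which quantities the study's RMSE was computed over.

```python
import numpy as np

def mean_mean_constants(a_new, b_new, a_old, b_old):
    """Mean-mean linking constants (A, B) such that theta_old = A * theta_new + B.

    Inputs are the common (anchor) items' parameters, estimated separately
    (separate calibration) on the new form's scale and on the old form's scale.
    """
    A = np.mean(a_new) / np.mean(a_old)          # slope from the mean discriminations
    B = np.mean(b_old) - A * np.mean(b_new)      # intercept from the mean difficulties
    return A, B

def transform_items(a, b, c, A, B):
    """Put new-form item parameters on the old-form scale; c is scale-free."""
    return np.asarray(a, float) / A, A * np.asarray(b, float) + B, np.asarray(c, float)

def rmse(estimates, truth):
    """Root mean squared error between estimated and true values."""
    diff = np.asarray(estimates, float) - np.asarray(truth, float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy check with hypothetical anchor items: the new scale is built from the
# old one with a known transformation, so mean-mean should recover it exactly.
A_true, B_true = 1.2, 0.5
a_old = np.array([0.8, 1.0, 1.3, 1.6])
b_old = np.array([-1.0, -0.2, 0.4, 1.1])
a_new = a_old * A_true              # because a_old = a_new / A
b_new = (b_old - B_true) / A_true   # because b_old = A * b_new + B
A, B = mean_mean_constants(a_new, b_new, a_old, b_old)
print(round(A, 6), round(B, 6))     # prints: 1.2 0.5
```

With real estimates the anchor parameters carry sampling error, so A and B are only approximate; repeating the linking over replications and summarizing the deviations with `rmse` is one way such error is typically quantified.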
A Comparison of Equating Methods Based on Item Response Theory Using the Common-Item Design in Nonequivalent Groups
Abstract: This study aimed to equate test forms generated under different conditions using estimation methods based on item response theory and to compare the results obtained from these methods. The study was carried out with simulated dichotomous data fitting the two- and three-parameter logistic models. The "common-item/test nonequivalent groups (NEAT) design" was used for equating. The WINGEN3 program was used to generate the data, and 50 replications were run for each of the 36 conditions in the study. Item parameters were estimated with PARSCALE 4.1, and test equating and scaling for separate calibration were carried out with IRTEQ. The results were evaluated according to the equating error (RMSE) criterion. At the end of the study, the lowest equating errors were obtained with samples of 3000 examinees, 80-item tests, groups with similar ability distributions, and the mean-mean method. It was also found that the methods produced less error when large samples were used with longer tests and in groups with similar ability distributions.
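The data-generation step can be illustrated with a small NumPy sketch that draws dichotomous (0/1) responses from the three-parameter logistic model, which reduces to the two-parameter model when c = 0. This is not WINGEN3: the scaling constant D = 1.7, the generating parameter distributions, and the 0.5 ability shift for the second group are assumptions chosen only for illustration. For brevity both groups answer the same items here, whereas in a NEAT design the two forms would share only the anchor items.

```python
import numpy as np

D = 1.7  # assumed scaling constant; some programs use D = 1.0

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response; returns an (examinees x items) array."""
    theta = np.asarray(theta, float)[:, None]   # column vector for broadcasting
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def simulate_responses(theta, a, b, c, rng):
    """Score 1 when a uniform draw falls below the model probability, else 0."""
    p = p_3pl(theta, a, b, c)
    return (rng.random(p.shape) < p).astype(np.int8)

rng = np.random.default_rng(2014)
n_examinees, n_items = 3000, 80        # sample size and test length named in the abstract
a = rng.lognormal(mean=0.0, sigma=0.25, size=n_items)   # hypothetical generating distributions
b = rng.normal(0.0, 1.0, size=n_items)
c = rng.uniform(0.10, 0.25, size=n_items)               # use np.zeros(n_items) for the 2PL

# Two nonequivalent groups: the second group's ability mean is shifted (hypothetical 0.5).
theta_old = rng.normal(0.0, 1.0, size=n_examinees)
theta_new = rng.normal(0.5, 1.0, size=n_examinees)
resp_old = simulate_responses(theta_old, a, b, c, rng)
resp_new = simulate_responses(theta_new, a, b, c, rng)
print(resp_old.shape, resp_old.mean().round(3), resp_new.mean().round(3))
```

Varying the ability shift, sample size, and test length in a sketch like this is one way to build a grid of simulation conditions, each of which would then be replicated many times.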