Research Article

Year 2023, Volume: 14 Issue: 2, 106 - 117, 30.06.2023
https://doi.org/10.21031/epod.1210917


A Comparison of Different Designs in Scoring of PISA 2009 Reading Open Ended Items According to Generalizability Theory


Abstract

This study compares, within the framework of Generalizability Theory, the designs obtained when four raters scored the open-ended items of the PISA 2009 reading literacy assessment either all together or in alternation. The sample consisted of 362 students (out of the 4,996 who participated in PISA 2009) who responded to the reading items and were scored by more than one rater. Two designs were set up for the generalizability analyses. The first was the fully crossed design, denoted s x i x r (student x item x rater), in which every student is scored by every rater on the same items. The second was the nested design, denoted (r:s) x i, in which each rater scores only a subset of the students, so that raters are nested within students and items are crossed with both. Comparing the two designs showed that the relative and absolute error variances estimated for the (r:s) x i design were smaller than those for the s x i x r design, and the G and Phi coefficients were therefore larger. In the D study, increasing the number of raters raised the G and Phi coefficients under both designs. Acceptable G and Phi coefficients were still reached when the number of raters was halved for Booklet 2, whereas increasing the number of raters appeared more appropriate for Booklet 8.
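As a rough illustration of the kind of analysis the abstract describes, the variance components of a fully crossed student x item x rater design, and the D-study G (relative) and Phi (absolute) coefficients derived from them, can be estimated from expected mean squares. The sketch below uses simulated scores with made-up effect sizes and an arbitrary item count; it is not the study's actual PISA data or analysis, only the standard random-effects estimation for a two-facet crossed design:

```python
import numpy as np

def g_study_crossed(X):
    """Variance components for a fully crossed p x i x r design
    (persons x items x raters), estimated from expected mean squares."""
    n_p, n_i, n_r = X.shape
    m = X.mean()
    mp = X.mean(axis=(1, 2)); mi = X.mean(axis=(0, 2)); mr = X.mean(axis=(0, 1))
    mpi = X.mean(axis=2); mpr = X.mean(axis=1); mir = X.mean(axis=0)

    # Mean squares for main effects and interactions.
    ms_p  = n_i * n_r * ((mp - m) ** 2).sum() / (n_p - 1)
    ms_i  = n_p * n_r * ((mi - m) ** 2).sum() / (n_i - 1)
    ms_r  = n_p * n_i * ((mr - m) ** 2).sum() / (n_r - 1)
    ms_pi = n_r * ((mpi - mp[:, None] - mi[None, :] + m) ** 2).sum() / ((n_p - 1) * (n_i - 1))
    ms_pr = n_i * ((mpr - mp[:, None] - mr[None, :] + m) ** 2).sum() / ((n_p - 1) * (n_r - 1))
    ms_ir = n_p * ((mir - mi[:, None] - mr[None, :] + m) ** 2).sum() / ((n_i - 1) * (n_r - 1))
    resid = (X - mpi[:, :, None] - mpr[:, None, :] - mir[None, :, :]
             + mp[:, None, None] + mi[None, :, None] + mr[None, None, :] - m)
    ms_pir = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1) * (n_r - 1))

    # Solve the EMS equations; negative estimates are clipped to zero.
    v = {'pir,e': ms_pir}
    v['pi'] = max((ms_pi - ms_pir) / n_r, 0.0)
    v['pr'] = max((ms_pr - ms_pir) / n_i, 0.0)
    v['ir'] = max((ms_ir - ms_pir) / n_p, 0.0)
    v['p']  = max((ms_p - ms_pi - ms_pr + ms_pir) / (n_i * n_r), 0.0)
    v['i']  = max((ms_i - ms_pi - ms_ir + ms_pir) / (n_p * n_r), 0.0)
    v['r']  = max((ms_r - ms_pr - ms_ir + ms_pir) / (n_p * n_i), 0.0)
    return v

def g_and_phi(v, n_i, n_r):
    """D-study G (relative) and Phi (absolute) coefficients for n_i items, n_r raters."""
    rel_err = v['pi'] / n_i + v['pr'] / n_r + v['pir,e'] / (n_i * n_r)
    abs_err = rel_err + v['i'] / n_i + v['r'] / n_r + v['ir'] / (n_i * n_r)
    return v['p'] / (v['p'] + rel_err), v['p'] / (v['p'] + abs_err)

# Simulated example: 362 "students", 7 items, 4 raters with additive effects
# (all effect sizes here are illustrative assumptions, not the study's data).
rng = np.random.default_rng(0)
n_p, n_i, n_r = 362, 7, 4
X = (rng.normal(0, 1.0, (n_p, 1, 1))        # person (true score) effect
     + rng.normal(0, 0.5, (1, n_i, 1))      # item difficulty effect
     + rng.normal(0, 0.3, (1, 1, n_r))      # rater severity effect
     + rng.normal(0, 0.6, (n_p, n_i, n_r))) # residual
v = g_study_crossed(X)
g, phi = g_and_phi(v, n_i, n_r)
g2, phi2 = g_and_phi(v, n_i, 2 * n_r)  # D study: what if raters are doubled?
print(f"G = {g:.3f}, Phi = {phi:.3f}")
```

Re-evaluating `g_and_phi` with a larger `n_r` in the D-study step shows both coefficients rising as raters are added, mirroring the pattern the abstract reports when the number of raters is increased.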


Details

Primary Language English
Subjects Test Theories
Journal Section Articles
Authors

Meral ALKAN
Gazi University
0000-0001-9497-3660
Türkiye


Nuri DOĞAN
Hacettepe University
0000-0001-6274-2016
Türkiye

Publication Date June 30, 2023
Acceptance Date June 12, 2023
Published in Issue Year 2023 Volume: 14 Issue: 2

Cite

APA
ALKAN, M., & DOĞAN, N. (2023). A Comparison of Different Designs in Scoring of PISA 2009 Reading Open Ended Items According to Generalizability Theory. Journal of Measurement and Evaluation in Education and Psychology, 14(2), 106-117. https://doi.org/10.21031/epod.1210917