Comparing Fully Crossed and Nested Designs Where Items Nested in Raters in Generalizability Theory

Celal Deha Doğan; Hatice Özlem Anadol

doi:10.24106/kefdergi.309180

Research Article

Year 2017, Volume: 25 Issue: 1, 361 - 372, 15.01.2017

Abstract

This study compares the G and Phi coefficients obtained from a fully crossed (b × p × m) design (b: person; p: rater; m: item) with those obtained from a nested (b × (m:p)) design, in which items are nested within raters while persons are crossed with both items and raters, in the grading of English composition writing skill. The study group consists of students attending a private university and 3 raters. According to the results, the G and Phi coefficients of the fully crossed design were higher. In terms of sources of variance, person accounts for the highest percentage of total variance in the fully crossed design, whereas the residual accounts for a relatively low percentage. These findings indicate that the fully crossed design yields more reliable results in classroom practice, and it is therefore recommended for classroom practice.
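For readers unfamiliar with the two coefficients, the standard random-effects definitions from generalizability theory can be sketched as follows. This is a textbook formulation, not a reproduction of the authors' computations. The labels b, p and m follow the abstract; the symbols n_p (number of raters) and n_m (number of items, or items per rater in the nested design) are notation assumed here for illustration.

```latex
% Fully crossed b x p x m design (b: person, p: rater, m: item)
\begin{align*}
\sigma^2_{\delta} &= \frac{\sigma^2_{bp}}{n_p} + \frac{\sigma^2_{bm}}{n_m}
                     + \frac{\sigma^2_{bpm,e}}{n_p n_m}
                     && \text{(relative error variance)} \\
\sigma^2_{\Delta} &= \sigma^2_{\delta} + \frac{\sigma^2_{p}}{n_p}
                     + \frac{\sigma^2_{m}}{n_m} + \frac{\sigma^2_{pm}}{n_p n_m}
                     && \text{(absolute error variance)}
\end{align*}

% Nested b x (m:p) design: items nested within raters
\begin{align*}
\sigma^2_{\delta} &= \frac{\sigma^2_{bp}}{n_p} + \frac{\sigma^2_{bm:p,e}}{n_p n_m} \\
\sigma^2_{\Delta} &= \sigma^2_{\delta} + \frac{\sigma^2_{p}}{n_p}
                     + \frac{\sigma^2_{m:p}}{n_p n_m}
\end{align*}

% Coefficients, with the error variances of the design in question
\begin{align*}
G = E\rho^2 &= \frac{\sigma^2_{b}}{\sigma^2_{b} + \sigma^2_{\delta}}, &
\Phi &= \frac{\sigma^2_{b}}{\sigma^2_{b} + \sigma^2_{\Delta}}
\end{align*}
```

Because the absolute error variance contains every term of the relative error variance plus the rater and item main-effect terms, Phi can never exceed G for the same design. In the nested design the item main effect and the rater-by-item interaction are confounded in the m:p component, and the person-by-item interaction is confounded with the residual, so fewer components can be estimated separately than in the fully crossed design.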

Keywords

Interrater Reliability, Generalizability Theory, Fully Crossed Designs, Nested Designs

COMPARISON OF FULLY CROSSED DESIGNS AND DESIGNS IN WHICH ITEMS ARE NESTED WITHIN RATERS IN GENERALIZABILITY THEORY

Abstract

This study aims to compare the G and Phi coefficients obtained when English composition writing skill is scored with a fully crossed design (b × p × m) and with a nested design (b × (m:p)) in which items are nested within raters while persons are crossed with both items and raters. The study included students enrolled in the preparatory school of a foundation university and 3 raters. The G and Phi coefficients obtained with the fully crossed design were higher. When the variance components were examined by source of variability, the variance for the person main effect was higher in the fully crossed design, while the variance for the residual effect was lower. These findings indicate that the fully crossed design yields more reliable results in classroom practice. Accordingly, the fully crossed design is recommended for classroom practice unless circumstances rule it out.

Keywords

Interrater Reliability, Generalizability Theory, Fully Crossed Designs, Nested Designs

There are 25 references in total.

Details

Subjects	Studies on Education
Section	Review Article
Authors	Celal Deha Doğan, Hatice Özlem Anadol
Publication Date	15 January 2017
Acceptance Date	25 April 2016
Published in Issue	Year 2017 Volume: 25 Issue: 1

Cite

APA	Doğan, C. D., & Anadol, H. Ö. (2017). Comparing Fully Crossed and Nested Designs Where Items Nested in Raters in Generalizability Theory. Kastamonu Education Journal, 25(1), 361-372. https://doi.org/10.24106/kefdergi.309180
