
Crossed random-effect modeling: examining the effects of teacher experience and rubric use in performance assessments

Year 2014, Issue 57, 1-28, 03.01.2015
https://doi.org/10.14689/ejer.2014.57.4

Abstract

Performance assessments have emerged as an alternative method of measuring what a student knows and can do. A common criticism of performance assessments, however, is the subjective and often inconsistent nature of rater scoring. The effectiveness of a performance assessment therefore depends heavily on the quality of the scoring rubric and on how well the teacher and the rubric work together. To better understand the interaction between teachers and performance assessments, it is crucial to examine teacher-related factors and how teachers use scoring rubrics when grading. One such factor is teachers' work and scoring experience: experienced teachers, drawing on their background in instruction and evaluation, may be expected to grade student performances more objectively than teachers with less teaching and scoring experience.
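
To make the modeling approach named in the title concrete, the sketch below shows how such a crossed random-effects model could be specified with the lme4 package in R, which the reference list cites. This is a minimal illustrative example, not the authors' actual analysis: the data frame "scores" and its columns (score, experience, rubric, student, teacher) are hypothetical names chosen here for illustration.

    # Illustrative sketch only: object and column names are hypothetical.
    library(lme4)

    # scores: one row per rating, i.e., one (student, teacher) combination
    #   score      - rating a teacher assigned to a student's performance
    #   experience - teacher's teaching/scoring experience (e.g., low vs. high)
    #   rubric     - whether a scoring rubric was used when grading
    #   student    - student identifier (random factor)
    #   teacher    - teacher/rater identifier (random factor)

    # Students and teachers are crossed, not nested: each teacher rates many
    # students and each student is rated by several teachers, so both enter
    # the model as separate random intercepts.
    fit <- lmer(score ~ experience * rubric + (1 | student) + (1 | teacher),
                data = scores)

    summary(fit)   # fixed effects of experience, rubric use, and their interaction
    VarCorr(fit)   # variance components for students, teachers, and residual error

In this specification, the random intercepts absorb systematic student-to-student and teacher-to-teacher differences in scores, while the fixed-effect terms estimate the average effects of teacher experience and rubric use described above.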

References

  • Ang-Aw, H.T. & Goh, C.C. (2011). Understanding discrepancies in rater judgment on national-level oral examination tasks. RELC Journal, 42(1), 31-51.
  • Baird, J., & Mac, Q. (1999). How should examiner adjustments be calculated? - A discussion paper. AEB Research Report, RC13.
  • Bates, D. M. (2005). Fitting linear mixed models in R. R News, 5, 27–30.
  • Bates, D., Maechler, M., & Bolker, B. (2011). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-42. http://CRAN.R-project.org/package=lme4.
  • Brennan, R. L. (2003). CASMA research report: Coefficients and indices in generalizability theory (Research Report No. 1). Iowa City: The University of Iowa Center for Advanced Studies in Measurement and Assessment.
  • Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.
  • Brennan, R. L. (2000). Performance assessments from the perspective of generalizability theory. Applied Psychological Measurement, 24(4), 339-353.
  • Brualdi, A. (1998). Implementing Performance Assessment in the classroom. Practical Assessment, Research & Evaluation, 6(2). Retrieved August 29, 2012, from http://PAREonline.net/getvn.asp?v=6&n=2
  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
  • Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Philadelphia: Harcourt Brace Jovanovich College Publishers.
  • Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment of teaching in context. Teaching and Teacher Education, 16, 523–545.
  • Dochy, F., Gijbels, D., & Segers, M. (2006). Learning and the emerging new assessment culture. In L. Verschaffel, F. Dochy, M. Boekaerts, & S. Vosniadou (Eds.), Instructional psychology: Past, present, and future trends (pp. 191-206). Oxford, Amsterdam: Elsevier.
  • Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25, 155–185.
  • Eliasziw, M., Young, S. L., Woodbury, M.G., & Fryday-Field, K. (1994). Statistical methodology for the concurrent assessment of interrater and intrarater reliability: Using goniometric measurements as an example. Physical Therapy, 74, 777-788.
  • Farr, R., & Tone, B. (1998). Le portfolio, au service de l’apprentissage et de l’évaluation. Montréal/Toronto: Chenelière/McGraw-Hill.
  • Fox, J. (2002). An R and S-PLUS companion to applied regression. Thousand Oaks, CA: Sage.
  • Friel, S. N., Curcio, F., & Bright, G. W. (2001). Making sense of graphs: Critical factors influencing comprehension and instructional implications. Journal for Research in Mathematics Education, 32(2), 124-158.
  • Hadden, B. L. (1991). Teacher and nonteacher perceptions of second-language communication. Language Learning, 41(1), 1-20.
  • Hafner, J., & Hafner, P. (2003). Quantitative analysis of the rubric as an assessment tool: An empirical study of student peer-group rating. International Journal of Science Education, 25(12), 1509-1528.
  • Hamp-Lyons, L. (1991). Scoring procedures for ESL contexts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 241-278). Norwood, NJ: Ablex.
  • Jolliffe, F. R. (1991). Assessment of the understanding of statistical concepts. In D. Vere-Jones (Ed.), Proceedings of the third international conference on teaching statistics (Vol. 1, pp. 461–466). Voorburg, The Netherlands: International Statistical Institute.
  • Kulm, G., & Malcolm, S. (1991). Science assessment in the service of reform. Washington, D. C.: American Association for the Advancement of Science.
  • Lumley, T. (1998). Perceptions of language-trained raters and occupational experts in a test of occupational English language proficiency. English for Specific Purposes, 17(4), 347-367.
  • Messick, S. (1996). Validity of performance assessments. In G. Phillips (Ed.), Technical issues in large-scale performance assessment (pp. 1–18). Washington, DC: National Center for Education Statistics.
  • Meyer, L. (2000). Lingering doubt examiners: Results of pilot modeling analyses, summer 2000: AEB Research Report.
  • Myford, C. M., & Mislevy, R. J. (1994). Monitoring and improving a portfolio assessment system. Princeton, NJ: Educational Testing Service.
  • O'Neil, J. (1992). Putting performance assessment to the test. Educational Leadership, 49, 14-19.
  • Palm, T. (2008). Performance assessment and authentic assessment: A conceptual analysis of the literature. Practical Assessment, Research & Evaluation, 13(4), 1-11. Retrieved on September 29, 2012, from http://pareonline.net/pdf/v13n4.pdf
  • Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. New York: Springer.
  • Pinot de Moira, A. (2003). Examiner background and the effect on marking reliability. AQA Research Report, RC218.
  • R Development Core Team (2012). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing, http://www.R-project.org.
  • Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage Publications.
  • Royal-Dawson, L., & Baird, J. (2009). Is teaching experience necessary for reliable scoring of extended English questions? Educational Measurement: Issues and Practice, 28(2), 2-8.
  • Routman, R. (1991). Invitations. Portsmouth, NH: Heinemann.
  • Schafer, W., Swanson, G., Bene, N., & Newberry, G. (2001). Effects of teacher knowledge of rubrics on student achievement in four content areas. Applied Measurement in Education, 14(2), 151-170.
  • Shavelson, R. J., & Webb, N. M. (2005). Generalizability theory. In Green, J. L., Camilli, G. & Elmore, P. B. (Eds.), Complementary Methods for Research in Education. (3rd ed.) Washington, DC: AERA.
  • Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
  • Shohamy, E., Gordon, C., & Kramer, R. (1992). The effect of raters’ background and training on the reliability of direct writing tests. Modern Language Journal, 76(1), 27-33.
  • Slater, T. F., & Ryan, J. M. (1993). Laboratory performance assessment. The Physics Teacher, 31(5), 306-309.
  • Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4). Retrieved September 11, 2012, from http://PAREonline.net/getvn.asp?v=9&n=4
  • Stuhlmann, J., Daniel, C., Dellinger, A., Denny, R. K., & Powers, T. (1999). A generalizability study of the effects of training on teachers’ abilities to rate children’s writing using a rubric. Journal of Reading Psychology, 20, 107–127.
  • Wainer, H. (1992). Understanding graphs and tables. Educational Researcher, 21(1), 14– 23.
  • Ward, D. G. (1986). Factor indeterminacy in generalizability theory. Applied Psychological Measurement, 10, 159-165.
  • Weigle, S. (1999). Investigating rater/prompt interactions in writing assessment: Quantitative & qualitative approaches. Assessing Writing, 6(2), 145-178.
  • West, B. T., Welch, K. B., & Galecki, A. T. (2007). Linear mixed models: A practical guide using statistical software. Boca Raton: Chapman & Hall/CRC.
  • Wiener, R. B., & Cohen, J. H. (1997). Literacy portfolios: Using assessment to guide instruction. New Jersey: Prentice Hall.
  • Wigglesworth, G. (1994). Patterns of rater behaviour in the assessment of an oral interaction test. Australian Review of Applied Linguistics, 17(2), 77-103.
  • Wood, R. (1968). Objectives in the teaching of mathematics. Educational Research, 10, 83–98.

Details

Primary Language: English
Section: Articles
Authors

Adnan Kan

Okan Bulut

Publication Date: January 3, 2015
Published in Issue: Year 2014

How to Cite

APA Kan, A., & Bulut, O. (2015). Crossed random-effect modeling: examining the effects of teacher experience and rubric use in performance assessments. Eurasian Journal of Educational Research, (57), 1-28. https://doi.org/10.14689/ejer.2014.57.4
AMA Kan A, Bulut O. Crossed random-effect modeling: examining the effects of teacher experience and rubric use in performance assessments. Eurasian Journal of Educational Research. January 2015;(57):1-28. doi:10.14689/ejer.2014.57.4
Chicago Kan, Adnan, and Okan Bulut. “Crossed Random-Effect Modeling: Examining the Effects of Teacher Experience and Rubric Use in Performance Assessments”. Eurasian Journal of Educational Research, no. 57 (January 2015): 1-28. https://doi.org/10.14689/ejer.2014.57.4.
EndNote Kan A, Bulut O (01 January 2015) Crossed random-effect modeling: examining the effects of teacher experience and rubric use in performance assessments. Eurasian Journal of Educational Research 57 1–28.
IEEE A. Kan and O. Bulut, “Crossed random-effect modeling: examining the effects of teacher experience and rubric use in performance assessments”, Eurasian Journal of Educational Research, no. 57, pp. 1–28, January 2015, doi: 10.14689/ejer.2014.57.4.
ISNAD Kan, Adnan - Bulut, Okan. “Crossed Random-Effect Modeling: Examining the Effects of Teacher Experience and Rubric Use in Performance Assessments”. Eurasian Journal of Educational Research 57 (January 2015), 1-28. https://doi.org/10.14689/ejer.2014.57.4.
JAMA Kan A, Bulut O. Crossed random-effect modeling: examining the effects of teacher experience and rubric use in performance assessments. Eurasian Journal of Educational Research. 2015;(57):1–28.
MLA Kan, Adnan, and Okan Bulut. “Crossed Random-Effect Modeling: Examining the Effects of Teacher Experience and Rubric Use in Performance Assessments”. Eurasian Journal of Educational Research, no. 57, 2015, pp. 1-28, doi:10.14689/ejer.2014.57.4.
Vancouver Kan A, Bulut O. Crossed random-effect modeling: examining the effects of teacher experience and rubric use in performance assessments. Eurasian Journal of Educational Research. 2015;(57):1-28.