
Measuring Essay Assessment: Intra-rater and Inter-rater Reliability

Year 2014, Issue: 57, 113 - 136, January 3, 2015


There have been many attempts to research the effective
assessment of writing ability, and many proposals for how this might be
done. In this context, rater reliability plays a crucial role in the vital
decisions made about test takers at different turning points in both
educational and professional life. The intra-rater and inter-rater reliability
of essay assessments made with different assessment tools should therefore be
discussed alongside the assessment processes themselves.




Primary Language English
Section Articles

Ulaş Kayapınar

Publication Date January 3, 2015
Published in Issue Year 2014 Issue: 57

Cite

APA Kayapınar, U. (2015). Measuring Essay Assessment: Intra-rater and Inter-rater Reliability. Eurasian Journal of Educational Research (57), 113-136.
AMA Kayapınar U. Measuring Essay Assessment: Intra-rater and Inter-rater Reliability. Eurasian Journal of Educational Research. January 2015;(57):113-136. doi:10.14689/ejer.2014.57.2
Chicago Kayapınar, Ulaş. “Measuring Essay Assessment: Intra-Rater and Inter-Rater Reliability”. Eurasian Journal of Educational Research, no. 57 (January 2015): 113-36.
EndNote Kayapınar U (01 January 2015) Measuring Essay Assessment: Intra-rater and Inter-rater Reliability. Eurasian Journal of Educational Research 57 113–136.
IEEE U. Kayapınar, “Measuring Essay Assessment: Intra-rater and Inter-rater Reliability”, Eurasian Journal of Educational Research, no. 57, pp. 113–136, January 2015, doi: 10.14689/ejer.2014.57.2.
ISNAD Kayapınar, Ulaş. “Measuring Essay Assessment: Intra-Rater and Inter-Rater Reliability”. Eurasian Journal of Educational Research 57 (January 2015), 113-136.
JAMA Kayapınar U. Measuring Essay Assessment: Intra-rater and Inter-rater Reliability. Eurasian Journal of Educational Research. 2015;(57):113–136.
MLA Kayapınar, Ulaş. “Measuring Essay Assessment: Intra-Rater and Inter-Rater Reliability”. Eurasian Journal of Educational Research, no. 57, 2015, pp. 113-36, doi:10.14689/ejer.2014.57.2.
Vancouver Kayapınar U. Measuring Essay Assessment: Intra-rater and Inter-rater Reliability. Eurasian Journal of Educational Research. 2015(57):113-36.