Research Article

Exploring Variability Sources in Student Evaluation of Teaching via Many-Facet Rasch Model

Year 2017, Volume 8, Issue 1, pp. 15-33, 03.04.2017
https://doi.org/10.21031/epod.298462

Abstract

Evaluating the quality of teaching is important in nearly every higher education institution, and the most common way of assessing teaching effectiveness is through students. Student Evaluation of Teaching (SET) is used to gather information about students' experiences with a course and an instructor's performance at some point in the semester. SET can be considered a type of rater-mediated performance assessment in which students are the raters and instructors are the examinees. When performance assessment becomes a rater-mediated process, extra measures need to be taken to create more reliable and fair assessment practices. This study has two main purposes: (a) to examine the extent to which the facets (instructor, student, and rating items) contribute to instructors' score variance, and (b) to examine students' judging behavior in order to detect potential sources of bias in student evaluation of teaching, using the Many-Facet Rasch Model. The data set includes 1,235 students' responses from 254 courses. The results show that (a) students differ greatly in severity when rating instructors, (b) students are fairly consistent in their ratings, (c) students, both as a group and at the individual level, tend to display a halo effect in their ratings, (d) ratings cluster in the two highest categories of the scale, and (e) the variation in item measures is fairly low. The findings have practical implications for SET practice by helping to improve the psychometric quality of the measurement.
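
For context, the Many-Facet Rasch Model (Linacre, 1989) extends the Rasch rating scale model with additional facets. A minimal sketch of its usual log-odds form, adapted to the three facets named in the abstract (the article's exact parameterization may differ), is:

\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \beta_i - \alpha_j - \tau_k

where P_nijk is the probability that instructor n receives category k on rating item i from student j, \theta_n is the measure of instructor n, \beta_i is the difficulty of rating item i, \alpha_j is the severity of student j, and \tau_k is the threshold of category k relative to category k-1. Because each student's severity \alpha_j enters the model directly, severity, consistency (fit), halo, and category-use effects can be estimated and examined separately from the instructor measures.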


Details

Journal Section: Articles
Authors

Bengu Borkan

Publication Date: April 3, 2017
Acceptance Date: February 28, 2017
Published in Issue: Year 2017, Volume 8, Issue 1

Cite

APA: Borkan, B. (2017). Exploring Variability Sources in Student Evaluation of Teaching via Many-Facet Rasch Model. Journal of Measurement and Evaluation in Education and Psychology, 8(1), 15-33. https://doi.org/10.21031/epod.298462