Research Article

An Empirical Study for the Statistical Adjustment of Rater Bias

Year 2019, Volume: 6, Issue: 2, 193-201, 15.07.2019
https://doi.org/10.21449/ijate.533517

Abstract

This study investigated the effectiveness of the statistical adjustment applied to rater bias in many-facet Rasch analysis. First, a dataset free of rater × examinee bias was modified so that it contained rater × examinee bias. Bias adjustment was then applied to the rater bias introduced into the data file, and the effectiveness of the statistical adjustment was examined. The outcomes obtained from the dataset without bias, the dataset with bias, and the dataset to which the bias adjustment was applied were compared. It was concluded that the differences created by rater × examinee bias in examinees' ability estimates, item difficulty indices, and measures of rater severity and leniency were largely eliminated by the bias adjustment. This result indicates that bias adjustment using many-facet Rasch analysis is a viable way to control rater bias.
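
The abstract does not include the analysis itself, and the study's bias adjustment was carried out within many-facet Rasch analysis (the references point to Linacre's FACETS program). The small simulation below is only a hypothetical sketch of the design logic the abstract describes: generate ratings free of rater × examinee bias, inject such a bias for one rater, and then remove the resulting discrepancy. The rating model, the 0.75 flagging threshold, and the adjustment rule are illustrative assumptions, not the article's procedure.

```python
import numpy as np

rng = np.random.default_rng(42)

n_examinees, n_raters, n_items = 50, 4, 3
ability = rng.normal(0, 1, n_examinees)      # examinee ability (logits)
severity = rng.normal(0, 0.3, n_raters)      # rater severity (logits)
difficulty = rng.normal(0, 0.5, n_items)     # item difficulty (logits)

def expected_score(theta, lam, delta, max_score=4):
    """Expected rating on a 0..max_score scale from a logistic latent score."""
    p = 1 / (1 + np.exp(-(theta - lam - delta)))
    return max_score * p

# 1) Ratings without rater x examinee bias: examinee x rater x item.
clean = np.empty((n_examinees, n_raters, n_items))
for n in range(n_examinees):
    for j in range(n_raters):
        for i in range(n_items):
            clean[n, j, i] = np.round(
                expected_score(ability[n], severity[j], difficulty[i])
                + rng.normal(0, 0.4))
clean = clean.clip(0, 4)

# 2) Inject a rater x examinee bias: rater 0 scores the first 10 examinees
#    one category lower than warranted (hypothetical bias pattern).
biased = clean.copy()
biased[:10, 0, :] = np.clip(biased[:10, 0, :] - 1, 0, 4)

# 3) Crude adjustment: for each examinee, compare each rater's mean rating
#    with the mean of the remaining raters and remove large discrepancies.
adjusted = biased.copy()
rater_means = biased.mean(axis=2)                    # examinee x rater
for j in range(n_raters):
    others = np.delete(rater_means, j, axis=1).mean(axis=1)
    gap = rater_means[:, j] - others                 # signed discrepancy
    flagged = np.abs(gap) > 0.75                     # arbitrary threshold
    adjusted[flagged, j, :] -= gap[flagged, None]

print("mean |clean - biased|  :", np.abs(clean - biased).mean().round(3))
print("mean |clean - adjusted|:", np.abs(clean - adjusted).mean().round(3))
```

In an actual many-facet Rasch analysis the rater × examinee interaction would instead be estimated as a bias term in logits and removed within the measurement model, so that ability, item difficulty, and rater severity estimates are recomputed on the adjusted basis; comparing those estimates across the unbiased, biased, and adjusted conditions is the comparison the abstract reports.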

References

  • Aubin, A. S., St-Onge, C., & Renaud, J. S. (2018). Detecting rater bias using a person-fit statistic: A Monte Carlo simulation study. Perspectives on Medical Education, 7(2), 83-92. http://dx.doi.org/10.1007/s40037-017-0391-8
  • Bailey, K. (1994). Methods of social research. New York, NY: The Free Press.
  • Bennett, R. E. (1991). On the meanings of constructed response. ETS Research Report Series, 2, 1-46. http://dx.doi.org/10.1002/j.2333-8504.1991.tb01429.x
  • Bennett, R. E., Ward, W. C., Rock, D. A., & LaHart, C. (1990). Toward a framework for constructed response items. ETS Research Report Series, 1, 1-29. http://dx.doi.org/10.1002/j.2333-8504.1990.tb01348.x
  • Connaway, L. S., & Powell, R. R. (2010). Basic research methods for librarians. Santa Barbara, CA: Libraries Unlimited.
  • DeMars, C. (2010). Item response theory. Oxford, UK: Oxford University Press.
  • Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2(3), 197-221. http://dx.doi.org/10.1207/s15434311laq0203_2
  • Fahim, M., & Bijani, H. (2011). The effects of rater training on raters’ severity and bias in second language writing assessment. Iranian Journal of Language Testing, 1(1), 1-16. Retrieved from http://www.ijlt.ir/portal/files/401-2011-01-01.pdf
  • Güler, N., İlhan, M., Güneyli, A., & Demir, S. (2017). An evaluation of the psychometric properties of three different forms of Daly and Miller’s writing apprehension test through Rasch analysis. Educational Sciences: Theory & Practice, 17(3), 721-744. http://dx.doi.org/10.12738/estp.2017.3.0051
  • Haiyang, S. (2010). An application of classical test theory and many facet Rasch measurement in analyzing the reliability of an English test for non-English major graduates. Chinese Journal of Applied Linguistics, 33(2), 87-102. Retrieved from http://www.celea.org.cn/teic/90/10060807.pdf
  • Haladyna, T. M. (1997). Writing test items to evaluate higher order thinking. Needham Heights, MA: Allyn & Bacon.
  • Hogan, T. P., & Murphy, G. (2007). Recommendations for preparing and scoring constructed-response items: What the experts say. Applied Measurement in Education, 20(4), 427-441. http://dx.doi.org/10.1080/08957340701580736
  • Houston, W. M., Raymond, M. R., & Svec, J. C. (1991). Adjustments for rater effects in performance assessment. Applied Psychological Measurement, 15(4), 409-421. http://dx.doi.org/10.1177/014662169101500411
  • Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what can we do about it? Psychological Methods, 5(1), 64–86. http://dx.doi.org/10.1037/1082-989X.5.1.64
  • İlhan, M. (2015). The identification of rater effects on open-ended math questions rated through standard rubrics and rubrics based on the SOLO taxonomy in reference to the many facet Rasch model. Doctoral dissertation, Gaziantep University, Gaziantep, Turkey. Retrieved from https://tez.yok.gov.tr/UlusalTezMerkezi/
  • İlhan, M. (2016). Comparison of the ability estimations of classical test theory and the many facet Rasch model in measurements with open-ended questions. Hacettepe University Journal of Education, 31(2), 346–368. http://dx.doi.org/10.16986/HUJE.2016015182
  • Knoch, U., Read, J., & von Randow, J. (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing, 12(1), 26-43. http://dx.doi.org/10.1016/j.asw.2007.04.001
  • Kondo-Brown, K. (2002). A FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, 19(1), 3-31. https://doi.org/10.1191/0265532202lt218oa
  • Kumar, DSP D. (2005). Performance appraisal: The importance of rater training. Journal of the Kuala Lumpur Royal Malaysia Police College, 4, 1-15. Retrieved from http://rmpckl.rmp.gov.my/Journal/BI/performanceappraisal.pdf
  • Lee, M., Peterson, J. J., & Dixon, A. (2010). Rasch calibration of physical activity self-efficacy and social support scale for persons with intellectual disabilities. Research in Developmental Disabilities, 31(4), 903-913. http://dx.doi.org/10.1016/j.ridd.2010.02.010
  • Linacre, J. M. (2012). Many-facet Rasch measurement: Facets tutorial. Retrieved from http://www.winsteps.com/a/ftutorial2.pdf
  • Linacre, J. M. (2018). A user's guide to FACETS Rasch-model computer programs. Retrieved from https://www.winsteps.com/manuals.htm
  • McNamara, J. F., Erlandson, D. A., & McNamara, M. (2013). Measurement and evaluation: Strategies for school improvement. New York, NY: Routledge.
  • Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227. Retrieved from http://jimelwood.net/students/grips/tables_figures/myford_wolfe_2004.pdf
  • Nandakumar, R., & Ackerman, T. A. (2004). Test modeling. In D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 93-105). Thousand Oaks, CA: Sage.
  • Raymond, M. R., & Houston, W. M. (1990). Detecting and correcting for rater effects in performance assessment (ACT Research Rep. No. 90-14). Iowa City, IA: American College Testing. Retrieved from http://www.act.org/content/dam/act/unsecured/documents/ACT_RR90-14.pdf
  • Raymond, M. R., & Viswesvaran, C. (1993). Least squares models to correct for rater effects in performance assessment. Journal of Educational Measurement, 30(3), 253-268. http://dx.doi.org/10.1111/j.1745-3984.1993.tb00426.x
  • Saal, F. E., Downey, R. G., & Lahey, M. A. (1980). Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin, 88(2), 413-428. http://dx.doi.org/10.1037/0033-2909.88.2.413
  • Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370-371. Retrieved from https://www.rasch.org/rmt/rmt83b.htm


Details

Primary Language: English
Subjects: Studies on Education
Section: Articles
Authors

Mustafa İlhan 0000-0003-1804-002X

Publication Date: July 15, 2019
Submission Date: February 28, 2019
Published Issue: Year 2019, Volume: 6, Issue: 2

How to Cite

APA: İlhan, M. (2019). An empirical study for the statistical adjustment of rater bias. International Journal of Assessment Tools in Education, 6(2), 193-201. https://doi.org/10.21449/ijate.533517
