Research Article

Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies with Ordinal Categories

Year 2025, EARLY VIEW, 1 - 1
https://doi.org/10.2339/politeknik.1568563

Abstract

Weighted kappa and kappa-like coefficients are used to quantify inter-rater agreement when raters classify objects into ordinal categories, and they have been extended for use in studies with multiple raters. Selecting an appropriate weighting scheme is crucial because the scheme can substantially affect the value of the coefficient. This study examines the accuracy of weighted kappa coefficients and the effects of linear, quadratic, ridit-type, and exponential-type weighting schemes on these coefficients in multi-rater agreement studies with ordinal categories. The accuracy of the coefficients is investigated through an illustrative dataset and a simulation study.
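To make the role of the weighting scheme concrete, the following is a minimal sketch (not the paper's implementation) of the two-rater weighted kappa of Cohen [3] under the classical linear and quadratic schemes; the ridit-type and exponential-type schemes examined in the article instead build the weights from alternative category scores [15] and are not reproduced here. The cross-classification table in the example is hypothetical.

```python
# Minimal sketch of two-rater weighted kappa; the linear and quadratic
# weights follow the classical Cicchetti-Allison and Fleiss-Cohen forms.
import numpy as np

def weighted_kappa(table: np.ndarray, weights: np.ndarray) -> float:
    """Weighted kappa for a k x k cross-classification of two raters.

    weights[i, j] is the agreement weight for the category pair (i, j):
    1 on the diagonal, values in [0, 1] off the diagonal.
    """
    p = table / table.sum()                    # joint proportions
    row, col = p.sum(axis=1), p.sum(axis=0)    # marginal proportions
    po = np.sum(weights * p)                   # observed weighted agreement
    pe = np.sum(weights * np.outer(row, col))  # chance-expected agreement
    return (po - pe) / (1.0 - pe)

def linear_weights(k: int) -> np.ndarray:
    i, j = np.indices((k, k))
    return 1.0 - np.abs(i - j) / (k - 1)       # w_ij = 1 - |i-j|/(k-1)

def quadratic_weights(k: int) -> np.ndarray:
    i, j = np.indices((k, k))
    return 1.0 - (i - j) ** 2 / (k - 1) ** 2   # w_ij = 1 - (i-j)^2/(k-1)^2

# Hypothetical 4-category table (rows: rater 1, columns: rater 2).
table = np.array([[12, 3, 1, 0],
                  [4, 9, 3, 1],
                  [1, 2, 8, 2],
                  [0, 1, 3, 10]])
k = table.shape[0]
print("linear   :", round(weighted_kappa(table, linear_weights(k)), 3))
print("quadratic:", round(weighted_kappa(table, quadratic_weights(k)), 3))
```

Because quadratic weights give near-diagonal disagreements more credit than linear weights do, the quadratic coefficient is typically the larger of the two on the same table (see [16]), which is precisely the sensitivity to the weighting scheme that the study quantifies.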

References

  • [1] Cohen J., “A coefficient of agreement for nominal scales”, Educational and Psychological Measurement, 20(1): 37–46, (1960).
  • [2] Aydin E.A., “EEG sinyalleri kullanılarak zihinsel iş yükü seviyelerinin sınıflandırılması”, Politeknik Dergisi, 24(2): 681–689, (2021).
  • [3] Cohen J., “Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit”, Psychological Bulletin, 70(4): 213–220, (1968).
  • [4] Conger A.J., “Integration and generalization of kappas for multiple raters”, Psychological Bulletin, 88(2): 322–328, (1980).
  • [5] Warrens M.J., “A family of multi-rater kappas that can always be increased and decreased by combining categories”, Statistical Methodology, 9(3): 330–340, (2012).
  • [6] Warrens M.J., “Equivalences of weighted kappas for multiple raters”, Statistical Methodology, 9(3): 407–422, (2012).
  • [7] Moss J., “Measures of agreement with multiple raters: Fréchet variances and inference”, Psychometrika, 89(2): 517–541, (2024).
  • [8] Light R.J., “Measures of response agreement for qualitative data: some generalizations and alternatives”, Psychological Bulletin, 76(5): 365–377, (1971).
  • [9] Hubert L., “Kappa revisited”, Psychological Bulletin, 84(2): 289–297, (1977).
  • [10] Mielke P.W., Berry K.J. and Johnston J.E., “The exact variance of weighted kappa with multiple raters”, Psychological Reports, 101(2): 655–660, (2007).
  • [11] Abraira V. and de Vargas A.P., “Generalization of the kappa coefficient for ordinal categorical data, multiple observers and incomplete designs”, Qüestiió, 23(3): 561–571, (1999).
  • [12] Schuster C. and Smith D.A., “Dispersion-weighted kappa: An integrative framework for metric and nominal scale agreement coefficients”, Psychometrika, 70(1): 135–146, (2005).
  • [13] Vanbelle S. and Albert A., “Agreement between an isolated rater and a group of raters”, Statistica Neerlandica, 63(1): 82–100, (2009).
  • [14] Kvalseth T.O., “An alternative interpretation of the linearly weighted Kappa coefficients for ordinal data”, Psychometrika, 83(3): 618–627, (2018).
  • [15] Yilmaz A.E. and Aktas S., “Ridit and exponential type scores for estimating the kappa statistic”, Kuwait Journal of Science, 45(1): 89–99, (2018).
  • [16] Warrens M.J., “Conditional inequalities between Cohen’s kappa and weighted kappas”, Statistical Methodology, 10(1): 14–22, (2013).
  • [17] Tran D., Dolgun A. and Demirhan H., “Weighted inter-rater agreement measures for ordinal outcomes”, Communications in Statistics-Simulation and Computation, 49(4): 989–1003, (2020).
  • [18] Vanbelle S., Engelhart C.H. and Blix E., “A comprehensive guide to study the agreement and reliability of multi-observer ordinal data”, BMC Medical Research Methodology, 24(1): 1–14, (2024).
  • [19] Demirhan H. and Yilmaz A. E., “Detection of grey zones in inter-rater agreement studies”, BMC Medical Research Methodology, 23(1): 1–15, (2023).
  • [20] Yilmaz A.E. and Demirhan H., “Weighted kappa measures for ordinal multi-class classification performance”, Applied Soft Computing, 134(110020): 1–16, (2023).
  • [21] de Raadt A., Warrens M.J., Bosker R.J. and Kiers H.A., “A comparison of reliability coefficients for ordinal rating scales”, Journal of Classification, 38(3): 519–543, (2021).
  • [22] Mielke P.W., Berry K.J. and Johnston J.E., “Resampling probability values for weighted kappa with multiple raters”, Psychological Reports, 102(2): 606–613, (2008).
  • [23] Mielke P.W. and Berry K.J., “A note on Cohen’s weighted kappa coefficient of agreement with linear weights”, Statistical Methodology, 6(5): 439–446, (2009).
  • [24] Warrens M.J., “Corrected Zegers-ten Berge coefficients are special cases of Cohen’s weighted kappa”, Journal of Classification, 31(2): 179–193, (2014).
  • [25] Warrens M.J., “Cohen’s linearly weighted kappa is a weighted average of 2x2 kappas”, Psychometrika, 76(3): 471–486, (2011).
  • [26] Landis J.R. and Koch G.G., “The measurement of observer agreement for categorical data”, Biometrics, 33(1): 159–174, (1977).
  • [27] Altman D.G., “Practical Statistics for Medical Research”, Chapman & Hall, London, (1991).
  • [28] Fleiss J.L., Levin B. and Paik M.C., “Statistical Methods for Rates & Proportions”, Wiley & Sons, New York, (2003).
  • [29] Yilmaz A.E. and Saracbasi T., “Assessing agreement between raters from the point of coefficients and log-linear models”, Journal of Data Science, 15(1): 1–24, (2017).
  • [30] Cicchetti D.V. and Allison T., “A new procedure for assessing reliability of scoring EEG sleep recordings”, American Journal of EEG Technology, 11(3): 101–110, (1971).
  • [31] Fleiss J.L. and Cohen J., “The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability”, Educational and Psychological Measurement, 33(3): 613–619, (1973).
  • [32] Bross I.D.J., “How to use ridit analysis”, Biometrics, 14(1): 18–38, (1958).
  • [33] Iki K., Tahata K. and Tomizawa S., “Ridit score type quasi-symmetry and decomposition of symmetry for square contingency tables with ordered categories”, Austrian Journal of Statistics, 38(3): 183–192, (2009).
  • [34] Bagheban A.A. and Zayeri F., “A generalization of the uniform association model for assessing rater agreement in ordinal scales”, Journal of Applied Statistics, 37(8): 1265–1273, (2010).
  • [35] Weinberger M., Ferguson J.A., Westmoreland G., Mamlin L.A., Segar D.S., Eckert G.J., Greene J.Y., Martin D.K. and Tierney W.M., “Can raters consistently evaluate the content of focus groups?”, Social Science & Medicine, 46(7): 929–933, (1998).
  • [36] Muthen B., “A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators”, Psychometrika, 49(1): 115–132, (1984).
  • [37] Sertdemir Y., Burgut H.R., Alparslan Z.N., Unal I. and Gunasti S., “Comparing the methods of measuring multi-rater agreement on an ordinal rating scale: a simulation study with an application to real data”, Journal of Applied Statistics, 40(7): 1506–1519, (2013).
  • [38] Sarkar D., “Lattice: Multivariate Data Visualization with R”, Springer, New York, (2008).


Details

Primary Language English
Subjects Numerical and Computational Mathematics (Other)
Journal Section Research Article
Authors

Ayfer Ezgi Yılmaz 0000-0002-6214-8014

Early Pub Date February 27, 2025
Publication Date
Submission Date October 16, 2024
Acceptance Date February 20, 2025
Published in Issue Year 2025 EARLY VIEW

Cite

APA Yılmaz, A. E. (2025). Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies with Ordinal Categories. Politeknik Dergisi, 1-1. https://doi.org/10.2339/politeknik.1568563
AMA Yılmaz AE. Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies with Ordinal Categories. Politeknik Dergisi. Published online February 1, 2025:1-1. doi:10.2339/politeknik.1568563
Chicago Yılmaz, Ayfer Ezgi. “Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies With Ordinal Categories”. Politeknik Dergisi, February (February 2025), 1-1. https://doi.org/10.2339/politeknik.1568563.
EndNote Yılmaz AE (February 1, 2025) Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies with Ordinal Categories. Politeknik Dergisi 1–1.
IEEE A. E. Yılmaz, “Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies with Ordinal Categories”, Politeknik Dergisi, pp. 1–1, February 2025, doi: 10.2339/politeknik.1568563.
ISNAD Yılmaz, Ayfer Ezgi. “Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies With Ordinal Categories”. Politeknik Dergisi. February 2025. 1-1. https://doi.org/10.2339/politeknik.1568563.
JAMA Yılmaz AE. Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies with Ordinal Categories. Politeknik Dergisi. 2025;:1–1.
MLA Yılmaz, Ayfer Ezgi. “Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies With Ordinal Categories”. Politeknik Dergisi, 2025, pp. 1-1, doi:10.2339/politeknik.1568563.
Vancouver Yılmaz AE. Effect of Weighting Schemes on Weighted Kappa Coefficients in Multi-Rater Agreement Studies with Ordinal Categories. Politeknik Dergisi. 2025:1-1.