An Illustration of a Latent Class Analysis for Interrater Agreement: Identifying Subpopulations with Different Agreement Levels
Year 2023, Volume: 14, Issue: 4, 492 - 507, 31.12.2023
Ömer Emre Can Alagöz, Yılmaz Orhun Gürlük, Mediha Kormaz, Gizem Cömert
Abstract
This study proposes a latent class analysis (LCA) approach to investigating interrater agreement based on rating patterns. LCA identifies which subjects are rated similarly or differently by raters, offering a new perspective on agreement. Using an empirical dataset in which parents and teachers evaluated pupils, the study found two latent classes of respondents, one showing a moderate agreement pattern and one showing a low agreement pattern. We calculated the raw agreement coefficient (RAC) per behaviour in the whole sample and within each latent class. When the RAC was calculated in the whole sample, many behaviours had low or moderate RAC values. However, the LCA showed that these behaviours had higher RAC values in the moderate-agreement class and lower RAC values in the low-agreement class.
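To make the per-class computation concrete, below is a minimal sketch (not the authors' code) of how the RAC, i.e., the proportion of subjects who receive identical ratings from both raters on a given behaviour, can be computed in the whole sample and within each latent class. The example data, column names, and class assignments are hypothetical; in the study, class memberships would come from the fitted LCA model.

```python
import pandas as pd

# Hypothetical binary ratings: 1 = behaviour observed, 0 = not observed.
data = pd.DataFrame({
    "parent_b1":  [1, 0, 1, 1, 0, 1],
    "teacher_b1": [1, 0, 0, 1, 0, 1],
    "parent_b2":  [0, 0, 1, 0, 1, 1],
    "teacher_b2": [1, 0, 1, 0, 0, 1],
})
# Hypothetical class assignments (1 = moderate agreement, 2 = low agreement);
# in practice these would be posterior class memberships from an LCA fit.
data["lca_class"] = [1, 1, 2, 1, 2, 2]

behaviours = ["b1", "b2"]

def rac(df: pd.DataFrame, behaviour: str) -> float:
    """RAC for one behaviour: the proportion of subjects whose two
    raters (parent and teacher) gave identical ratings."""
    return (df[f"parent_{behaviour}"] == df[f"teacher_{behaviour}"]).mean()

# RAC per behaviour in the whole sample ...
for b in behaviours:
    print(f"{b}: whole-sample RAC = {rac(data, b):.2f}")

# ... and within each latent class, where agreement levels can differ.
for cls, grp in data.groupby("lca_class"):
    for b in behaviours:
        print(f"class {cls}, {b}: RAC = {rac(grp, b):.2f}")
```

Under this setup, a behaviour that looks poorly agreed upon in the pooled sample can split into a higher-RAC group and a lower-RAC group once class membership is taken into account, which is the pattern the abstract describes.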
Supporting Institution
Deutsche Forschungsgemeinschaft (DFG)
Acknowledgments
We would like to thank Rukiye KIZILTEPE, Duygu ESLEK, Türkan YILMAZ IRMAK, Duygu GÜNGÖR CULHA for sharing the research data with us.
References
- Agresti, A. (1992). Modelling patterns of agreement and disagreement. Statistical Methods in Medical Research, 1, 201-218. https://doi.org/10.1177/096228029200100205
- Ato, M., López, J. J., & Benavente, A. (2011). A simulation study of rater agreement measures with 2x2 contingency tables. Psicológica, 32(2), 385–402.
- Basten, M., Tiemeier, H., Althoff, R., van de Schoot, R., Jaddoe, V. W. V., Hofman, A., Hudziak, J. J., Verhulst, F. C., & Van der Ende, J. (2015). The stability of problem behavior across the preschool years: An empirical approach in the general population. Journal of Abnormal Child Psychology, 44(2), 393-404. https://doi.org/10.1007/s10802-015-9993-y
- Bıkmaz Bilgen, Ö., & Doğan, N. (2017). Puanlayıcılar arası güvenirlik belirleme tekniklerinin karşılaştırılması [A comparison of interrater reliability techniques]. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 8(1), 63-78. https://doi.org/10.21031/epod.294847
- Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. https://doi.org/10.1177/001316446002000104
- De Los Reyes, A., Henry, D. B., Tolan, P. H., & Wakschlag, L. S. (2009). Linking informant discrepancies to observed variations in young children's disruptive behavior. Journal of Abnormal Child Psychology, 37(5), 637-652. https://doi.org/10.1007/s10802-009-9307-3
- Feinstein, A. R., & Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43(6), 543-549. https://doi.org/10.1016/0895-4356(90)90158-L
- Fleiss, J. L. (1971). Measuring agreement for multinomial data. Psychological Bulletin, 76(5), 378-382. https://doi.org/10.1037/h0031619
- Forster, A. J., O'Rourke, K., Shojania, K. G., & van Walraven, C. (2007). Combining ratings from multiple physician reviewers helped to overcome the uncertainty associated with adverse event classification. Journal of Clinical Epidemiology, 60(9), 892-901.
- Gisev, N., Bell, J. S., & Chen, T. F. (2013). Interrater agreement and interrater reliability: Key concepts, approaches, and applications. Research in Social and Administrative Pharmacy, 9, 330-338. https://doi.org/10.1016/j.sapharm.2012.04.004
- Göktaş, A. & İşçi, Ö. (2011). A comparison of the most commonly used measures of association for doubly ordered square contingency tables via simulation. Metodoloski Zvezki, 8(1), 17-37. https://doi.org/10.51936/milh5641
- Hallgren, K. A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology, 8(1), 23-34. https://doi.org/10.20982/tqmp.08.1.p023
- Hayes, A. F. & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding, Communication Methods and Measures, 1(1), 77-89. https://doi.org/10.1080/19312450709336664
- Jiang, Z. (2019). Using the iterative latent-class analysis approach to improve attribute accuracy in diagnostic classification models. Behavior Research Methods, 51, 1075-1084. https://doi.org/10.3758/s13428-018-01191-0
- Kızıltepe, R., Eslek, D., Yılmaz Irmak, T., & Güngör, D. (2022). "I am learning to protect myself with Mika": A teacher-based child sexual abuse prevention program in Turkey. Journal of Interpersonal Violence, 37(11-12), 1-25. https://doi.org/10.1177/0886260520986272
- Konstantinidis, M., Le, L. W., & Gao, X. (2022). An empirical comparative assessment of inter-rater agreement of binary outcomes and multiple raters. Symmetry, 14(2), 262. https://doi.org/10.3390/sym14020262
- Kottner, J., Audige, L., Brorson, S., Donner, A., Gajewski, B., Hrobjartsson, A., Roberts, C., Shoukri, M., & Streiner, D. L. (2011). Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. Journal of Clinical Epidemiology, 64, 96-106. https://doi.org/10.1016/j.ijnurstu.2011.01.016
- Landis, J. R. & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174. https://doi.org/10.2307/2529310
- Leising, D., Ostrovski, O., & Zimmermann, J. (2013). “Are we talking about the same person here?” Interrater agreement in judgments of personality varies dramatically with how much the perceivers like the targets. Social Psychological and Personality Science, 4(4), 468-474. https://doi.org/10.1177/1948550612462414
- Major, S., Seabra-Santos, M. J. & Martin, R. P. (2018). Latent profile analysis: another approach to look at parent-teacher agreement on preschoolers’ behavior problems. European Early Childhood Education Research Journal, 26(5), 701-717. https://doi.org/10.1080/1350293X.2018.1522743
- Miller, W. E. (2011). A latent class method for the selection of prototypes using expert ratings. Statistics in Medicine, 31(1), 80-92.
- Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14(4), 535–569. https://doi.org/10.1080/10705510701575396
- Raykov, T., Dimitrov, D. M., von Eye, A., & Marcoulides, G. A. (2013). Interrater agreement evaluation: A latent variable modeling approach. Educational and Psychological Measurement, 20(10), 1-20. https://doi.org/10.1177/0013164412449016
- Schuster, C. & Smith, D. A. (2002). Indexing systematic rater agreement with a latent-class model. Psychological Methods, 7(3), 384-395. https://doi.org/10.1037/1082-989X.7.3.384
- Sertdemir, Y., Burgut, H. R., Alparslan, Z. N., Unal, I., & Gunasti, S. (2013). Comparing the methods of measuring multi-rater agreement on an ordinal rating scale: a simulation study with an application to real data. Journal of Applied Statistics, 40(7), 1506-1519. https://doi.org/10.1080/02664763.2013.788617
- Shaffer, D., Schwab-Stone, M., Fisher, P., Cohen, P., Piacentini, J., Davies, M., & Regier, D. (1993). The diagnostic interview schedule for children-revised version (DISC-R): I. Preparation, field testing, interrater reliability, and acceptability. Journal of the American Academy of Child & Adolescent Psychiatry, 32(3), 643-650. https://doi.org/10.1097/00004583-199305000-00023
- Tanner, M. A., & Young, M. A. (1985). Modelling agreement among raters. Journal of the American Statistical Association, 80(389), 175-180. https://doi.org/10.1080/01621459.1985.10477157
- Thompson, D. M. (2003). Comparing SAS-based applications of latent class analysis using simulated patient classification data. The University of Oklahoma Health Sciences Center.
- Uebersax, J. S., & Grove, W. M. (1990). Latent class analysis of diagnostic agreement. Statistics in Medicine, 9(5), 559-572. https://doi.org/10.1002/sim.4780090509
- Viera, A. J., & Garrett, J. M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37(5), 360-363. PMID: 15883903
- Von Eye, A., & Mun, E. Y. (2005). Analyzing rater agreement: Manifest variable methods (1st ed.). Lawrence Erlbaum Associates. https://doi.org/10.4324/9781410611024
- Yarnold, P. R. (2016). ODA vs. π and κ: Paradoxes of kappa. Optimal Data Analysis, 5, 160-161. Retrieved March 23, 2023, from https://www.researchgate.net/publication/309681250_ODA_vs_p_and_k_Paradoxes_of_Kappa
- Yilmaz, A. E. & Saracbasi, T. (2017). Assessing agreement between raters from the point of coefficients and log-linear models. Journal of Data Science, 15, 1-24. https://doi.org/10.6339/JDS.201701_15(1).0001
- Yilmaz, A. E. & Saracbasi, T. (2019). Agreement and adjusted degree of distinguishability for square contingency tables. Hacettepe Journal of Mathematics and Statistics, 48(2), 592-604. https://doi.org/10.15672/hjms.2018.620
- Zapf, A., Castell, S., Morawietz, L., & Karch, A. (2016). Measuring inter-rater reliability for nominal data: Which coefficients and confidence intervals are appropriate? BMC Medical Research Methodology, 16, 1-10. https://doi.org/10.1186/s12874-016-0200-9