An Illustration of a Latent Class Analysis for Interrater Agreement: Identifying Subpopulations with Different Agreement Levels
Year 2023, Volume: 14, Issue: 4, 492 - 507, 31.12.2023
Ömer Emre Can Alagöz, Yılmaz Orhun Gürlük, Mediha Kormaz, Gizem Cömert
Abstract
This study proposes a latent class analysis (LCA) approach to investigating interrater agreement based on rating patterns. LCA identifies which subjects are rated similarly or differently by raters, offering a new perspective on agreement. Using an empirical dataset in which parents and teachers evaluated pupils, the study found two latent classes of respondents, one showing a moderate agreement pattern and one showing a low agreement pattern. We calculated the raw agreement coefficient (RAC) per behaviour in the whole sample and within each latent class. When the RAC was calculated in the whole sample, many behaviours had low or moderate RAC values. However, the LCA showed that these behaviours had higher RAC values in the moderate-agreement class and lower RAC values in the low-agreement class.
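To make the per-class computation concrete, below is a minimal sketch (not the authors' code) of how the RAC, i.e., the proportion of subjects who receive identical ratings from both raters on a given behaviour, can be computed in the whole sample and within each latent class. The example data, column names, and class assignments are hypothetical; in the study, class memberships would come from the fitted LCA model.

```python
import pandas as pd

# Hypothetical binary ratings: 1 = behaviour observed, 0 = not observed.
data = pd.DataFrame({
    "parent_b1":  [1, 0, 1, 1, 0, 1],
    "teacher_b1": [1, 0, 0, 1, 0, 1],
    "parent_b2":  [0, 0, 1, 0, 1, 1],
    "teacher_b2": [1, 0, 1, 0, 0, 1],
})
# Hypothetical class assignments (1 = moderate agreement, 2 = low agreement);
# in practice these would be posterior class memberships from an LCA fit.
data["lca_class"] = [1, 1, 2, 1, 2, 2]

behaviours = ["b1", "b2"]

def rac(df: pd.DataFrame, behaviour: str) -> float:
    """RAC for one behaviour: the proportion of subjects whose two
    raters (parent and teacher) gave identical ratings."""
    return (df[f"parent_{behaviour}"] == df[f"teacher_{behaviour}"]).mean()

# RAC per behaviour in the whole sample ...
for b in behaviours:
    print(f"{b}: whole-sample RAC = {rac(data, b):.2f}")

# ... and within each latent class, where agreement levels can differ.
for cls, grp in data.groupby("lca_class"):
    for b in behaviours:
        print(f"class {cls}, {b}: RAC = {rac(grp, b):.2f}")
```

Under this setup, a behaviour that looks poorly agreed upon in the pooled sample can split into a higher-RAC group and a lower-RAC group once class membership is taken into account, which is the pattern the abstract describes.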
Supporting Institution
Deutsche Forschungsgemeinschaft (DFG)
Acknowledgments
We would like to thank Rukiye KIZILTEPE, Duygu ESLEK, Türkan YILMAZ IRMAK, Duygu GÜNGÖR CULHA for sharing the research data with us.
References
- Agresti, A. (1992). Modelling patterns of agreement and disagreement. Statistical Methods in Medical Research, 1, 201-218. https://doi.org/10.1177/096228029200100205
- Ato, M., López, J. J., & Benavente, A. (2011). A simulation study of rater agreement measures with 2x2 contingency tables. Psicológica, 32(2), 385–402.
- Basten, M., Tiemeier, H., Althoff, R., van de Schoot, R., Jaddoe, V. W. V., Hofman, A., Hudziak, J. J., Verhulst, F. C., & Van der Ende, J. (2015). The stability of problem behavior across the preschool years: An empirical approach in the general population. Journal of Abnormal Child Psychology, 44(2), 393-404. https://doi.org/10.1007/s10802-015-9993-y
- Bıkmaz Bilgen, Ö., & Doğan, N. (2017). Puanlayıcılar arası güvenirlik belirleme tekniklerinin karşılaştırılması [A comparison of interrater reliability techniques]. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 8(1), 63-78. https://doi.org/10.21031/epod.294847
- Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. https://doi.org/10.1177/001316446002000104
- De Los Reyes, A., Henry, D. B., Tolan, P. H., & Wakschlag, L. S. (2009). Linking informant discrepancies to observed variations in young children's disruptive behavior. Journal of Abnormal Child Psychology, 37(5), 637-652. https://doi.org/10.1007/s10802-009-9307-3
- Feinstein, A. R., & Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43(6), 543-549. https://doi.org/10.1016/0895-4356(90)90158-L
- Fleiss, J. L. (1971). Measuring agreement for multinomial data. Psychological Bulletin, 76(5), 378-382. https://doi.org/10.1037/h0031619
- Forster, A. J., O'Rourke, K., Shojania, K. G., & van Walraven, C. (2007). Combining ratings from multiple physician reviewers helped to overcome the uncertainty associated with adverse event classification. Journal of Clinical Epidemiology, 60(9), 892-901.
- Gisev, N., Bell, J. S., & Chen, T. F. (2013). Interrater agreement and interrater reliability: Key concepts, approaches, and applications. Research in Social and Administrative Pharmacy, 9, 330-338. https://doi.org/10.1016/j.sapharm.2012.04.004
- Göktaş, A. & İşçi, Ö. (2011). A comparison of the most commonly used measures of association for doubly ordered square contingency tables via simulation. Metodoloski Zvezki, 8(1), 17-37. https://doi.org/10.51936/milh5641
- Hallgren, K. A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology, 8(1), 23-34. https://doi.org/10.20982/tqmp.08.1.p023
- Hayes, A. F. & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding, Communication Methods and Measures, 1(1), 77-89. https://doi.org/10.1080/19312450709336664
- Jiang, Z. (2019). Using the iterative latent-class analysis approach to improve attribute accuracy in diagnostic classification models. Behavior Research Methods, 51, 1075-1084. https://doi.org/10.3758/s13428-018-01191-0
- Kızıltepe, R., Eslek, D., Yılmaz Irmak, T., & Güngör, D. (2022). "I am learning to protect myself with Mika": A teacher-based child sexual abuse prevention program in Turkey. Journal of Interpersonal Violence, 37(11-12), 1-25. https://doi.org/10.1177/0886260520986272
- Konstantinidis, M., Le, L. W., & Gao, X. (2022). An empirical comparative assessment of inter-rater agreement of binary outcomes and multiple raters. Symmetry, 14(2), 262. https://doi.org/10.3390/sym14020262
- Kottner, J., Audige, L., Brorson, S., Donner, A., Gajewski, B., Hrobjartsson, A., Roberts, C., Shoukri, M., & Streiner, D. L. (2011). Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. Journal of Clinical Epidemiology, 64, 96-106. https://doi.org/10.1016/j.ijnurstu.2011.01.016
- Landis, J. R. & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174. https://doi.org/10.2307/2529310
- Leising, D., Ostrovski, O., & Zimmermann, J. (2013). “Are we talking about the same person here?” Interrater agreement in judgments of personality varies dramatically with how much the perceivers like the targets. Social Psychological and Personality Science, 4(4), 468-474. https://doi.org/10.1177/1948550612462414
- Major, S., Seabra-Santos, M. J. & Martin, R. P. (2018). Latent profile analysis: another approach to look at parent-teacher agreement on preschoolers’ behavior problems. European Early Childhood Education Research Journal, 26(5), 701-717. https://doi.org/10.1080/1350293X.2018.1522743
- Miller, W. E. (2011). A latent class method for the selection of prototypes using expert ratings. Statistics in Medicine, 31(1), 80-92.
- Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14(4), 535–569. https://doi.org/10.1080/10705510701575396
- Raykov, T., Dimitrov, D. M., von Eye, A., & Marcoulides, G. A. (2013). Interrater agreement evaluation: A latent variable modeling approach. Educational and Psychological Measurement, 20(10), 1-20. https://doi.org/10.1177/0013164412449016
- Schuster, C. & Smith, D. A. (2002). Indexing systematic rater agreement with a latent-class model. Psychological Methods, 7(3), 384-395. https://doi.org/10.1037/1082-989X.7.3.384
- Sertdemir, Y., Burgut, H. R., Alparslan, Z. N., Unal, I., & Gunasti, S. (2013). Comparing the methods of measuring multi-rater agreement on an ordinal rating scale: a simulation study with an application to real data. Journal of Applied Statistics, 40(7), 1506-1519. https://doi.org/10.1080/02664763.2013.788617
- Shaffer, D., Schwab-Stone, M., Fisher, P., Cohen, P., Piacentini, J., Davies, M., & Regier, D. (1993). The diagnostic interview schedule for children-revised version (DISC-R): I. Preparation, field testing, interrater reliability, and acceptability. Journal of the American Academy of Child & Adolescent Psychiatry, 32(3), 643-650. https://doi.org/10.1097/00004583-199305000-00023
- Tanner, M. A., & Young, M. A. (1985). Modelling agreement among raters. Journal of the American Statistical Association, 80(389), 175-180. https://doi.org/10.1080/01621459.1985.10477157
- Thompson, D. M. (2003). Comparing SAS-based applications of latent class analysis using simulated patient classification data. The University of Oklahoma Health Sciences Center.
- Uebersax, J. S., & Grove, W. M. (1990). Latent class analysis of diagnostic agreement. Statistics in Medicine, 9(5), 559-572. https://doi.org/10.1002/sim.4780090509
- Viera, A. J., & Garrett, J. M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37(5), 360-363. PMID: 15883903
- Von Eye, A., & Mun, E. Y. (2005). Analyzing rater agreement: Manifest variable methods (1st ed.). Lawrence Erlbaum Associates. https://doi.org/10.4324/9781410611024
- Yarnold, P. R. (2016). ODA vs. π and κ: Paradoxes of kappa. Optimal Data Analysis, 5, 160-161. Retrieved March 23, 2023, from https://www.researchgate.net/publication/309681250_ODA_vs_p_and_k_Paradoxes_of_Kappa
- Yilmaz, A. E. & Saracbasi, T. (2017). Assessing agreement between raters from the point of coefficients and log-linear models. Journal of Data Science, 15, 1-24. https://doi.org/10.6339/JDS.201701_15(1).0001
- Yilmaz, A. E. & Saracbasi, T. (2019). Agreement and adjusted degree of distinguishability for square contingency tables. Hacettepe Journal of Mathematics and Statistics, 48(2), 592-604. https://doi.org/10.15672/hjms.2018.620
- Zapf, A., Castell, S., Morawietz, L., & Karch, A. (2016). Measuring inter-rater reliability for nominal data: Which coefficients and confidence intervals are appropriate? BMC Medical Research Methodology, 16, 1-10. https://doi.org/10.1186/s12874-016-0200-9