Research Article

Impact of aberrant responses on Item Response Theory based model estimations

Year 2021, Volume 29, Issue 5, 1024 - 1033, 31.12.2021
https://doi.org/10.24106/kefdergi.836241

Abstract

Score validity can be examined at both the score level and the individual level, because a test score is not only a function of the items or stimuli but is also influenced by the respondent's characteristics. It is the responsibility of the test user to identify individuals who do not fit the underlying model or who respond differently from the rest of the sample. The validity of test results at the individual level can be checked through person-fit analysis. Misfitting individuals can bias model results at both the test and item levels. Given the importance of detecting aberrant responses, the purpose of this study was to examine the effect of aberrant responses on item response theory-based model estimates. This is a descriptive study based on simulated data. Data were first collected from 1104 university students enrolled in 8 different universities in Turkey using the Generalized Anxiety Disorder-7 (GAD-7) scale. After parameters were estimated with an item response theory model, 100 datasets were simulated from the resulting item and person parameters in order to increase the generalizability of the findings. The data were analyzed in R using the "PerFit" and "mirt" packages. Misfitting persons were identified with the Lz, U3, G, and normed G person-fit statistics. The findings showed that misfitting persons affected the model fit statistics, item fit statistics, item discrimination values, the amount of information provided by the items, the total amount of information provided by the scale, and empirical reliability across different levels of the latent trait. In addition, removing the misfitting persons flagged by the Lz statistic was the least effective of the techniques examined for improving the item response theory-based results, whereas the G statistic was the most effective. The results should be interpreted with caution: the simulated data are based on parameters estimated from a dataset collected with an instrument that measures anxiety, so the findings may not generalize to the measurement of other traits.
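To make the G and normed G statistics named in the abstract concrete, the following is a minimal sketch of how Guttman errors can be counted. It assumes dichotomous (0/1) item scores for simplicity, whereas the study analyzed polytomous GAD-7 items with the PerFit package in R. The function names are hypothetical, and returning 0.0 for all-zero or all-one response patterns is a convention chosen here, not something stated in the article.

```python
from typing import List, Sequence


def item_order_by_easiness(data: Sequence[Sequence[int]]) -> List[int]:
    """Return item indices sorted from easiest (highest proportion of 1s) to hardest."""
    n_persons = len(data)
    n_items = len(data[0])
    proportions = [sum(row[j] for row in data) / n_persons for j in range(n_items)]
    # sorted() is stable, so tied items keep their original relative order.
    return sorted(range(n_items), key=lambda j: -proportions[j])


def guttman_errors(responses: Sequence[int], order: Sequence[int]) -> int:
    """Count item pairs where an easier item is scored 0 and a harder item is scored 1."""
    errors = 0
    zeros_seen = 0
    for j in order:  # walk from easiest to hardest item
        if responses[j] == 0:
            zeros_seen += 1
        else:
            # Each easier item already scored 0 forms one Guttman error with this item.
            errors += zeros_seen
    return errors


def normed_guttman_errors(responses: Sequence[int], order: Sequence[int]) -> float:
    """Divide G by its maximum for this total score, r * (n - r)."""
    r = sum(responses)
    n = len(responses)
    max_errors = r * (n - r)
    if max_errors == 0:
        # Assumption: all-0 or all-1 patterns cannot contain errors, so report 0.0.
        return 0.0
    return guttman_errors(responses, order) / max_errors
```

A perfectly consistent pattern such as `[1, 1, 0, 0]` (items ordered easiest first) yields G = 0, while the fully reversed pattern `[0, 0, 1, 1]` yields the maximum, so its normed value is 1.0. In a workflow like the one the abstract describes, persons whose statistic exceeds a chosen cutoff would be removed before the model is re-estimated. The cutoff itself is not given in the abstract.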

References

  • American Educational Research Association, American Psychological Association, Joint Committee on Standards for Educational, Psychological Testing (US), & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. American Educational Research Association.
  • American Psychiatric Association (2000). Diagnostic and statistical manual of mental disorders (4th ed., Text Revision). Washington, DC.
  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801
  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. doi: https://doi.org/10.18637/jss.v048.i06
  • Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical values for Yen’s Q3: Identification of local dependence in the Rasch model using residual correlations. Applied Psychological Measurement, 41(3), 178-194. doi: https://doi.org/10.1177/0146621616677520
  • Conijn, J. M., Emons, W. H., & Sijtsma, K. (2014). Statistic lz-based person-fit methods for noncognitive multiscale measures. Applied Psychological Measurement, 38(2), 122-136. doi: https://doi.org/10.1177/0146621613497568
  • Conrad, K. J., Bezruczko, N., Chan, Y. F., Riley, B., Diamond, G., & Dennis, M. L. (2010). Screening for atypical suicide risk with person fit statistics among people presenting to alcohol and other drug treatment. Drug and Alcohol Dependence, 106(2-3), 92-100. doi: https://doi.org/10.1016/j.drugalcdep.2009.07.023
  • Drasgow, F., & Hulin, C. L. (1990). Item response theory. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (p. 577–636). Consulting Psychologists Press.
  • Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67-86. doi: https://doi.org/10.1111/j.2044-8317.1985.tb00817.x
  • Emons, W. H. (2008). Nonparametric person-fit analysis of polytomous item scores. Applied Psychological Measurement, 32(3), 224-247. doi: https://doi.org/10.1177/0146621607302479
  • Engelhard Jr, G. (2009). Using item response theory and model-data fit to conceptualize differential item and person functioning for students with disabilities. Educational and Psychological Measurement, 69(4), 585-602. doi: https://doi.org/10.1177/0013164408323240
  • Gorsuch, R. L. (2003). Factor analysis. In J. A. Schinka & W. F. Velicer (Eds.), Handbook of psychology: Research methods in psychology, Vol. 2 (p. 143–164). John Wiley & Sons Inc.
  • Guttman, L. (1944). A basis for scaling qualitative data. American sociological review, 9(2), 139-150. doi: https://doi.org/10.2307/2086306
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage publications.
  • Jordan, P., Shedden-Mora, M. C., & Löwe, B. (2017). Psychometric analysis of the Generalized Anxiety Disorder scale (GAD-7) in primary care using modern item response theory. PloS one, 12(8), e0182162.
  • Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277-298. doi: https://doi.org/10.1207/S15324818AME1604_2
  • Karasar, N. (2005). Bilimsel araştırma yöntemi. Nobel Yayın Dağıtım
  • Kline, R. B. (2015). Principles and practice of structural equation modeling. New York: Guilford publications.
  • Konkan, R., Şenormancı, Ö., Güçlü, O., Aydin, E., & Sungur, M. Z. (2013). Yaygın Anksiyete Bozukluğu-7 (YAB-7) Testi Türkçe Uyarlaması, Geçerlik ve Güvenirliği. Archives of Neuropsychiatry/Noropsikiatri Arsivi, 50(1), 53-59. doi: https://doi.org/10.4274/npa.y6308
  • Liu, T., Sun, Y., Li, Z., & Xin, T. (2019). The impact of aberrant response on reliability and validity. Measurement: Interdisciplinary Research and Perspectives, 17(3), 133-142. doi: https://doi.org/10.1080/15366367.2019.1584848
  • Meijer, R. R. (1996). Person-fit research: An introduction. Applied Measurement in Education, 9(1), 3-8. https://doi.org/10.1207/s15324818ame0901_2
  • Meijer, R. R., & Nering, M. L. (1997). Trait level estimation for nonfitting response vectors. Applied Psychological Measurement, 21(4), 321-336. doi: https://doi.org/10.1177/01466216970214003
  • Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied psychological measurement, 25(2), 107-135. doi: https://doi.org/10.1177/01466210122031957
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American psychologist, 50(9), 741. doi: https://doi.org/10.1037/0003-066X.50.9.741
  • Miguel, J. P., Silva, J. T., & Prieto, G. (2013). Career decision self-efficacy scale—short form: a Rasch analysis of the Portuguese version. Journal of Vocational Behavior, 82(2), 116-123. https://doi.org/10.1016/j.jvb.2012.12.001
  • Molenaar, I. W. (1997). Nonparametric Models for Polytomous Responses. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory, 369-380. Springer.
  • Morizot J., Ainsworth A.T., & Krueger S.P. (2009). Toward modern psychometrics: Application of item response theory models in personality research. In Robins R.W., Fraley R.C., Krueger R.F. (Eds.), Handbook of Research Methods in Personality Psychology. New York: Guilford Press.
  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. ETS Research Report Series, 1992(1), i-30. doi: https://doi.org/10.1177/014662169201600206
  • Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24(1), 50–64. https://doi.org/10.1177/01466216000241003
  • R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Reckase, M. D. (1979). Unifactor latent trait models applied to multifactor tests: Results and implications. Journal of educational statistics, 4(3), 207-230. https://doi.org/10.3102/10769986004003207
  • Rosseel, Y. (2012). Lavaan: An R package for structural equation modeling and more. Version 0.5–12 (BETA). Journal of statistical software, 48(2), 1-36. doi: https://doi.org/10.18637/jss.v048.i02
  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika monograph supplement. Psychometrika, 34: 1-97. doi: https://doi.org/10.1002/j.2333-8504.1968.tb00153.x
  • Schmitt, N., Chan, D., Sacco, J. M., McFarland, L. A., & Jennings, D. (1999). Correlates of person fit and effect of person fit on test validity. Applied Psychological Measurement, 23(1), 41-53. doi: https://doi.org/10.1177/01466219922031176
  • Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory (Vol. 5). Sage publications.
  • Spitzer, R. L., Kroenke, K., Williams, J. B., & Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of internal medicine, 166(10), 1092-1097. doi: https://doi.org/10.1001/archinte.166.10.1092
  • Tendeiro, J. N., & Meijer, R. R. (2014). Detection of invalid test scores: The usefulness of simple nonparametric statistics. Journal of Educational Measurement, 51(3), 239-259. doi: https://doi.org/10.1111/jedm.12046
  • Tendeiro, J. N., Meijer, R. R., & Niessen, A. S. M. (2016). PerFit: An R package for person-fit analysis in IRT. Journal of Statistical Software, 74(5), 1-27. doi: https://doi.org/10.18637/jss.v074.i05
  • Van Der Flier, H. (1982). Deviant response patterns and comparability of test scores. Journal of Cross-Cultural Psychology, 13(3), 267-298. doi: https://doi.org/10.1177/0022002182013003001
  • Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8(2), 125-145. doi: https://doi.org/10.1177/014662168400800201

Turkish title: Normal olmayan yanıtların Madde Tepki Kuramına dayalı model kestirimleri üzerindeki etkisi (The effect of aberrant responses on Item Response Theory-based model estimates)

Abstract

Score validity can be examined at both the score level and the person level, because test scores are not only a function of the items or stimuli but are also affected by the characteristics of the respondents. Identifying individuals who do not fit the underlying model or who respond differently from the rest of the sample is the responsibility of test users. The validity of test scores at the individual level can be checked with person-fit analysis. Misfitting individuals can distort model results at both the test and item levels. Considering the importance of detecting aberrant responses, the aim of this study was to examine the effect of aberrant responses on item response theory-based estimates. This is a descriptive study that used simulated data. To this end, data were collected from 1104 university students enrolled in 8 different universities across Turkey using the Generalized Anxiety Disorder-7 scale; after parameters were estimated with an item response theory model, 100 datasets were simulated using the resulting item and person parameters. In this way, the aim was to increase the generalizability of the findings. The data were analyzed in the R environment using the "PerFit" and "mirt" packages. Misfitting persons were identified with the Lz, U3, G, and norm-based G person-fit statistics. The findings showed that misfitting persons affected the model fit statistics, item fit statistics, item discrimination values, the amount of information provided by the items, the total amount of information provided by the scale, and the level of empirical reliability across different levels of the anxiety trait. In addition, removing the misfitting individuals identified with the lzpoly technique from the dataset was found to be the least effective of the available techniques for improving the item response theory-based results. The G statistic, on the other hand, was identified as the most effective technique. The results should be interpreted with caution, because the simulated data used in this study are based on parameters representing a dataset obtained with an instrument that measures anxiety, and the results may not generalize to the measurement of other traits.

There are 41 citations in total.

Details

Primary Language English
Subjects Studies on Education
Journal Section Research Article
Authors

Akif Avcu 0000-0003-1977-7592

Publication Date December 31, 2021
Acceptance Date June 4, 2021
Published in Issue Year 2021

Cite

APA Avcu, A. (2021). Impact of aberrant responses on Item Response Theory based model estimations. Kastamonu Education Journal, 29(5), 1024-1033. https://doi.org/10.24106/kefdergi.836241
