Impact of aberrant responses on Item Response Theory based model estimations

Akif Avcu

doi:10.24106/kefdergi.836241

Research Article

Year 2021, Volume 29, Issue 5, pp. 1024-1033, 31.12.2021

Abstract

Score validity can be examined at both the score level and the individual level, because a test score is a function not only of the items or stimuli but also of the respondent's characteristics. It is the responsibility of the test user to identify individuals who do not fit the underlying model or who respond differently from the rest of the sample. The validity of test results at the individual level can be checked through person-fit analysis, and misfitting individuals can bias model results at both the test and the item level. Given the importance of detecting aberrant responses, this study examined the effect of aberrant responses on item response theory (IRT)-based model estimates. This is a descriptive study based on simulated data. First, data were collected with the Generalized Anxiety Disorder-7 (GAD-7) scale from 1104 university students enrolled in 8 different universities in Turkey. After IRT-based parameter estimation, 100 datasets were simulated from the resulting item and person parameters, with the aim of increasing the generalizability of the findings. The data were analyzed in R with the "PerFit" and "mirt" packages, and misfitting persons were identified with the Lz, U3, G and normed G person-fit statistics. The findings showed that misfitting persons affected the model fit statistics, item fit statistics, item discrimination values, the amount of information provided by the items, the total information provided by the scale, and empirical reliability across different levels of the anxiety trait. In addition, among the techniques compared, removing the misfitting persons detected with Lz from the dataset was the least effective way to improve the IRT-based results. 
The G statistic, on the other hand, was the most effective technique. These results should be interpreted with caution: the simulated data were generated from parameters representing a dataset collected with an anxiety measure, so the results may not generalize to the measurement of other traits.
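The abstract does not name the IRT model or report the estimated parameters, so the following is only a rough illustration of the simulate-then-screen design. This minimal Python sketch assumes Samejima's graded response model (GRM) for the seven GAD-7 items, each scored 0 to 3, and uses hypothetical item parameters rather than the study's estimates. It draws responses for simulated respondents and computes the polytomous form of the standardized log-likelihood statistic Lz, where large negative values mark unlikely response vectors. The names `ITEMS`, `grm_category_probs`, `simulate_responses` and `lz_poly` are hypothetical helpers, not functions from PerFit or mirt.

```python
import math
import random

# Hypothetical GRM parameters for 7 items scored 0-3: (discrimination a, thresholds b1 < b2 < b3).
# Illustrative values only, NOT the estimates reported in the study.
ITEMS = [
    (2.2, [-0.4, 0.6, 1.4]),
    (2.6, [-0.2, 0.8, 1.6]),
    (2.4, [-0.5, 0.5, 1.3]),
    (2.0, [-0.6, 0.4, 1.2]),
    (1.6, [0.0, 1.0, 1.9]),
    (1.8, [-0.3, 0.7, 1.5]),
    (1.7, [0.2, 1.2, 2.0]),
]


def grm_category_probs(theta, a, b):
    """P(X = k | theta) for k = 0..len(b) under Samejima's graded response model."""
    # Cumulative curves P(X >= k), padded with 1 for k = 0 and 0 for k = len(b) + 1
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def simulate_responses(thetas, items, rng):
    """Draw one category per person and item from the GRM category probabilities."""
    data = []
    for theta in thetas:
        row = []
        for a, b in items:
            probs = grm_category_probs(theta, a, b)
            row.append(rng.choices(range(len(probs)), weights=probs)[0])
        data.append(row)
    return data


def lz_poly(responses, theta, items):
    """Standardized log-likelihood person-fit statistic for polytomous items."""
    l0 = expected = variance = 0.0
    for x, (a, b) in zip(responses, items):
        probs = grm_category_probs(theta, a, b)
        logs = [math.log(p) for p in probs]
        l0 += logs[x]                                   # observed log-likelihood
        e_i = sum(p * lp for p, lp in zip(probs, logs))
        expected += e_i                                 # E(l0 | theta)
        variance += sum(p * lp * lp for p, lp in zip(probs, logs)) - e_i ** 2
    return (l0 - expected) / math.sqrt(variance)


if __name__ == "__main__":
    rng = random.Random(2021)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(1104)]  # one replication, N as in the study
    data = simulate_responses(thetas, ITEMS, rng)
    lz_values = [lz_poly(row, th, ITEMS) for row, th in zip(data, thetas)]
    # Purely illustrative cutoff: treats lz as standard normal, which is rough with 7 items
    flagged = [i for i, v in enumerate(lz_values) if v < -1.645]
    print(f"{len(flagged)} of {len(data)} simulated respondents flagged")
```

Evaluating Lz at the generating trait level keeps the sketch short. With estimated trait levels and only seven items, the standard-normal reference distribution is a poor approximation, which is one reason simulation- or bootstrap-based cutoffs are commonly used instead.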

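The G and normed G statistics count Guttman errors instead of relying on a parametric model. For polytomous items, one common formulation splits each item into item steps ("X ≥ j"), orders all steps by their popularity in the sample, and counts pairs in which a more popular step is failed while a less popular step is passed; the normed version divides this count by the maximum attainable for the same total score. The Python sketch below assumes that formulation, uses a tiny hypothetical dataset, and finds the maximum by brute-force enumeration, which is only practical for a handful of items. The helper names `step_order`, `guttman_errors` and `normed_guttman_errors` are hypothetical and are not PerFit's implementation.

```python
from itertools import product

# Hypothetical sample of 5 respondents on 3 items scored 0-3 (not data from the study)
DATA = [
    [3, 2, 1],
    [2, 1, 0],
    [3, 3, 2],
    [1, 0, 0],
    [2, 2, 1],
]


def step_order(data, n_cat):
    """Item steps (item, j), meaning 'X_item >= j', sorted by decreasing sample popularity."""
    n = len(data)
    steps = [(i, j) for i in range(len(data[0])) for j in range(1, n_cat)]
    popularity = {s: sum(row[s[0]] >= s[1] for row in data) / n for s in steps}
    # sorted() is stable, so tied steps keep their item/step order
    return sorted(steps, key=lambda s: -popularity[s])


def guttman_errors(responses, order):
    """Polytomous G: pairs where a more popular step is failed but a less popular one is passed."""
    errors = failed_so_far = 0
    for i, j in order:
        if responses[i] >= j:
            errors += failed_so_far  # each earlier failed step forms an error with this one
        else:
            failed_so_far += 1
    return errors


def normed_guttman_errors(responses, order, n_cat):
    """G divided by the largest G attainable with the same total score (brute force)."""
    total = sum(responses)
    g_max = max(
        guttman_errors(pattern, order)
        for pattern in product(range(n_cat), repeat=len(responses))
        if sum(pattern) == total
    )
    return guttman_errors(responses, order) / g_max if g_max > 0 else 0.0


order = step_order(DATA, n_cat=4)
for pattern in ([3, 2, 1], [0, 3, 0], [0, 0, 3]):
    print(pattern, guttman_errors(pattern, order), normed_guttman_errors(pattern, order, 4))
```

With this ordering, [3, 2, 1] follows the popularity order exactly and has G = 0, whereas [0, 0, 3] places all of its points on the least popular steps and reaches the maximum G for a total score of 3, giving a normed G of 1.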
Keywords

person fit, item response theory, university students, generalized anxiety, R program

Details

Primary Language	English
Subjects	Studies on Education
Section	Research Article
Authors	Akif Avcu (ORCID: 0000-0003-1977-7592)
Publication Date	31 December 2021
Acceptance Date	4 June 2021
Issue	Year 2021, Volume 29, Issue 5

Cite

APA	Avcu, A. (2021). Impact of aberrant responses on Item Response Theory based model estimations. Kastamonu Education Journal, 29(5), 1024-1033. https://doi.org/10.24106/kefdergi.836241
