Research Article

Comparison of Generalized Item Location Indices for Polytomous Items: A Monte Carlo Simulation Study

Year 2025, Volume: 58, Issue: 3, 1197–1228, 15.12.2025
https://doi.org/10.30964/auebfd.1675814

Abstract

The Generalized Item Location Index (GILI) is a method for developing or selecting polytomous items on the basis of a single value: it condenses the multiple location indices obtained for a polytomous item into a single index. The purpose of this study is to examine the performance of GILI under different simulation conditions. A Monte Carlo simulation compared how three GILI variants (LImean, LImedian, and LIIRF) change with the number of response categories (3, 5, and 7), the location parameter (-2, -1, 0, 1, and 2), and the sample size (200, 500, and 1000). According to the results, LImedian was estimated with the largest error under all conditions, whereas LImean and LIIRF produced similar amounts of error across conditions. Although LImean and LIIRF yielded similar results at the -1, 0, and +1 location levels, LImean produced more accurate estimates at the -2 and +2 levels. It was also concluded that as the number of categories increases, the amount of error computed in small samples increases. In tests developed for different purposes and in CAT applications, LImean, LImedian, and LIIRF values matched to the examinee's ability can serve as a good parameter for deciding which item to administer next. Finally, because the proposed method is easier and faster, it will ease practitioners' work in the item selection process.
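The conversion the abstract describes — collapsing a polytomous item's several location (step) parameters into one index — can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the step parameters `b` and discrimination `a` are hypothetical, the generalized partial credit model (GPCM) is assumed as the response model, and LIIRF is taken, following the IRF-based definition in Ali, Chang, and Anderson (2015), as the θ at which the item's expected score equals half its maximum score.

```python
import statistics
from math import exp

def gpcm_probs(theta, a, b):
    """Category probabilities for one GPCM item with step parameters b."""
    z = [0.0]                       # cumulative logit for category 0 is fixed at 0
    for bj in b:
        z.append(z[-1] + a * (theta - bj))
    num = [exp(v) for v in z]
    total = sum(num)
    return [n / total for n in num]

def expected_score(theta, a, b):
    """Expected item score E(X | theta); monotone increasing in theta for a > 0."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, b)))

def li_irf(a, b, lo=-6.0, hi=6.0, tol=1e-8):
    """IRF-based location: the theta where the expected score hits half the maximum."""
    target = len(b) / 2.0           # max score equals the number of steps
    while hi - lo > tol:            # bisection, valid because E(X | theta) is monotone
        mid = 0.5 * (lo + hi)
        if expected_score(mid, a, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical 4-category item (three step parameters)
a, b = 1.0, [-1.2, 0.3, 1.5]
li_mean, li_median = statistics.mean(b), statistics.median(b)
print(li_mean, li_median, li_irf(a, b))
```

For an item with roughly evenly spaced steps the three indices land close together; they separate as the steps become more unevenly spaced, which is where the comparison among LImean, LImedian, and LIIRF becomes informative.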

Ethics Statement

An ethics declaration is not required for this study.



Details

Primary Language: English
Subjects: Measurement Theories and Applications in Education and Psychology
Section: Research Article
Authors

Emrah Gül (ORCID: 0000-0001-8799-3356)

Submission Date: 14 April 2025
Acceptance Date: 17 October 2025
Publication Date: 15 December 2025
Published in Issue: Year 2025, Volume: 58, Issue: 3

How to Cite

APA: Gül, E. (2025). Comparison of generalized item location indices for polytomous items: A Monte Carlo simulation study. Ankara University Journal of Faculty of Educational Sciences (JFES), 58(3), 1197–1228. https://doi.org/10.30964/auebfd.1675814

Ankara Üniversitesi Eğitim Bilimleri Fakültesi Dergisi is licensed under CC BY-NC-ND 4.0.