The Examination of Item Difficulty Distribution, Test Length and Sample Size in Different Ability Distribution

Melek Gülşah Şahin; Yıldız Yıldırım

doi:10.21031/epod.385000

Research Article

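Turkish title: Farklı Yetenek Dağılımlarında Madde Güçlük Dağılımı, Test Uzunluğu ve Örneklem Büyüklüğünün İncelenmesi (Examination of Item Difficulty Distribution, Test Length and Sample Size in Different Ability Distributions)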

Year 2018, pp. 277-294, 29.09.2018

Cited By: 1

Abstract (Turkish version, translated)

This is a post-hoc simulation study examining the effect of different item
difficulty distributions, sample sizes and test lengths on measurement precision
in ability parameter estimation under right- and left-skewed ability
distributions. In the first stage of the study, ability parameters were obtained
from the results of a real 20-item test for right- and left-skewed sample groups
of 500, 1000, 2500, 5000 and 10000 examinees. In the second stage, four
different tests were constructed according to the distribution of the b
parameter values: normal, uniform, left-skewed and right-skewed. With test
length set at 20 and 30 items, a total of 80 conditions were examined. Data were
generated with WinGen 3, and parameters were estimated with MULTILOG 7.03 using
the maximum likelihood method. To determine measurement precision, RMSE and AAD
values were calculated and interpreted both against criteria and comparatively.
The results were evaluated in terms of item difficulty distribution, sample size
and test length. In the right-skewed ability distribution, the highest
measurement precision was obtained with the normal b distribution and the lowest
with the right-skewed b distribution. Measurement precision was higher for the
30-item test, whereas changes in sample size had no substantial effect on
measurement precision. In the left-skewed ability distribution, the highest
measurement precision was again obtained with the normal b distribution and the
lowest with the left-skewed b distribution. In addition, sample size and test
length had no substantial effect in the left-skewed ability distribution.
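
The abstract names maximum likelihood ability estimation and the RMSE and AAD precision indices but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the authors' procedure (which used MULTILOG 7.03). It assumes a dichotomous 2PL model with scaling constant D = 1.7 and known item parameters, and it reads AAD as the mean absolute difference between estimated and true ability. The helper names (`p_correct`, `ml_theta`, `rmse`, `aad`) and the ±4 clamping bounds are hypothetical choices for illustration.

```python
import math
from typing import Sequence

# ASSUMPTION: dichotomous 2PL model with scaling constant D = 1.7.
# The abstract does not state which IRT model was fitted.
D = 1.7


def p_correct(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def ml_theta(responses: Sequence[int], a: Sequence[float], b: Sequence[float],
             lo: float = -4.0, hi: float = 4.0, max_iter: int = 50,
             tol: float = 1e-6) -> float:
    """Maximum likelihood ability estimate by Newton-Raphson.

    Item parameters are treated as known. Under the 2PL the second
    derivative of the log-likelihood equals minus the test information,
    so each step is gradient / information. All-correct and all-incorrect
    patterns have no finite ML estimate; clamping them to [lo, hi] is an
    illustrative choice, not the study's rule.
    """
    n_correct = sum(responses)
    if n_correct == 0:
        return lo
    if n_correct == len(responses):
        return hi
    theta = 0.0
    for _ in range(max_iter):
        grad = 0.0  # first derivative of the log-likelihood at theta
        info = 0.0  # test information at theta
        for u, ai, bi in zip(responses, a, b):
            p = p_correct(theta, ai, bi)
            grad += D * ai * (u - p)
            info += (D * ai) ** 2 * p * (1.0 - p)
        step = grad / info
        theta = min(hi, max(lo, theta + step))
        if abs(step) < tol:
            break
    return theta


def rmse(est: Sequence[float], true: Sequence[float]) -> float:
    """Root mean squared error between estimated and true abilities."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / len(true))


def aad(est: Sequence[float], true: Sequence[float]) -> float:
    """Average absolute difference between estimated and true abilities."""
    return sum(abs(e - t) for e, t in zip(est, true)) / len(true)
```

For example, `ml_theta([1, 0], [1.0, 1.0], [0.0, 0.0])` returns 0.0: one correct and one incorrect response to two identical items located at b = 0 balance exactly. Lower RMSE and AAD values indicate higher measurement precision.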

Keywords

Item response theory, item difficulty distribution, sample size, test length, ability distribution

Abstract (English version)

This is a post-hoc simulation study investigating the effect of different item
difficulty distributions, sample sizes, and test lengths on measurement
precision when estimating ability parameters under right- and left-skewed
ability distributions. First, the ability parameters were obtained from the
results of a real 20-item test for right-skewed and left-skewed sample groups of
500, 1000, 2500, 5000, and 10000 examinees. In the second phase of the study,
four different tests were constructed according to the distribution of the b
parameter values: normal, uniform, left-skewed, and right-skewed. With test
length set at 20 and 30 items, a total of 80 conditions were examined.
Measurement precision was assessed with RMSE and AAD values, and the results
were evaluated in terms of item difficulty distribution, sample size, and test
length. In the right-skewed ability distribution, the highest measurement
precision was obtained with the normal b distribution and the lowest with the
right-skewed b distribution. Measurement precision was higher for the 30-item
test, whereas changes in sample size did not significantly affect measurement
precision. In the left-skewed ability distribution, the highest measurement
precision was again obtained with the normal b distribution and the lowest with
the left-skewed b distribution. Neither sample size nor test length
significantly affected measurement precision in the left-skewed ability
distribution.
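
The 80 conditions follow from fully crossing the four design factors: 2 ability distributions × 5 sample sizes × 4 b distributions × 2 test lengths = 80. The Python sketch below enumerates that design and shows one hypothetical way to draw b parameters with the four shapes. The abstract does not report the generating settings used in WinGen 3, so the N(0, 1), U(-3, 3) and rescaled Beta(2, 5) / Beta(5, 2) choices, as well as the names `conditions` and `draw_b`, are assumptions for illustration only.

```python
import random
from itertools import product

# Factor levels as listed in the abstract.
ABILITY_DISTS = ("right-skewed", "left-skewed")
SAMPLE_SIZES = (500, 1000, 2500, 5000, 10000)
B_DISTS = ("normal", "uniform", "left-skewed", "right-skewed")
TEST_LENGTHS = (20, 30)


def conditions() -> list[tuple[str, int, str, int]]:
    """All cells of the fully crossed design: 2 x 5 x 4 x 2 = 80."""
    return list(product(ABILITY_DISTS, SAMPLE_SIZES, B_DISTS, TEST_LENGTHS))


def draw_b(dist: str, n_items: int, rng: random.Random) -> list[float]:
    """Draw hypothetical b (difficulty) values with the requested shape.

    ASSUMPTION: the generating distributions below are illustrative; the
    abstract does not report the settings used in WinGen 3.
    """
    if dist == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    if dist == "uniform":
        return [rng.uniform(-3.0, 3.0) for _ in range(n_items)]
    if dist == "right-skewed":
        # Beta(2, 5) has a long right tail; rescale [0, 1] to [-3, 3].
        return [-3.0 + 6.0 * rng.betavariate(2.0, 5.0) for _ in range(n_items)]
    if dist == "left-skewed":
        # Beta(5, 2) mirrors that shape with a long left tail.
        return [-3.0 + 6.0 * rng.betavariate(5.0, 2.0) for _ in range(n_items)]
    raise ValueError(f"unknown b distribution: {dist!r}")
```

Here `conditions()[0]` is `('right-skewed', 500, 'normal', 20)`, and a right-skewed b set for a 20-item test could be drawn with `draw_b('right-skewed', 20, random.Random(1))`. Passing a seeded `random.Random` keeps each simulated test reproducible.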

Keywords

Item response theory, examinee distribution, item difficulty distribution, sample size, test length

There are 56 citations in total.

Details

Primary Language	English
Journal Section	Articles
Authors	Melek Gülşah Şahin (ORCID: 0000-0001-5139-9777), Yıldız Yıldırım (ORCID: 0000-0001-8434-5062)
Publication Date	September 29, 2018
Acceptance Date	August 6, 2018
Published in Issue	Year 2018

Cite

APA	Şahin, M. G., & Yıldırım, Y. (2018). The Examination of Item Difficulty Distribution, Test Length and Sample Size in Different Ability Distribution. Journal of Measurement and Evaluation in Education and Psychology, 9(3), 277-294. https://doi.org/10.21031/epod.385000

Journal of Measurement and Evaluation in Education and Psychology

Cited By

Drawing a Sample with Desired Properties from Population in R Package “drawsample”

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi (Journal of Measurement and Evaluation in Education and Psychology)

Kübra Atalay Kabasakal

https://doi.org/10.21031/epod.790449