Toplam Test ve Alt Test Puanlarının Kestiriminin Hiyerarşik Madde Tepki Kuramı Modelleri ile Karşılaştırılması

Sümeyra Soysal; Hülya Kelecioğlu

doi:10.21031/epod.404089

Research Article

Comparison of Estimation of Total Score and Subscores with Hierarchical Item Response Theory Models

Year 2018, Volume: 9 Issue: 2, 178 - 201, 30.06.2018

Cited By: 2

Abstract

This study investigated the relationship between subtests and the total test
using hierarchical item response theory models, in order to contribute to
reliable estimation of subtest and total test scores. The RMSE and reliability
of the total test score and subtest scores estimated by the Higher Order,
Bi-factor and hierarchical MIRT models were compared under conditions defined by
the size of the correlations between subtests, the subtest length and the number
of subtests. In addition, the performance of the three models was examined on
TEOG 2015 data. In almost all conditions, as the correlation between subtests
and the subtest length increased, the RMSE of the ability parameters decreased
and the reliability increased for the total test score obtained from the three
estimation models. Under all conditions, the Hierarchical MIRT model yielded the
lowest RMSE values and the highest reliability values for both subtest score
recovery and total test score recovery. At a correlation of 0.8, all models
produced similar RMSE and reliability values for total test score recovery. In
two- and three-dimensional data, the RMSE values of the ability parameters for
the subtest scores decreased as the subtest length increased but were not
affected by the correlation level between subtests in the Hierarchical MIRT
model; decreased as both the subtest length and the correlation between subtests
increased in the Higher Order model; and decreased as the subtest length
increased but increased substantially as the correlation between subtests
increased in the Bi-factor model.
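
As an illustration of the two evaluation criteria named in the abstract, the sketch below computes RMSE and a reliability coefficient for a set of ability estimates. It is only a minimal sketch: the abstract does not state how reliability was defined, so the squared correlation between true and estimated abilities is an assumption here, and the function names and toy values are hypothetical.

```python
import math


def rmse(true_theta, est_theta):
    """Root mean squared error between true and estimated abilities."""
    n = len(true_theta)
    squared_errors = sum((e - t) ** 2 for t, e in zip(true_theta, est_theta))
    return math.sqrt(squared_errors / n)


def reliability(true_theta, est_theta):
    """Squared Pearson correlation between true and estimated abilities.

    Assumed definition for illustration; the study may define it differently.
    """
    n = len(true_theta)
    mean_t = sum(true_theta) / n
    mean_e = sum(est_theta) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_theta, est_theta))
    var_t = sum((t - mean_t) ** 2 for t in true_theta)
    var_e = sum((e - mean_e) ** 2 for e in est_theta)
    return cov ** 2 / (var_t * var_e)


# Toy data (hypothetical): estimates shrunk toward zero by a factor of 0.5.
true_theta = [-1.0, 0.0, 1.0, 2.0]
est_theta = [-0.5, 0.0, 0.5, 1.0]

print(rmse(true_theta, est_theta))
print(reliability(true_theta, est_theta))
```

In a simulation design like the one described, these two quantities would be computed per condition (number of subtests, subtest length, correlation level) and per model, then compared across models. The toy data also show why the two criteria are reported together: shrunken estimates can correlate perfectly with the true abilities and still carry a non-zero RMSE.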

Keywords

subtest scoring, overall test scoring, hierarchical item response theory models, higher order model, bi-factor model

Toplam Test ve Alt Test Puanlarının Kestiriminin Hiyerarşik Madde Tepki Kuramı Modelleri ile Karşılaştırılması

Abstract

This study set out to examine the relationship between subtests and the total
test using hierarchical item response theory models, in order to contribute to
reliable estimation of subtest and total test scores. The RMSE and reliability
values of the total test score and subtest scores estimated with the Higher
Order, Bi-factor and hierarchical multidimensional item response theory (MIRT)
models were compared under conditions defined by the number of subtests, the
subtest length and the size of the correlations between subtests. In addition,
the performance of the three estimation models was examined on TEOG 2015 data.
The results showed that, in two- and three-dimensional data and under almost all
conditions, as the subtest length and the correlation between subtests
increased, the estimation error of the ability parameters for the total test
score obtained from the three estimation models decreased and the estimation
reliability increased. For total test scores, the Hierarchical MIRT model
yielded the lowest RMSE and the highest reliability under all conditions. In
addition, at a correlation of 0.8, all models produced similar RMSE and
reliability values for the total test score. In two- and three-dimensional data,
the RMSE values of the ability parameters estimated for the subtest scores were
found to decrease as the subtest length increased but to be unaffected by the
correlation level between subtests in the Hierarchical MIRT model; to decrease
as the subtest length and the correlation between subtests increased in the
Higher Order model; and to decrease as the subtest length increased but to
increase substantially as the correlation between subtests increased in the
Bi-factor model.

Keywords

Subtest score estimation, total test score estimation, hierarchical item response theory models, higher order model, bi-factor model

There are 49 citations in total.

Details

Primary Language	Turkish
Journal Section	Articles
Authors	Sümeyra Soysal 0000-0002-7304-1722 Hülya Kelecioğlu 0000-0002-0741-9934
Publication Date	June 30, 2018
Acceptance Date	June 13, 2018
Published in Issue	Year 2018 Volume: 9 Issue: 2

Cite

APA	Soysal, S., & Kelecioğlu, H. (2018). Toplam Test ve Alt Test Puanlarının Kestiriminin Hiyerarşik Madde Tepki Kuramı Modelleri ile Karşılaştırılması. Journal of Measurement and Evaluation in Education and Psychology, 9(2), 178-201. https://doi.org/10.21031/epod.404089

Cited By

Comparison of item response theory ability and item parameters according to classical and Bayesian estimation methods

International Journal of Assessment Tools in Education

https://doi.org/10.21449/ijate.1290831

Simultaneous Estimation of Overall Score and Subscores Using MIRT, HO-IRT and Bi-factor Model on TIMSS Data

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

Ayşenur Erdemir

https://doi.org/10.21031/epod.645478
