Investigating The Effect Of Testlets Consisting Of Open-Ended And Multiple-Choice Items On Reliability Via Generalizability Theory

Serpil Kocaoğlu; Melek Gülşah Şahin

doi:10.21031/epod.1429423

Research Article

Year 2024, Volume 15, Issue 1, pp. 65-78, 31.03.2024

Abstract

This study examined how testlets consisting of open-ended and multiple-choice items with similar content affect reliability. To this end, the Mathematics Achievement Test with four testlets, one version consisting of open-ended items and the other of multiple-choice items, was administered to 128 eighth-grade students. Reliability was estimated from the resulting data with the Edu-G program within the framework of Generalizability Theory, and a decision study was also carried out. For the achievement test with open-ended testlets, a fully crossed p×i×r design (p: person, i: item, r: rater) was used when the testlet effect was ignored, and a nested p×(i:t)×r design (t: testlet) was used when it was taken into account. The reliability coefficient was higher when the testlet effect was ignored. Likewise, for the achievement test with multiple-choice testlets, a crossed p×i design was used when the testlet effect was ignored and a nested p×(i:t) design when it was taken into account; again, the reliability coefficient was higher when the testlet effect was ignored. In the data obtained in this study, the reliability estimate for the test with open-ended items varied with how the testlet effect was treated.
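
The direction of this result can be illustrated with a minimal sketch of the two multiple-choice designs. It is not the authors' Edu-G analysis: it assumes a balanced, fully random model, relative (norm-referenced) decisions, and invented variance components. The function names, the variance values, and the choice of five items per testlet are all hypothetical; only the four testlets come from the abstract. The rater facet of the open-ended designs is left out for brevity.

```python
def g_coef_crossed(var_p, var_pi_e, n_items):
    """Generalizability coefficient for a p x i design (relative decisions)."""
    rel_error = var_pi_e / n_items
    return var_p / (var_p + rel_error)


def g_coef_nested(var_p, var_pt, var_pi_t_e, n_testlets, n_per_testlet):
    """Generalizability coefficient for a p x (i:t) design (relative decisions)."""
    rel_error = var_pt / n_testlets + var_pi_t_e / (n_testlets * n_per_testlet)
    return var_p / (var_p + rel_error)


def components_if_testlets_ignored(var_p, var_pt, var_pi_t_e,
                                   n_testlets, n_per_testlet):
    """Expected p x i components when the data really follow p x (i:t).

    Under the balanced random model, part of the person-by-testlet
    variance is absorbed into the person component and the rest into
    the residual.
    """
    n_total = n_testlets * n_per_testlet
    var_p_crossed = var_p + var_pt * (n_per_testlet - 1) / (n_total - 1)
    var_pi_e_crossed = (var_pi_t_e
                        + var_pt * n_per_testlet * (n_testlets - 1) / (n_total - 1))
    return var_p_crossed, var_pi_e_crossed


# Hypothetical variance components (not taken from the study).
VAR_P, VAR_PT, VAR_PI_T_E = 0.04, 0.02, 0.15
N_TESTLETS, N_PER_TESTLET = 4, 5  # four testlets as in the abstract; 5 is assumed

nested = g_coef_nested(VAR_P, VAR_PT, VAR_PI_T_E, N_TESTLETS, N_PER_TESTLET)
vp_c, vpie_c = components_if_testlets_ignored(
    VAR_P, VAR_PT, VAR_PI_T_E, N_TESTLETS, N_PER_TESTLET)
crossed = g_coef_crossed(vp_c, vpie_c, N_TESTLETS * N_PER_TESTLET)

print(f"p x (i:t): {nested:.3f}   p x i: {crossed:.3f}")
```

With any positive person-by-testlet variance and more than one item per testlet, the crossed coefficient in this sketch exceeds the nested one, because variance shared by items within a testlet is counted as person (true-score) variance when the testlet facet is ignored.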

Keywords

open-ended items, multiple-choice items, testlet, generalizability theory

Details

Primary Language	English
Subjects	Test Theories, Testing, Assessment and Psychometrics (Other)
Journal Section	Articles
Authors	Serpil Kocaoğlu (ORCID: 0009-0008-0566-4371); Melek Gülşah Şahin (ORCID: 0000-0001-5139-9777)
Publication Date	March 31, 2024
Submission Date	January 31, 2024
Acceptance Date	March 25, 2024
Published in Issue	Year 2024 Volume: 15 Issue: 1

Cite

APA	Kocaoğlu, S., & Şahin, M. G. (2024). Investigating The Effect Of Testlets Consisting Of Open-Ended And Multiple-Choice Items On Reliability Via Generalizability Theory. Journal of Measurement and Evaluation in Education and Psychology, 15(1), 65-78. https://doi.org/10.21031/epod.1429423